Computer scientist Sherif Saad says fake news is a major international problem that is continuing to grow, but help is on the way.
A computer detection model he helped to design can identify a fake news story with up to 98 per cent accuracy.
“Fake news is getting more international attention, especially in the wake of the 2016 U.S. presidential election, when we saw fake news stories shared millions of times on social media,” says Dr. Saad, a computer science professor.
“We are teaching the computer to distinguish between truth and fiction by giving it known examples, real and fake, until the algorithm develops sensors so it can work on its own in the future,” he says.
Saad and his research team started by feeding articles that had already been deemed real or fake into a computer program, and wrote an algorithm to create “learning models.” Essentially, they are teaching the computer model how to analyze, interpret and predict which stories are true and which are false.
These machine learning techniques produced an algorithm that, in future, can be fed stories on the learned topic and successfully pick out fake news from legitimate stories. The team started with a combination of news articles from different years covering a broader variety of political topics.
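The idea of learning from labelled examples can be sketched in a few lines of Python. This is only an illustration, not the team's model: it assumes a simple Naive Bayes classifier over word counts, and the four toy headlines, the labels and the function names (`tokenize`, `train`, `predict`) are all invented for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()

def train(labeled_articles):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_articles:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior: how common this label was in the training set.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set; the study used 2,000 full articles.
examples = [
    ("officials confirm vote count after audit", "real"),
    ("senate passes budget bill after debate", "real"),
    ("shocking secret proves candidate is alien", "fake"),
    ("you will not believe this shocking election hoax", "fake"),
]
model = train(examples)
print(predict(model, "shocking secret hoax revealed"))
print(predict(model, "senate officials confirm budget vote"))
```

Because the model only knows the words it saw during training, it has little to go on for a story about an unrelated topic, which is consistent with the retraining requirement Saad describes.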
“We collected our data set of fake and real articles and limited the scope to revolve around the 2016 US elections and the articles that discuss topics around it. In total, we picked 2,000 articles: 1,000 fake articles and 1,000 real articles,” he says.
“Our model achieved upwards of 98 per cent accuracy when using this type of data. This is a popular but complicated research area and with our model, we’ve had amazing results, better than other attempts.”
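Accuracy here is simply the share of articles the model labels correctly. A minimal sketch of that calculation, using made-up predictions for 100 hypothetical test articles rather than the study's data:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one label")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical balanced test set: 50 fake and 50 real articles.
actual = ["fake"] * 50 + ["real"] * 50
# Invented predictions with two mistakes: one fake article labelled real,
# and one real article labelled fake.
predicted = ["fake"] * 49 + ["real"] * 50 + ["fake"]

print(accuracy(predicted, actual))
```

With a balanced set like this one, always guessing the same label would score 50 per cent, which gives a baseline for reading the figure.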
Their model can now accurately identify a fake news story about elections; however, Saad says at this point it cannot distinguish stories on other topics. That would require retraining with a new set of articles.
“At this point we can’t design a general news detector, so we must train it for a specific type of stories,” says Saad. “It’s all about context.
“It is expensive and time-consuming to train the program. It can be updated to follow an election from 2008 or 2016, but we need to make a general model that could switch between any topic and didn’t require learning — that is the next goal.”
Saad co-authored the paper “Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques,” published as part of a conference notes series from the International Conference on Intelligent, Secure and Dependable Systems in Distributed and Cloud Environments.
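For readers unfamiliar with the term in the paper's title, an n-gram is a run of n consecutive items from a text. The sketch below assumes word-level n-grams and whitespace tokenization; the article does not say which variant the paper uses, and the function names and sample headline are invented for illustration.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    tokens = text.lower().split()
    # A text shorter than n words yields an empty list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Count how often each n-gram occurs, for use as classifier features."""
    return Counter(word_ngrams(text, n))

headline = "fake news is not real news"
bigrams = word_ngrams(headline, 2)
print(bigrams)
# ['fake news', 'news is', 'is not', 'not real', 'real news']
print(ngram_counts(headline, 1)["news"])
# 2
```

Counts like these turn each article into numbers a machine learning algorithm can compare across the real and fake examples.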