Wise After The Event, Or Why Big Data Failed To Predict President-Elect Trump

All the dazzling technology, the big data and the sophisticated modeling that American newsrooms bring to the fundamentally human endeavor of presidential politics could not save American journalism from yet again being behind the story, behind the rest of the USA.

The news media by and large missed what was happening all around it, and it was the story of a lifetime. The numbers weren’t just a poor guide on election night; they were an off-ramp away from what was actually happening.

One computer forecasting system did predict Trump’s victory: the one with the least human input.

Donald Trump’s win surprised many around the world, but none more than the modelers and big-league prognosticators who were calling for a likely Clinton victory. That outcome doesn’t mean that data-driven forecasting died on Tuesday 8th November. In fact, the best performance went to an artificial intelligence able to crunch more data than its human rivals. The takeaway?

Forecasters need new ways to talk about the uncertainty of their models and need to expand beyond “polling data.” The Defense Department and the intelligence community, who have also grown fond of machine-aided prediction, would do well to heed that lesson.

There’s an important and overlooked distinction between a prediction (which suggests certainty) and a forecast, which acknowledges more than one possible outcome and then weighs multiple outcomes in terms of their relative probability. But humans, and particularly media types that report on polling information, like certainty. So forecasts are cast as predictions.

Nate Silver, the famous prognosticator, forecast a Clinton win as likely, giving it about a 71 percent probability, which left Trump roughly a 29 percent chance. Importantly, that does not mean he was technically wrong. It just means that the less likely outcome occurred.
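A 29 percent chance is far from negligible. The minimal sketch below treats each election as an independent coin flip weighted 71/29 (a deliberate simplification, not a reconstruction of Silver's actual model) and simulates many hypothetical contests to show how often the less likely outcome comes up. The function name `simulate_upsets` is hypothetical.

```python
import random


def simulate_upsets(p_favorite: float, trials: int, seed: int = 42) -> float:
    """Return the fraction of simulated contests won by the underdog.

    Each trial is an independent draw: the favorite wins with probability
    p_favorite, otherwise the underdog wins. This is a toy illustration,
    not any real forecaster's model.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # rng.random() is uniform on [0, 1); values >= p_favorite count as upsets.
    underdog_wins = sum(1 for _ in range(trials) if rng.random() >= p_favorite)
    return underdog_wins / trials


if __name__ == "__main__":
    rate = simulate_upsets(0.71, 100_000)
    print(f"Underdog won {rate:.1%} of simulated contests")
```

Roughly three runs in ten go to the underdog, so a single upset is consistent with a well-calibrated 71 percent forecast.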

The site Election Analytics, maintained by computer scientist Sheldon Jacobson of the University of Illinois, generates a range of possible outcomes for each election. For the Clinton-Trump matchup, the site had 21 possible scenarios. In only one did Donald Trump win.

“We didn’t report it widely,” Jacobson said of the winning scenario. “I woke up on Tuesday and wondered, ‘Should I report the most extreme scenario? Or should I report the average?’ It’s like buying a basket of stocks versus buying an individual stock.” Bottom line, he says, “People expect a level of precision that the data cannot provide.”

Jacobson also does forecasting for the Transportation Security Administration (TSA), part of the effort to screen for terrorists effectively while still keeping lines moving at airports. He says a key difference between forecasting elections and forecasting for airport security is that the stakes of getting an election prediction wrong are low. “If we had 21 scenarios and one came up red” for a national security policy, he said, “we would not implement that security policy. That risk was too great.”

Jacobson’s method, like Silver’s, employs Bayesian statistics. That means both models rest on the following formula, known as Bayes’ theorem:

$$P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X)}$$

P denotes probability, A is the outcome in question, and X is evidence that influences its probability, such as a new poll. The more information you feed in, and the more times you rerun the formula as new data arrive, the greater your confidence about A can become. But Bayesian statistics never tell you exactly what will happen, only what might happen based on the available data.
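To make the update rule concrete, here is a minimal Python sketch of Bayes’ theorem applied repeatedly, with the denominator expanded as P(X) = P(X | A)P(A) + P(X | not A)P(not A). The likelihood values are made-up placeholders for illustration, not figures from Jacobson’s or Silver’s models, and the function names are hypothetical.

```python
from __future__ import annotations


def bayes_update(prior: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """Return P(A | X) from P(A), P(X | A) and P(X | not A)."""
    # Total probability of observing the evidence X under either hypothesis.
    p_x = p_x_given_a * prior + p_x_given_not_a * (1.0 - prior)
    return p_x_given_a * prior / p_x


def sequential_update(prior: float, evidence: list[tuple[float, float]]) -> list[float]:
    """Feed in each piece of evidence in turn; each posterior becomes the next prior."""
    history = []
    belief = prior
    for p_x_given_a, p_x_given_not_a in evidence:
        belief = bayes_update(belief, p_x_given_a, p_x_given_not_a)
        history.append(belief)
    return history


if __name__ == "__main__":
    # Hypothetical evidence: three polls, each 1.5 times as likely if A is true.
    polls = [(0.6, 0.4), (0.6, 0.4), (0.6, 0.4)]
    for step, p in enumerate(sequential_update(0.5, polls), start=1):
        print(f"after poll {step}: P(A | data) = {p:.3f}")
```

Each hypothetical poll nudges the belief upward, but because the likelihoods are never exactly 0 or 1, the posterior never reaches certainty: the model says what might happen, not what will.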

That means your output is only going to be as good as the data you’re putting into the model. The pollsters that failed on Tuesday all relied on conventional polling information. “I don’t think the polling data was the problem,” Jacobson said. “It was the uncertainty in the data because of the undecided factor. In a few polls, it was 11 percent. That’s like trying to drive 80 miles an hour in fog. We just had an accident.”
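Jacobson’s point about the undecided factor can be made concrete with simple arithmetic. The sketch below assumes a hypothetical poll with Candidate A at 45 percent and Candidate B at 44 percent, plus the 11 percent undecided share he mentions, and further assumes every undecided voter eventually picks one of the two. It shows how wide the range of final margins becomes depending on how the undecideds break. The function names are illustrative only.

```python
def final_margin(a_share: float, b_share: float, undecided: float, share_to_a: float) -> float:
    """Final margin (A minus B, in points) if `share_to_a` of undecideds break to A.

    Assumes all undecided voters end up choosing A or B (no third party, no abstention).
    """
    if not 0.0 <= share_to_a <= 1.0:
        raise ValueError("share_to_a must be between 0 and 1")
    # A gains undecided * share_to_a, B gains undecided * (1 - share_to_a).
    return (a_share - b_share) + undecided * (2.0 * share_to_a - 1.0)


def break_even_share(a_share: float, b_share: float, undecided: float) -> float:
    """Fraction of undecideds A needs for the race to end in a tie."""
    return 0.5 - (a_share - b_share) / (2.0 * undecided)


if __name__ == "__main__":
    # Hypothetical poll: A 45, B 44, with 11 percent undecided.
    a, b, u = 45.0, 44.0, 11.0
    print("worst case for A:", final_margin(a, b, u, 0.0))
    print("best case for A:", final_margin(a, b, u, 1.0))
    print(f"A needs {break_even_share(a, b, u):.1%} of undecideds to tie")
```

Under these assumptions a one-point lead can finish anywhere from A +12 to B +10, and B wins if more than about 54.5 percent of undecideds break its way.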

University of Maryland computer scientist V.S. Subrahmanian creates forecasting methods that the intelligence community uses to predict the activities of terrorist groups. He said that the Clinton-Trump matchup presented three “Black Swan” factors that traditional polling data could not capture.

“The first was a candidate with a background totally different from major recent presidential candidates – one who had held no political office, and who spoke his mind freely, unlike the more guarded speech of seasoned politicians,” Subrahmanian said.

“The second was the unprecedented disclosures by the FBI during the final few days of the election. A third was the influence campaign run by a foreign state through the drip-by-drip disclosure of internal campaign deliberations. I believe that existing predictive models were unable to account for these three types of variables, leading to predictions based on other historical factors, which were eventually proven wrong.”

Kalev Leetaru used big data to accurately identify the area around Abbottabad, Pakistan, as the location of Osama bin Laden, drawing not on polling information but on open-source published news articles. Leetaru chalked up the failure on Tuesday 8th November to bad judgment about which data was relevant.

“When it comes to the kinds of questions that intelligence personnel actually want forecasting engines to answer, such as ‘Will Brexit happen’ or ‘Will Trump win,’ those are the cases where the current approaches fail miserably. It’s not because the data isn’t there. It is. It’s because we use our flawed human judgment to decide how to feed that data into our models and therein project our biases into the model’s outcomes.”

Beyond Conventional Polling: The Future of Forecasts

Today, of course, researchers can draw on far more than telephone polling data, and can forecast broad social phenomena as well as elections. In 2012, Virginia Tech computer scientist Naren Ramakrishnan correctly predicted social protest movements in Mexico and Brazil using nothing but Twitter data, as part of a research program sponsored by the intelligence community through the Intelligence Advanced Research Projects Activity, or IARPA.

In his own theory about the outcome on Tuesday 8th November, Ramakrishnan also cited “Black Swan” events: high-impact, low-probability occurrences. Bottom line, many forecasters did, in fact, predict it. They just predicted that outcome alongside many others. To the average person, that works out the same as not predicting the actual outcome.

“There are a lot of best practices that one could use. The important thing is to rely on multiple sources of data. The other important thing: quantify uncertainty. You have to ask the question, ‘When will I be wrong?’” he says. Forecasters need to make that self-analysis as public as their forecasts.
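One standard way to put a number on “When will I be wrong?” is to score probabilistic forecasts after the fact. The Brier score sketched below is a common choice, not necessarily the method Ramakrishnan or the other forecasters quoted here use: it is the mean squared difference between each forecast probability and what actually happened (1 if the event occurred, 0 if not), so lower is better. The example inputs are hypothetical.

```python
from __future__ import annotations


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    for p, o in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0 or o not in (0, 1):
            raise ValueError("probabilities must be in [0, 1] and outcomes 0 or 1")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Hypothetical single forecasts: 71% and 99% for an outcome that did not happen.
    print(brier_score([0.71], [0]))  # penalised, but less than the 99% call below
    print(brier_score([0.99], [0]))
```

Tracked over many forecasts, a score like this shows how often and how badly a model misses, the kind of self-analysis Ramakrishnan argues should be published alongside the forecasts themselves.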

Predata, a predictive analytics company established by former members of the intelligence community, also said a Clinton win was more likely than a Trump win. They crunch hundreds of thousands of data points a day from more than a thousand websites, such as YouTube.

In a note to newsletter subscribers, Predata officials cited three factors behind their conclusion that a Clinton win was more likely: the complexity of the Electoral College system; overweighting polls and “other experts”; and failing to capture the online content and excitement Trump generated. That signal should have made it into their analysis but didn’t, for the same reason pollsters and elite liberal newspapers wrote off Trump as a joke until his victory was a reality.

“Trump did not win because he uploaded some ads to his official YouTube channel. Rather, his victory was a function of the support that thrived in the seamier, populist corners of the Internet, the corners, that is, we made an early decision to exclude from our curation of the signal sets.”

In other words, it came down to precisely the sort of human judgment error that Leetaru highlighted.

It’s thus not surprising that the big winner in the forecasting contest was not a human at all but an artificial intelligence named MogIA. Developed by Indian entrepreneur Sanjiv Rai, it takes in 20 million data points from a variety of open public websites such as YouTube, Google, Twitter, and Facebook to uncover trends in “user engagement.”

Rai said his exclusive arrangement with CNBC prevents him from talking about how the system works or crunches the data points.

“While most algorithms suffer from programmers/developer’s biases, MogIA aims at learning from her environment, developing her own rules at the policy layer and develop expert systems without discarding any data,” Rai told CNBC reporter Arjun Kharpal.

Big data and computer-aided projection are hardly dead. They’re just getting started.

DefenseOne | NYT | Was Donald Trump's Surprise Victory Hidden In The Data?
