The Data That Turned the World Upside Down
Hannes Grassegger and Mikael Krogerus report at Motherboard:
Psychologist Michal Kosinski developed a method to analyze people in minute detail based on their Facebook activity. Did a similar tool help propel Donald Trump to victory? Two reporters from Zurich-based Das Magazin (where an earlier version of this story appeared in December in German) went data-gathering.
On November 9 at around 8.30 AM., Michal Kosinski woke up in the Hotel Sunnehus in Zurich. The 34-year-old researcher had come to give a lecture at the Swiss Federal Institute of Technology (ETH) about the dangers of Big Data and the digital revolution. Kosinski gives regular lectures on this topic all over the world. He is a leading expert in psychometrics, a data-driven sub-branch of psychology. When he turned on the TV that morning, he saw that the bombshell had exploded: contrary to forecasts by all leading statisticians, Donald J. Trump had been elected president of the United States.
For a long time, Kosinski watched the Trump victory celebrations and the results coming in from each state. He had a hunch that the outcome of the election might have something to do with his research. Finally, he took a deep breath and turned off the TV.
On the same day, a then little-known British company based in London sent out a press release: “We are thrilled that our revolutionary approach to data-driven communication has played such an integral part in President-elect Trump’s extraordinary win,” Alexander James Ashburner Nix was quoted as saying. Nix is British, 41 years old, and CEO of Cambridge Analytica. He is always immaculately turned out in tailor-made suits and designer glasses, with his wavy blonde hair combed back from his forehead. His company wasn’t just integral to Trump’s online campaign, but to the UK’s Brexit campaign as well.
Of these three players—reflective Kosinski, carefully groomed Nix and grinning Trump—one of them enabled the digital revolution, one of them executed it and one of them benefited from it.
How dangerous is big data?
Anyone who has not spent the last five years living on another planet will be familiar with the term Big Data. Big Data means, in essence, that everything we do, both on and offline, leaves digital traces. Every purchase we make with our cards, every search we type into Google, every movement we make when our mobile phone is in our pocket, every “like” is stored. Especially every “like.” For a long time, it was not entirely clear what use this data could have—except, perhaps, that we might find ads for high blood pressure remedies just after we’ve Googled “reduce blood pressure.”
On November 9, it became clear that maybe much more is possible. The company behind Trump’s online campaign—the same company that had worked for Leave.EU in the very early stages of its “Brexit” campaign—was a Big Data company: Cambridge Analytica.
To understand the outcome of the election—and how political communication might work in the future—we need to begin with a strange incident at Cambridge University in 2014, at Kosinski’s Psychometrics Center.
Psychometrics, sometimes also called psychographics, focuses on measuring psychological traits, such as personality. In the 1980s, two teams of psychologists developed a model that sought to assess human beings based on five personality traits, known as the “Big Five.” These are: openness (how open you are to new experiences?), conscientiousness (how much of a perfectionist are you?), extroversion (how sociable are you?), agreeableness (how considerate and cooperative you are?) and neuroticism (are you easily upset?). Based on these dimensions—they are also known as OCEAN, an acronym for openness, conscientiousness, extroversion, agreeableness, neuroticism—we can make a relatively accurate assessment of the kind of person in front of us. This includes their needs and fears, and how they is likely to behave. The ”Big Five” has become the standard technique of psychometrics. But for a long time, the problem with this approach was data collection, because it involved filling out a complicated, highly personal questionnaire. Then came the Internet. And Facebook. And Kosinski.
Michal Kosinski was a student in Warsaw when his life took a new direction in 2008. He was accepted by Cambridge University to do his PhD at the Psychometrics Centre, one of the oldest institutions of this kind worldwide. Kosinski joined fellow student David Stillwell (now a lecturer at Judge Business School at the University of Cambridge) about a year after Stillwell had launched a little Facebook application in the days when the platform had not yet become the behemoth it is today. Their MyPersonality app enabled users to fill out different psychometric questionnaires, including a handful of psychological questions from the Big Five personality questionnaire (“I panic easily,“ “I contradict others”). Based on the evaluation, users received a “personality profile”—individual Big Five values—and could opt-in to share their Facebook profile data with the researchers.
Kosinski had expected a few dozen college friends to fill in the questionnaire, but before long, hundreds, thousands, then millions of people had revealed their innermost convictions. Suddenly, the two doctoral candidates owned the largest dataset combining psychometric scores with Facebook profiles ever to be collected.
The approach that Kosinski and his colleagues developed over the next few years was actually quite simple. First, they provided test subjects with a questionnaire in the form of an online quiz. From their responses, the psychologists calculated the personal Big Five values of respondents. Kosinski’s team then compared the results with all sorts of other online data from the subjects: what they “liked,” shared or posted on Facebook, or what gender, age, place of residence they specified, for example. This enabled the researchers to connect the dots and make correlations.
Remarkably reliable deductions could be drawn from simple online actions. For example, men who “liked” the cosmetics brand MAC were slightly more likely to be gay; one of the best indicators for heterosexuality was “liking” Wu-Tang Clan. Followers of Lady Gaga were most probably extroverts, while those who “liked” philosophy tended to be introverts. While each piece of such information is too weak to produce a reliable prediction, when tens, hundreds, or thousands of individual data points are combined, the resulting predictions become really accurate.
Kosinski and his team tirelessly refined their models. In 2012, Kosinski proved that on the basis of an average of 68 Facebook “likes” by a user, it was possible to predict their skin color (with 95 percent accuracy), their sexual orientation (88 percent accuracy), and their affiliation to the Democratic or Republican party (85 percent). But it didn’t stop there. Intelligence, religious affiliation, as well as alcohol, cigarette and drug use, could all be determined. From the data it was even possible to deduce whether deduce whether someone’s parents were divorced.
The strength of their modeling was illustrated by how well it could predict a subject’s answers. Kosinski continued to work on the models incessantly: before long, he was able to evaluate a person better than the average work colleague, merely on the basis of ten Facebook “likes.” Seventy “likes” were enough to outdo what a person’s friends knew, 150 what their parents knew, and 300 “likes” what their partner knew. More “likes” could even surpass what a person thought they knew about themselves. On the day that Kosinski published these findings, he received two phone calls. The threat of a lawsuit and a job offer. Both from Facebook. . .