How Metadata Can Reveal What Your Job Is
Jordan Pearson reports in Motherboard:
In November, a federal court ruling revealed that CSIS, Canada’s CIA analog, operated a secret metadata collection program for a decade; metadata being all of the information—time stamps, locations, names and numbers—wrapped around our digital communications.
The police line is often that you shouldn’t worry because they’re “just” collecting metadata. But as privacy advocates and technologists have noted over and over, metadata can reveal a lot of very personal information. Now, researchers from Norwegian telecom Telenor, the MIT Media Lab, and big data nonprofit Flowminder have concluded that metadata from your cell phone can reveal if you’re unemployed, or even what you do for a living.
In a paper posted to the arXiv preprint server, which hasn’t been peer reviewed, the researchers describe how they were able to use metadata (again, not the content of communications) from a telecom in a South Asian country (the researchers say they can’t divulge the company or nation) to guess an individual’s occupation. The system ended up being 67.5 percent accurate overall, with the “clerk” profession peaking at 73.5 percent prediction accuracy.
The researchers’ goal was to design a system that can determine employment statistics in developing countries without solid data. As Telenor researcher Pål Sundsøy told me in an email, it’s possible to feed anyone’s formatted cell phone metadata into the system and have it predict whether you fit into one of the 18 profession “groups” they identified: a student, an agriculture worker, a landlord, etc.
It was made possible by deep learning, a type of software that trains itself to look for patterns in large amounts of data.
“As such applications emerge it is important to be transparent around the decision making process—especially as intelligent machines make errors sometimes, too,” Sundsøy wrote. “In the field of social sciences this includes always validating the methodology to actual ground truth data, and use it as a complementary source of insight.” . . .
Continue reading. The accuracy is between 65% and 75%, which is pretty good. Nice chart at the link.