Later On

A blog written for those whose interests more or less match mine.

What Happens When Computers Learn to Read Books?


Caleb Garling writes in Priceonomics:

In Kurt Vonnegut’s classic novel Cat’s Cradle, the character Claire Minton has the most fantastic ability; simply by reading the index of the book, she can deduce almost every biographical detail about the author. From scanning a sample of text in the index, she is able to figure out with near certainty that a main character in the book is gay (and therefore unlikely to marry his girlfriend). Claire Minton knows this because she is a professional indexer of books.

And that’s what computers are today — professional indexers of books.

Give a computer a piece of text from the 1950s, and based on the frequency of just fifteen words, the machine will be able to tell you whether the race of the author is white or black. That’s the claim from two researchers at the University of Chicago, Hoyt Long and Richard So, who deploy complicated algorithms to examine huge bodies of text. They feed the machine thousands of scanned novels-worth of data, which it analyzes for patterns in the language — frequency, presence, absence and combinations of words — and then they test big questions about literary style. 
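To make "frequency, presence, absence" concrete, here is a minimal sketch of how a text might be reduced to counts over a small fixed word list. It is an illustration only, not Long and So's actual method, which the article does not describe in detail. The function names, the three-word vocabulary, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_features(text, vocabulary):
    """For each vocabulary word, report its relative frequency and presence.

    `vocabulary` is a hypothetical hand-picked word list; a real study
    would select its words from the data.
    """
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        word: {
            "frequency": counts[word] / total,
            "present": counts[word] > 0,
        }
        for word in vocabulary
    }


# Toy input, invented for illustration.
features = word_features(
    "The rain fell, and the river rose.",
    ["the", "river", "city"],
)
for word, values in features.items():
    print(word, values)
```

A classifier would then look for patterns in these numbers across thousands of texts. The sketch stops at the counting step, since that is the part the article spells out.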

“The machine can always — with greater than a 95 percent accuracy — separate white and black writers,” So says. “That’s how different their language is.”

This is just an example. The group is digging deeper on other questions of race in literature but isn’t ready to share the findings yet. In this case, minority writers represent a tiny fraction of American literature’s canonical text. They hope that by shining a spotlight on unreviewed, unpublished or forgotten authors (now easier to identify with digital tools) or by simply approaching popular texts with different examination techniques, they can shake up conventional views on American literature. Though the tools are far from perfect, scholars across the digital humanities are increasingly training big computers on big collections of text to answer and pose new questions about the past.

“We really need to consider rewriting American literary history when we look at things at scale,” So says.

Who Made Whom

A culture’s corpus of celebrated literature functions like its Facebook profile. Mob rule curates what to teach future generations and does so with certain biases. It’s not an entirely nefarious scheme. According to Dr. So, people can only process about 200 books. We can only compare a few at a time. So all analysis is reductive. The novel changed our relationship with complicated concepts like superiority or how we relate to the environment. Yet we needed to describe — and communicate — those huge shifts with mere words. 

In machine learning, algorithms process reams of data on a particular topic or question. This eventually allows a computer to recognize certain patterns, whether that means spotting tumors, cycles in the weather or a quirk of the stock market. Over the last decade this has given rise to the digital humanities, where professors with large corpuses of text (or any data, really) use computers to develop hard metrics for areas that might previously have been seen as more abstract.

Ted Underwood at the University of Illinois specializes in 19th century literature. He once took on economist Thomas Piketty’s claim that financial descriptions fell from fiction after 1914 due to a devaluation of money after the World Wars and, using machine analysis, seemed to prove the celebrated economist wrong. But Underwood’s work primarily uses computers to understand how genre has evolved over time.

He tests how the machine matches books against thousands of already-scanned books within a genre. Detective stories? Easy. The general cadence of the plot — the crime, the interrogation, the resolution — has stayed fairly consistent since the genre began. . .
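One simple way to picture "matching a book against a genre" is to compare word counts. The sketch below builds a combined word-count profile per genre and picks the genre whose profile is most similar to a new text, using cosine similarity. This is a swapped-in textbook technique chosen for illustration, not Underwood's method, which the excerpt does not specify. The function names and the tiny sample sentences are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_genre(text, genre_samples):
    """Return the genre whose pooled word counts best match the text."""
    target = bag_of_words(text)
    profiles = {}
    for genre, samples in genre_samples.items():
        profile = Counter()
        for sample in samples:
            profile.update(bag_of_words(sample))
        profiles[genre] = profile
    return max(profiles, key=lambda g: cosine_similarity(target, profiles[g]))


# Toy corpus, invented for illustration; real work uses thousands of books.
genre_samples = {
    "detective": [
        "the inspector questioned the suspect about the crime",
        "a clue at the crime scene led to the killer",
    ],
    "romance": [
        "her heart raced when he kissed her",
        "they fell in love beneath the summer moon",
    ],
}

guess = closest_genre(
    "the detective found a clue and questioned the suspect",
    genre_samples,
)
print(guess)
```

Raw word counts ignore plot structure such as the sequence of crime, interrogation and resolution, so this captures only the vocabulary side of what the article describes.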


Written by LeisureGuy

12 December 2015 at 2:12 pm

Posted in Books, Software, Technology
