Later On

A blog written for those whose interests more or less match mine.

Wanted: More Data, the Dirtier the Better

Esther Landhuis reports in Quanta:

To distill a clear message from growing piles of unruly genomics data, researchers often turn to meta-analysis — a tried-and-true statistical procedure for combining data from multiple studies. But the studies that a meta-analysis might mine for answers can diverge endlessly. Some enroll only men, others only children. Some are done in one country, others across a region like Europe. Some focus on milder forms of a disease, others on more advanced cases. Even if statistical methods can compensate for these kinds of variations, studies rarely use the same protocols and instruments to collect the data, or the same software to analyze it. Researchers performing meta-analyses go to untold lengths trying to clean up the hodgepodge of data to control for these confounding factors.

Purvesh Khatri, a computational immunologist at Stanford University, thinks they’re going about it all wrong. His approach to genomic discovery calls for scouring public repositories for data collected at different hospitals on different populations with different methods — the messier the data, the better. “We start with dirty data,” he says. “If a signal sticks around despite the heterogeneity of the samples, you can bet you’ve actually found something.”
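To make the "signal sticks around despite the heterogeneity" idea concrete, here is a minimal sketch in Python. It is not Khatri's actual pipeline, which the article does not describe in detail. It assumes each study provides expression values for one gene in cases and in controls, computes a standardized effect size (Cohen's d) per study, and calls the signal persistent only if every study agrees on direction and clears a threshold. The function names, the 0.5 threshold and all the numbers are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(cases, controls):
    """Standardized mean difference between cases and controls in one study."""
    n1, n2 = len(cases), len(controls)
    pooled_var = (
        (n1 - 1) * stdev(cases) ** 2 + (n2 - 1) * stdev(controls) ** 2
    ) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)


def signal_persists(studies, min_abs_effect=0.5):
    """Return (verdict, per-study effects) for one gene across several studies.

    The verdict is True only if every study shows an effect in the same
    direction and of at least min_abs_effect (a hypothetical cutoff).
    """
    effects = [cohens_d(cases, controls) for cases, controls in studies]
    same_direction = all(d > 0 for d in effects) or all(d < 0 for d in effects)
    strong_enough = all(abs(d) >= min_abs_effect for d in effects)
    return same_direction and strong_enough, effects


# Made-up data: three studies on different scales (different platforms, cohorts).
robust_gene = [
    ([5.1, 5.4, 5.9, 6.0], [4.0, 4.2, 4.5, 4.1]),
    ([8.2, 9.0, 8.7], [7.0, 7.4, 6.9]),
    ([2.9, 3.3, 3.1, 3.6], [2.0, 2.4, 2.2, 2.1]),
]
# Made-up data: the direction of the effect flips between studies.
fragile_gene = [
    ([5.0, 5.2, 5.1], [4.0, 4.1, 4.2]),
    ([3.0, 3.1, 3.2], [4.0, 4.1, 4.2]),
]

robust_ok, robust_effects = signal_persists(robust_gene)
fragile_ok, fragile_effects = signal_persists(fragile_gene)
print("robust gene persists:", robust_ok)
print("fragile gene persists:", fragile_ok)
```

Because the effect size is standardized within each study, the studies do not need to share a measurement scale, which is one reason a sketch like this can tolerate messy inputs.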

This strategy seems too easy, but in Khatri’s hands, it works. Analyzing troves of public data, Khatri and colleagues have uncovered signature genes that could allow clinicians to detect life-threatening infections that cause sepsis, classify infections as bacterial or viral, and tell if someone has a specific disease such as tuberculosis, dengue or malaria. Last year Khatri and two other scientists launched a company to develop a device for measuring these gene signatures at a patient’s bedside. In short, they’re deciphering the host immune response and turning key genes into diagnostics.

Over the past year Khatri discussed his ideas with Quanta Magazine over the phone, by email and from his whiteboard-lined Stanford office. An edited and condensed version of the conversations follows.

What turned you on to biology?

I left India and came to the U.S. in the “fix the Y2K bug” rush with plans to get a master’s in computer science and become a software engineer. Months after arriving at Wayne State University in Detroit I realized that writing software for the rest of my life was going to be really boring. I joined a lab working on neural networks.

But then my adviser switched to bioinformatics and said he’d pay my tuition if I switched with him. I was a poor Indian grad student. I thought, “You’re going to pay my salary? I’ll do whatever you are doing.” That’s how I moved into biology.

You made a splash pretty quickly. How did that happen?

While my adviser was away on sabbatical in 2000-2001, I worked in the lab doing bioinformatics analyses with a postdoc in our collaborator’s lab, a gynecologist studying genes involved in male fertility. Microarrays for running assays on large numbers of genes at once were brand-new. From a recent experiment, he’d gotten a list of some 3,000 genes of interest, and he was trying to figure out what they were doing.

One day I saw him going from one website to another, copying and pasting text into Excel spreadsheets. I said to him, “You know, I can write software for you that will do all of that automatically. Just tell me what you are doing.” So I wrote a script for him — it took me three days — and with the results we wrote a Lancet paper.

We put the software on the web. There was huge interest. They presented it at some conference, and Pfizer wanted to buy it. I thought, wow, this is such low-hanging fruit. I can be a millionaire soon.

What does the software do?

It takes the set of genes you specify and searches annotation databases to tell you what biological processes and molecular pathways those genes are involved in. If you have a list of 100 genes, it could tell you that 15 are involved in immune response, another 15 are involved in angiogenesis and 50 play a role in glucose metabolism. Let’s say you’re studying Type 1 diabetes. You could look at these results and say, “I’m on the right path.”
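As a rough illustration of that lookup-and-tally step, here is a minimal Python sketch. It is not the Onto-Tools code. The annotation table, the gene names and the function name are hypothetical stand-ins for a real annotation database.

```python
from collections import Counter

# Hypothetical toy annotation table. A real tool would query curated
# annotation databases instead of a hard-coded dictionary.
ANNOTATIONS = {
    "GENE_A": ["immune response"],
    "GENE_B": ["immune response", "angiogenesis"],
    "GENE_C": ["glucose metabolism"],
    "GENE_D": ["glucose metabolism"],
}


def summarize_processes(genes, annotations=ANNOTATIONS):
    """Count how many of the given genes are annotated to each process."""
    counts = Counter()
    for gene in genes:
        # Genes missing from the table contribute nothing; set() guards
        # against a process being listed twice for the same gene.
        for process in set(annotations.get(gene, [])):
            counts[process] += 1
    return counts


# GENE_X is deliberately absent from the table to show unannotated genes are skipped.
summary = summarize_processes(["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_X"])
for process, n in summary.most_common():
    print(f"{n} gene(s): {process}")
```

A gene can belong to more than one process, so the counts may add up to more than the number of genes in the list.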

This was 15 years ago, when I was getting my master’s degree. I developed more tools and expanded the work into a Ph.D. It’s now an open-access, web-based suite of tools called Onto-Tools. Last I checked a few years ago, it had 15,000 users from many countries, analyzing an average of 100 data sets a day.

Although the tools became very popular, they weren’t telling me how the results get used, how they help people. I wanted to see how research progresses from bioinformatics analyses to lab experiments and ultimately to something that could help patients.

How did you make that switch?

When I came to Stanford as a postdoc in 2008, one of my conditions was that somebody with a wet lab — someone running experiments on samples from mice or actual patients, not just analyzing data in silico — would pay half my salary, because I wanted their skin in the game. I wanted to make predictions using methods I’d develop in one lab, and then work with another lab to validate those predictions and tell me what’s clinically important. That’s how I ended up working with Atul Butte, a bioinformatician, and Minnie Sarwal, a renal transplant physician. [Editor’s note: Butte and Sarwal have both since moved from Stanford to the University of California, San Francisco.]

What shifted your attention to immunology?

Reading papers to learn the basic biology of organ transplant rejection, I had an “Aha!” moment. I realized that heart transplant surgeons, kidney transplant surgeons and lung transplant surgeons don’t really talk to each other!

No matter which organ I was reading about, I saw a common theme: The B cells and T cells of the graft recipient’s immune system were attacking the transplant. Yet diagnostic criteria for rejection were different — kidney people follow Banff criteria for renal graft rejection, heart-and-lung people follow ISHLT [International Society for Heart and Lung Transplantation] criteria. If the biological mechanism is common, why are there different diagnostic criteria? That didn’t make sense to me as a computer scientist.

I was starting to form a hypothesis that there must be a common mechanism — some common trigger that tells the recipient’s immune cells that something is “not self.” While thinking about this, I came across a fantastic paper titled “The Immunologic Constant of Rejection.” The authors basically laid out my hypothesis. They proposed that while the triggers for organ rejection may differ, they share a common pathway. And they were saying someone should test this.

What did you do at that point?

I started asking my colleagues, “Why don’t we . . .

Continue reading.

Much more at the link, including a video.

Written by LeisureGuy

12 June 2017 at 10:42 am

Posted in Science
