LIFE
AND OTHER STORIES
Peter Kharchenko

Digitizing Expertise:

Yesterday, Today, and Tomorrow

  • Story

    on whether it's possible to digitize the expertise, cell boundaries, and when the quantitative genomic data will transform into superior practical outcomes
  • Story told by

    Peter Kharchenko, Computational Biologist, Associate Professor at Harvard Medical School (USA)
  • Story asked by

    Nikita Lavrenov, Biologist and Science journalist
  • Story recorded

    in August 2021
— What's the correct term for your field of work? Is it computational biology? Bioinformatics? Genomics? Proteomics?
— We are computational biologists. We don't do experiments, but we collaborate with those who do. We mostly develop statistical computational methods, apply them, and interpret the results. This is applicable to all levels of biological organization.
We operate at the molecular and cellular levels, examining cell states, gene expression, and genome packaging. With respect to our research topics, we pursue two major lines of work. One is tumors. We examine their microenvironment and heterogeneity in the cancer cells as such. The other is a bit more cheerful: development and coordination of cells during tissue growth. All of this research falls into the “genomics” category.

— Can genomics be called a branch of genetics? What is it that distinguishes this field from the rest?
— Traditional genetics is about genetic inheritance and things of that nature. Genomics came into being when it became possible to acquire a lot of genomic sequencing data, and people realized that we had no tools for interpreting the nucleotide sequences at that scale. To give you an example, scientists were trying to expose mutations linked to diseases, and related studies often identified a candidate mutation, but most of the time we couldn't figure out how that particular mutation functions and what it does exactly. Work commenced on a series of methods to help elucidate the function of each specific genome segment or cluster of segments. It turned out that this could be achieved by deploying certain experimental and data processing technologies. For example, you could record snapshots of genome activity at a given time and infer functions of different genomic loci. This is called functional genomics.

— How is this done from a methodological perspective?
— For the most part, it's done with the aid of sequencing methods: RNA sequencing, enriched chromatin sequencing, and sequencing of open chromatin fragments. These methods have already been around for 30-40 years. For instance, there's deoxyribonuclease I, the enzyme that slices the genome. A long time ago, scientists began to look around specific genes to see where these "cuts" landed. Deoxyribonuclease I is a large enzyme that can only interact with the open, unpackaged genome segments. You might wonder what the point is. It turns out that if a segment is open, it is typically doing some important work. For instance, it could be acting as a promoter or enhancer. Therefore, reading the open genome segments proves informative: it can tell you where something is happening. Eventually, it became possible to do the same thing for the entire genome, not just the open segments. The past 5-10 years have seen much refinement in single-cell analysis. Now, even the genome of each cell in a sample can be studied individually.
Photographer: Engeny Gurko /
for “Life and Other Stories”
— Now I have a general idea of how your research is designed. What problems have you been working on in the past couple of years? Any issues that you worked hard to unravel and have successfully unraveled?
— We've been studying the metastases of prostate cancer. For some reason, in the later stages of this tumor, the metastases almost invariably form in the bone marrow. This selectiveness is a mystery to us.
Our intent was to compare the genetic state of bone marrow cells, immune cells, and metastatic cells to understand what's going on there. The material is difficult to obtain: we had to assemble a large team comprising oncologists, surgeons, hematologists, molecular biologists, and ourselves.
The findings of comparative analysis confirmed our expectations. When metastases are present, the bone marrow's immune system is in a state of dysfunction. Its T cells, which are supposed to be active and killing tumor cells, are inactive. This same picture recurs in many different cancers, not just metastasizing prostate cancer. Next, we directed our attention to the functional state of macrophages, monocytes, and other innate immunity cells that can modulate T lymphocyte activity, and we tried to think of ways to bring the immune system out of its “exhausted” state.
The standard immunomodulation methods didn't seem to work on prostate cancer metastases. For example, our attempt to block the PD-1 and CTLA-4 immune checkpoint molecules failed. This could have been related to the state of the immune cells that are supposed to fight the cancer or it could have stemmed from some other signals compromising the effectiveness of the attempted therapy.
Our task narrowed down to identifying such signals within the tissue that could inhibit the ability of T lymphocytes to attack the tumor. Via calculations, we selected a priority signal out of many — one that looked promising. Our experimenters took it from there, studying the signal's role in lab mice. That took a lot of effort. We utilized a purpose-designed mouse model, in which we disrupted the signal we'd found in two distinct ways: by blocking the ligand and by blocking the receptor. As a result, we noted a significant improvement in survival rate among the mice subjected to the blocking.

— Which option worked better: the ligand blocking or receptor blocking?
— In our study, the strategy of blocking the receptor proved more effective. Other than that, it is difficult to assess which one is better due to the limited amount of data. All I can say is, both strategies are better than not intervening at all.

— You can use the findings of these experiments to design a treatment protocol and make it to trials, preclinical and then clinical. Is that correct?
— It would be, in an ideal world. But, as you well know, it's a long way from a mouse to a human being. Mice are just too different from humans. Many of the drugs that are highly effective in mice are either useless or too toxic in people.

— Apparently, before we can come up with a viable therapeutic strategy, we need to figure out the molecular profile of tumors, amassing a gigantic amount of data. Where is this data cataloged?
— Data builds up quite fast in biology, especially in genomics. There are databases where all this data is deposited, but they aren't organized very well. We have a project underway to create a database for our single-cell sequencing data, which we will use to automate the recognition and interpretation of cell subpopulations. You have to build a representative database for that, covering a diverse sample of cases.
One of the ideas behind the project is to encourage professionals to "borrow" interpretations from each other. Let's say there is this expert on helper T lymphocytes who knows how to divide and classify them. His expertise, encoded in algorithms, could be shared with other experts to be used in their research.
— This is, like, a digital expert! You're trying to birth an artificial intelligence entity that will supplant the narrowly specialized world-class experts by offering expertise to whom it may concern. Am I right?
— For all practical intents and purposes, yes. In reality, the task is quite specific: we need to identify cell subsets and states in a certain type of data. If there's a scientist who frequently does this, we can learn from them using artificial intelligence or machine learning technologies.
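Editorial illustration: one simple way to picture "borrowing" an expert's interpretation is label transfer, where cells annotated by an expert define a reference and new cells receive the label of the most similar reference group. The sketch below is a toy nearest-centroid version under stated assumptions, not the method used by the Kharchenko lab; the function names, cell-type labels, and three-gene expression vectors are all hypothetical.

```python
from math import sqrt


def centroid(profiles):
    """Average expression vector of a group of cells."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def cosine(a, b):
    """Cosine similarity between two expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def learn_centroids(annotated):
    """annotated maps an expert-assigned label to a list of expression vectors."""
    return {label: centroid(cells) for label, cells in annotated.items()}


def transfer_label(cell, centroids):
    """Give a new cell the label of the most similar expert-defined centroid."""
    return max(centroids, key=lambda label: cosine(cell, centroids[label]))


# Hypothetical toy data: expression of three genes per cell, labeled by an expert.
EXPERT_ANNOTATIONS = {
    "helper T cell": [[9.0, 1.0, 0.5], [8.0, 2.0, 0.0]],
    "monocyte": [[0.5, 1.0, 9.0], [1.0, 0.0, 7.0]],
}
CENTROIDS = learn_centroids(EXPERT_ANNOTATIONS)
```

Calling `transfer_label(new_cell, CENTROIDS)` on an unannotated expression vector returns one of the expert's labels. Real annotation-transfer tools work on thousands of genes and have to handle batch effects and uncertain calls, which this sketch ignores.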

— From what I understand, genomics is currently busy seeking answers to some very specific questions while simultaneously gathering and acquiring a vast amount of data. Do you think quantity may translate into quality some day?
— Expert opinions differ on this matter. I don't think sheer mass of data will do the magic. The way we can study cells now is impressive, and we get a more statistically accurate sense of the processes inside the cells. And yet this is a restricted and distorted picture. It's restricted because with single-cell transcriptomes, we're in essence only looking at RNA. RNA is a crucial molecule. The full RNA lineup tells us what the cell is trying to do right now, which genes are active, which proteins are being synthesized. But there are other dimensions that we're not seeing.
For instance, the chromatin state, genome packaging, protein state, metabolite state — all these molecular aspects affect the overall cell condition. In essence, if you look at a single cell, you'll realize that many variables are interlaced. Therefore, once we get equipped to measure these variables simultaneously, we will catch on to how they are connected, and will be able to derive entirely different interpretations.
That's one point.
The other aspect is that we study multicellular organisms. Fundamentally, our capabilities are currently confined to a single cell in a vacuum, albeit with a high rate of repetition. That being said, tissue functionality, ontogenesis, immune response to a pathogen... These are all characteristics of a multicellular organism. It's critical that we understand the context of a cell's function, its interactions, and the feedback loops occurring therein. We would like to be able to see and study all of this, but we need some new methods to do it.
In the past few years, spatial transcriptomics methods have been in the works, the idea being to use mRNA molecular analysis to study this multidimensional picture in the context of tissue — either on a thin section or a thicker three-dimensional piece. You can readily see the context with this method. You can see a cell in a certain state, located within a specific tissue context. I think this is a giant leap forward. With the aid of statistical methods, we could try to deduce the cell's dependence on its context from this data: what affects it and what leads to what. We are all set to embark on a long journey toward the molecular description and modeling of tissues, and perhaps eventually organs.

— What are you working on right now?
— We have several ongoing projects. We have recently completed a lengthy project that looks purely technical at first glance. The task was to define the molecular boundaries of a cell.
In spatial transcriptomics, basically, the molecules are measured on a tissue cross-section. There may be millions of cells on that cross-section, and thousands of molecules in each cell. The precise boundaries of each cell may not be clearly visible. It may be exceedingly hard to trace the boundaries with accuracy, despite the use of special staining. This hinders further analysis, because it is hard to figure out what's going on in the system. We were compelled to invest considerable effort to devise a method that delineates the boundaries of a cell with reasonable accuracy. It sounds like a purely technical task, but accurate interpretation of the data is simply impossible without it.
— Did you define the molecular boundaries of a cell in the end?
— We did, though there's still room for improvement. We hope this method will serve many research teams. When the cell boundaries in your data are clearly delineated, you can do more interesting things with the data, such as building computational models of tissue structure, analyzing how cell condition depends on the context, and so on. This is where we are aiming, but the technical groundwork needs to be thoroughly worked through before more ambitious tasks can be tackled.
— What are the next steps most likely to be?
— Well, we could use single cell analysis to study heterogeneity in tumors. Resistance to existing therapeutic strategies is a common setback in the treatment of cancers. One way this resistance happens is through new mutations. Mutations are capable of overstimulating the gene that allows cells to grow faster or to eliminate the drug, or they can deactivate the gene that, under specific conditions, will restrict cell growth. Traditionally, this aspect has been studied through genome sequencing, which only provides a static view. But the cell has other mechanisms — epigenetic and protein — that the cell can wield to achieve similar outcomes.
We meant to examine heterogeneity inside the tumor concurrently from both the genomic perspective and that of other cellular components. But we had trouble combining the two strategies in the experimental environment. We've created an algorithm that identifies genetic clones and subclones inside the tumor using data that shows the transcriptional or epigenetic conditions of the cells. This method worked, but its accuracy left much to be desired: we could not detect the subclones that had only a small number of variations. We consulted with experts in a completely different field of biology, population genetics. We borrowed their idea of using population data for genome phasing[1] to improve the accuracy of our algorithms. By aligning population data with single cell data, we are now poised to obtain more accurate results.
— Could you please go a little more into detail on that? What specific population mechanism did you find helpful in describing cell types?
— It wasn't really a mechanism. It was the phenomenon of genome variation in a population. There are parts of the genome that stay together during recombinations, and there are sites where recombination is extra active. Because of that, it is possible, in light of several adjacent mutations, to predict further mutations in the population. In other words, if you take a linear look at the genome, the mutations are not independent, and can be predicted.
For instance, a cancer cell can eliminate a chromosome copy or a large piece of it. This is a common mechanism that lets cells change the way they behave. For us, human population statistics indicate which mutations should occur in one chromosome copy, and which ones should occur in another.
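Editorial illustration: the benefit of phasing can be shown with a toy calculation. In a single cell, each heterozygous site is covered by only a few reads, so one site says little on its own. If population data indicate which allele sits on which chromosome copy, reads from many sites can be pooled per copy, and the loss of one copy shows up as a strong imbalance. The sketch below assumes the phase is already known and uses made-up read counts; it is not the lab's algorithm, and all names are hypothetical.

```python
def haplotype_fraction(snps):
    """Fraction of reads coming from chromosome copy A.

    snps is a list of (ref_count, alt_count, alt_on_copy_a) tuples, where
    alt_on_copy_a is the (assumed known) phase of the alternative allele.
    """
    copy_a = copy_b = 0
    for ref_count, alt_count, alt_on_copy_a in snps:
        if alt_on_copy_a:
            copy_a += alt_count
            copy_b += ref_count
        else:
            copy_a += ref_count
            copy_b += alt_count
    total = copy_a + copy_b
    return copy_a / total if total else None


def unphased_alt_fraction(snps):
    """Fraction of reads carrying the alternative allele, ignoring phase."""
    alt = sum(alt_count for _, alt_count, _ in snps)
    total = sum(ref_count + alt_count for ref_count, alt_count, _ in snps)
    return alt / total if total else None


# Hypothetical region in a cell that lost copy B: every read comes from copy A.
LOST_COPY = [(0, 2, True), (3, 0, False), (0, 1, True), (2, 0, False)]

# Hypothetical region with both copies present and equally expressed.
BALANCED = [(1, 1, True), (2, 2, False)]
```

For `LOST_COPY`, the phased fraction is 1.0 (all eight reads come from copy A), while the unphased alternative-allele fraction is 0.375 and looks unremarkable. For `BALANCED`, the phased fraction is 0.5.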

— Sounds like you're working on precise medical problems and existential dilemmas all at the same time.
— Our key expertise is methodological in nature: we create computational and statistical methods within the context of biological or medical research. This is something I always try to communicate to my team: a computational biologist can work in different fields, opportunities are limitless.

— A personal question: how does one become a successful scientist? When I looked up your Scopus profile, I noticed that your Hirsch index is 48 and your articles have appeared in some high-profile journals. Any words of advice for young scientists starting their path in science?
— There's high level advice, and then there's practical advice.
First, there has to be genuine interest. Honestly, I even tell my students that it's okay to complete one's degree without thinking a lot about this, but one shouldn't take up a research path afterwards unless they feel genuinely interested. Because practically speaking, the path is going to be long, hard, and unpredictable. One needs an internal drive to do that. There are plenty of other exciting opportunities to be successful in today's world, especially for someone with enough expertise to do computational biology. You can be an analyst in a bank or you can get a job at Google or Facebook. There's a plethora of other opportunities, so don't commit to science unless you are driven by genuine interest.

— That's a piece of conceptual mentoring.
— Seems to me that without it, too many people waste too much time. It's important to understand that the essence of science is grappling with hard, arcane questions that have never been easy to answer. One ought to be mentally prepared to accept the fact that they may never be able to resolve some, or in fact, many of these questions at all, or to resolve them in a satisfactory manner.
Secondly, choose your research target very carefully. The correct choice of research target is fifty percent of success. You need to think it through, take a good look around, read, talk to people before investing several years of your life in some field. Things take time, lots of time. Make sure your research task is worth your time, get the assurance that it holds at least a hypothetical potential to answer the questions that interest you, and that those questions cannot be answered in any other way.
The task you pick has to be solvable, at least potentially. I always encourage my students to do more than one project at a time, because if one project fails to work out, perhaps the other one will.
I think computational biologists can really benefit from communicating with other scientists, and other biologists as well of course. It's important to network and share ideas. Sometimes people may steal your ideas, but the benefits outweigh the risks: you'll be getting a lot more ideas and good advice from the community than you could possibly lose.

— As I was preparing for the interview, I looked up your lab's website at Harvard and I couldn't help noticing that you have many young Russians on staff. Is this because Russian universities graduate strong bioinformaticians? Or is it that you simply feel more comfortable working with Russians?
— That's a good one. To my regret, there aren't so many good specialists around. I think education is excellent in Russia, especially in secondary school and in university. Russian schools provide solid training in mathematics and computer science. And I think that, with such formidable mathematical and analytical grounding, it's just a lot easier to do great things in my field of science. When a person is driven by genuine interest, they will quickly catch up on what they need to know in a specific field. But catching up on math or analytical methods is extremely hard work.
So yes, I think there are plenty of young people with strong training in Russia who have an interest in science. In the U.S., the country where I work and that I feel I know well enough, students with comparable qualifications enjoy a lot more alternative career opportunities. We compete for talent all the time with major corporations and financial sector entities.

— One last question: what's your dream?
— Also a good question! Well... the dream of every computational biologist is clean data.

This interview was first published on the Biomolecule website, August 3, 2022