Ekaterina Khrameeva / Life and other stories / Skoltech x RSF

LIFE
AND OTHER STORIES

Ekaterina Khrameeva
How to Fold DNA?

Story
on why bioinformaticians are so in demand, why it's important to study chromosome packaging, and how working in industry differs from working in academia

Story told by
Ekaterina Khrameeva, Bioinformatician and Associate Professor at Skoltech Center for Molecular and Cellular Biology

Story told to
Kristina Ulasovich, Science journalist

Story recorded
in October 2021

— You have a rich background, having worked in a bunch of different universities and countries, in Europe and Asia. Did you always work in bioinformatics?
— Maybe my background differs from that of an average scientist, but it's pretty typical for a bioinformatician. We are often involved in multiple projects simultaneously. As bioinformaticians, we analyze data on our computers. Our colleagues obtain data from experiments in the labs, and our job is to process their results. I'm not trying to trivialize our input; research can't get anywhere without analysis these days. All I'm saying is that our job is part of a much larger project. I suppose all projects in modern science are these huge collaborations involving many scientists and labs, often based in different countries, each contributing its bit of work. We also do our small but crucial piece of work.
I enjoy what I do. On top of my 15 years in science, I've spent 5 years working in industry, so I have the necessary data for comparison. Working for a corporation is not as fun as it seems: you have to do all these routine jobs. In science, at least we are our own bosses. We study what we want, anything we feel like studying.

— Are you saying you enjoy creative freedom in science?
— Science is definitely a creative process because we never know what the outcome will be. We can make plans and assumptions, but that doesn't mean things will go as planned. Sometimes you succeed where you least expect it. That's the thrill of it: everything is uncertain, unpredictable. We also enjoy a greater degree of freedom compared to corporate jobs, even though we have to think of grants, work plans, and deadlines, and we are required to submit articles by a certain date. Still, it's a much more flexible deal compared to other fields.
And let me tell you, I feel I'm in no danger of professional burnout. I'm not one of those researchers who will sacrifice everything on the altar of science. They'll go without sleep, without food, working nonstop, nights and weekends. That's not my cup of tea. We have a schedule, finishing work no later than 7-8 pm, and the lab is closed on weekends. I don't let my employees and students overwork; nothing good comes out of it in the end. So many people hit a brick wall with job burnout, ending up in therapy. Who needs that? A relaxed schedule helps me keep my interest in science alive. If I overexerted myself, I wouldn't want to do research anymore. Research is essentially hard work. Like everyone else, I get tired, I get stressed out with so many deadlines to keep. But if you accept it as a given and try to maintain your work-life balance, you can live with it.

Photographer: Stas Liubauskas for “Life and Other Stories”

— Is there much demand for bioinformatics these days?
— I've never had a problem finding a job, and neither have my students. On the contrary, it's hard to find staff for your lab. Fewer specialists are available than are needed. And why this happens is a good question. Too much data is generated in science overall, and in biology in particular.
All technologies are now cheaper, and experiments no longer cost as much as they used to. Genome sequencing was insanely expensive in the 1990s. It used to cost millions, if not billions, of dollars to decode the very first genome. Now it costs a mere couple of thousand dollars to sequence a single genome. And it's even cheaper if we talk about transcriptomics. With the drop in prices, it's only logical that accessibility has improved, but not so many people know what to do with the data. Bioinformaticians are among those who do know. Such specialists are still rare, though, because the skill set itself is new. When I enrolled in the university, our class was only the third admission in Russia's first school for bioinformatics. This was 15 years ago, to give you a general idea of how old this major and this profession are. The admissions have since increased, for sure, but still there aren't enough qualified bioinformaticians out there.

— Meanwhile, massive amounts of data are produced. Is it possible to estimate the rate at which the amount of data increases?
— There is an empirical rule in computer science known as Moore's Law. It describes the rate at which computing power increases. The volume of data increases at about the same rate. If I wanted to show it with a curve, the curve would take a steep climb from the 90s on, shooting straight up. This, again, is connected with lower technology costs and better access to affordable instruments.
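To get a feel for what growth at that rate means, here is a minimal sketch. It assumes the commonly quoted form of Moore's Law, a doubling roughly every two years; the interview gives no figure, so the `doubling_period_years` default and the function name are illustrative assumptions only.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Return how many times a quantity multiplies after `years`,
    assuming it doubles every `doubling_period_years` (an assumption,
    not a figure from the interview)."""
    return 2.0 ** (years / doubling_period_years)


# Example: growth over one, two and three decades under this assumption.
for span in (10, 20, 30):
    print(span, "years:", growth_factor(span), "times")
```

Under this assumption a quantity grows 32-fold in a decade and about a thousandfold in two, which is why a plot on a linear scale looks like it shoots straight up.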

— How come we don't have enough bioinformaticians? Is it not a profession for everyone?
— Anyone can become a bioinformatician. It's not rocket science. But you have to be good at writing code, which will probably take a few years of your life to master. This skill is essential.
The diversity of data types may be a minor problem. Sequencing encompasses many different processes, all targeting different biological subjects. One direction is transcriptomics, which studies gene activity. If we need to see how gene regulation works, we will deploy ChIP-seq or ATAC-seq technology. The issue here is that each of these data types requires its own take on analysis and its own software.
It's easier now since many people are into data analysis worldwide, and they develop data processing software. Essentially, if the technology came on the scene some time ago (which means more than 3 years in our field of science), the matching data processing software already exists. But this would be the best-case scenario. In reality, you rarely simply run the program and get the results, unless, of course, your data is really high quality. Wet lab researchers often face issues that compromise data quality. Those issues can be tackled, but this also requires specific skills. Anyway, you cannot simply run the program without understanding how it works. You'll have to look into details, tweak the program, add bits of code, modify it... We do this routinely, because there is no such thing as perfect data.
And science isn't an industry: you cannot do things step by step in science. We always need something from this data. Standard, plain methods rarely, if ever, do the job. They are good enough for the initial data processing stages. But they won't do where you need to answer a biological question or test a hypothesis. At this point, we pretty much always have to come up with a creative method of analysis. The standard methods no longer apply at this final stage.

— Speaking of the more up-to-date tools: do you use machine learning?
— We've started to use machine learning a lot in the past 3-5 years. Technically, we work with all types of data, but some types we deal with more often than others. There's this experimental procedure that gives you a sense of how DNA is folded in the cell nucleus, and how the chromosomes are packaged. They aren’t just stuffed randomly, but arranged in a specific order, which makes all the difference for gene regulation. Let's say we have received one copy of each chromosome from our mom and one from our dad. This chromosome set is the same in all our cells. But we have eyes, hair, a liver, kidneys, a heart. Those cells are all different, although each has the same set of genes. How does this happen? The sophisticated gene regulation system is predicated on the way chromosomes are packaged in the nucleus. The way chromosomes are packaged in different cells determines which genes will be active in the cell at any given time. This is essentially what my lab studies.
There's this experiment, called Hi-C, which is designed to decode the chromosome packaging chart in minute detail. The data type involved there is more complex than usual because the data is two-dimensional. If we look at chromosome packaging, we can visualize it as an extra high-resolution heat map showing all the contacts that chromosome regions make with each other. The heat map will be unique to each organ. The kidneys, the liver, the heart will each have their own chart. When we translate the heat map into data, we get a connectivity matrix. And this goes deeper than what bioinformaticians usually deal with. Usually, genome sequencing is merely about the sequence of nucleotides on a chromosome. Most data here is one-dimensional. What we do involves a rare kind of data that few researchers work with. There aren't so many out-of-the-box programs for this, so we try to use unconventional methods to analyze this data. Machine learning and deep learning come in handy here because traditional machine learning works with images. If we put biology aside for a minute, machine learning started with things like face recognition, image recognition, and automatic object detection in photos, and those are still the primary applications for ML. Our two-dimensional heat maps can be visualized as images, for example by color-coding the numbers. This is the approach we're trying to practice. It's what people do a lot worldwide.
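The idea of turning contacts into a connectivity matrix and then into an image can be sketched in a few lines. This is a toy illustration under stated assumptions, not the lab's pipeline: the genome is assumed to be cut into numbered bins, each contact is a hypothetical pair of bin indices, and the log-scaled grayscale mapping is just one possible way of color-coding the numbers.

```python
import math


def contact_matrix(contacts, n_bins):
    """Count contacts between genomic bins into a symmetric
    n_bins x n_bins matrix (the connectivity matrix)."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for i, j in contacts:
        matrix[i][j] += 1
        if i != j:
            # A contact between bins i and j is also a contact between j and i.
            matrix[j][i] += 1
    return matrix


def to_grayscale(matrix):
    """Map log-scaled contact counts to pixel intensities in 0..255."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0] * len(row) for row in matrix]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(value) * scale) for value in row] for row in matrix]


# Hypothetical toy data: pairs of bins observed in contact.
toy_contacts = [(0, 1), (0, 1), (1, 2), (0, 0), (2, 3), (0, 1)]
matrix = contact_matrix(toy_contacts, n_bins=4)
image = to_grayscale(matrix)
for row in image:
    print(row)
```

Once the matrix is expressed as pixel intensities, it has the same shape as the input to image-oriented machine learning methods, which is the connection made above.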

— How accurate are the models currently used in bioinformatics?
— Not very. When it comes to biological problems, there's this issue with machine learning: we have too many features (genes, for example) but not nearly enough samples. Machine learning thrives in exactly the opposite situation: with plenty of samples and few features. We have it the other way round. In the end, we are not able to sustain the effectiveness of the same models that work well in other fields. Simply put, let's say we are trying to recognize a vase in an image. Machine learning will do a perfect job if we have 10,000 images and one vase. The thing with us is, we have 100 images and 1,000 vases in each one. See the difference? There's a lot of adjustment yet to be done.
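One way to see why many features and few samples is a problem: with enough features, some of them will match the labels well purely by chance. The sketch below is a toy simulation with random binary data, not a model from the interview; the function name, sample counts and seed are all illustrative assumptions.

```python
import random


def best_chance_accuracy(n_samples, n_features, seed=0):
    """Return the best accuracy any single random binary feature reaches
    when used (directly or inverted) to predict random binary labels.
    Nothing here carries real signal, so any high score is a fluke."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_samples)]
        agreement = sum(f == y for f, y in zip(feature, labels)) / n_samples
        # A feature that always disagrees is as useful as one that always agrees.
        best = max(best, agreement, 1.0 - agreement)
    return best


# Same ten samples, few features versus many features.
print("5 features:   ", best_chance_accuracy(10, 5))
print("1000 features:", best_chance_accuracy(10, 1000))
```

Because the features are drawn from the same seeded stream, the 1,000-feature run includes the first five features, so its best score can only be equal or higher. A model picking among thousands of features on a handful of samples can look accurate while having learned nothing that will hold up on new data.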

— What insights can we gain from such an image of internal organs?
— We can compare the images to identify the differences. For instance, how the functioning of the liver differs from that of the spleen. We can try to figure out which genes are responsible for how the cells look and function in a specific organ. This fundamental endeavor helps us to figure out how everything works. As for the more practical applications, similar images can be built for healthy and ailing people, such as for a person with a healthy brain and a patient with glioma, a type of brain cancer. Then, by matching the results and spotting the differences, we will be able to tell what has malfunctioned and possibly caused the disease. With many diseases, including glioma, we still have no idea what triggers them. If we could get to the bottom of this, we would then be able to identify the target for therapy and develop a drug that would either fix or somehow compensate for the failure.
I chose glioma as the example for a reason. It is known for a fact that the way in which the chromosomes are mis-packaged will determine the progression of the disease, at least for some subtypes of glioma. But this wouldn't be the only example. There are many diseases, primarily cancers, that are similarly caused.

— What fundamental questions is your lab trying to answer?
— Our lab's current interest corresponds with everyone else's interest in this field, namely how chromosome packaging is organized in humans and other organisms. One thing is certain: even the basic principles of chromosome packaging are different in different organisms. Humans have their own, fruit flies have theirs, and more primitive life forms such as yeast have entirely different ones. Why they're different is a mystery to us. We still don't understand the underlying principles, we have no idea why things are organized in this way, and we don't know what mechanisms sustain the whole thing. If we really want to know, it would make sense to work our way up from the basic life forms, tracing how the packaging of chromosomes gradually grew more complex from primitive organisms like bacteria all the way to humans. Ultimately, we might just be able to capture the course of evolution and sort out all the intricacies. Significant advances have been made in recent years in the methods of decoding chromosome packaging. The earliest genome-wide decoding methods appeared very recently, in 2009.

— What other groundbreaking developments have occurred in your field?
— One was this technology, Hi-C, which provides a high-resolution picture of how the chromosomes are packaged in the nucleus. Before Hi-C, the most we were equipped to see was, literally, how close together two genome fragments were. It wasn't until 2009 that we were able to replicate this process for millions of tiny chromosome pieces, track their location in the nucleus, and figure out which other pieces they tend to sit next to. Thanks to Hi-C, we have discovered a lot of new information about the nuances of chromosome packaging. Back in the 1980s, it was believed that chromosomes are packed into a rigid tube called the "30-nanometer fiber". With the help of Hi-C and high-resolution data we saw there is no such tube. Instead, DNA is packaged in tight knots interspersed with sections of loosely packaged DNA. Our genome looks like beads on a string, with alternating beads and loose sections. The beads are found in roughly the same locations across all the different cells of our body, indicating that the structure is stable. In fact, if we match human chromosomes with those of a mouse, we'll find the beads in approximately the same positions.
Scientists are currently of the opinion that those beads or knots are gene regulation units, noting that most active genes reside at the edges of the knots, where everything is unpacked, while the inactive genes, those that the cell does not currently need, stay inside. We can guess why this happens. When everything is so tightly packed, the proteins that could activate a hidden gene cannot physically enter this unit. But on the surface, where things are loose, the situation is reversed. This is likely why structure is key to the proper regulation of cell function. It varies somewhat between the different cell types in the heart, the skin, and so on, and it seems that herein lies the reason why different cells work differently in our body. But we would never have known any of this without Hi-C.

— But technically we still don't know why the liver is the liver, do we?
— That's right. We do not. We know it has something to do with gene regulation. We also know that gene regulation is somehow related to the knots, that is, chromosome packaging and the differences in chromosome packaging between the organs, the liver and the heart, for instance. But we are still in the dark about some details. One thing we don't know is why these knots are so stable or why the structure remains largely unchanged cell to cell. There must be a mechanism that maintains those knots since they never seem to move around. Hypotheses do exist, but they're just hypotheses. At least we know for certain that the formation mechanisms of the genome knots are different in the human being and the fly. But we don't know why that is so.
Some hypotheses have been corroborated and will likely change their status to "proven" in the next few years. But some questions are sure to remain unanswered. And then again, those formation mechanisms have been confirmed, first of all, for humans. But there are organisms out there no one has ever paid any attention to. If we aim to trace the entire history of chromosome packaging complexity, there will be a lot more experiments to do and data to mine. Someone will have to take on this task, in order to obtain, by means of experiment, similar data for all living organisms, so we may get a sense of how their chromosomes are packaged.

— Will we be able to use those data to study other diseases, not just cancer? Neurodegenerative diseases, for instance?
— You're spot on with that one. We do research on mental disorders like schizophrenia and autism. Some of it is funded by Russian Science Foundation grants.
Why do we believe chromosome packaging may be a factor in mental disorders? Because in the case of autism, at least for three genomic locations, a correlation has been established between the disease and packaging defects. We have obtained indirect evidence that the same may be true of schizophrenia. Now we plan to do some hands-on experiments. We will use samples taken from deceased schizophrenia patients and from healthy people, and match their charts to pinpoint the differences.
We might locate the genes that are important in schizophrenia development. We might find, for example, that a healthy person has a knot in a certain place, but in an ailing person that knot has disintegrated, causing the gene to malfunction, which may have promoted the disease. This is our hypothesis. We will be testing it experimentally and analytically.

— How do you envision your field in the near term or, possibly, in the distant future?
— I probably shouldn't try to look too far into the future, since our field didn't even exist 10 years ago. It's difficult to imagine what can happen in the next decade. But for the foreseeable future, I can say that the gap between snowballing data and the supply of people willing and able to analyze it in depth will only widen. On the bright side, ready-to-use software is now available for all the major classes of experiments and data. It makes our job easier. That being said, software availability doesn't solve all problems: you can't just run a program and get an article for Nature done. You still have to come up with a creative approach for each project in its final stages. But at least there are some tools for the starting stages of data processing. I think more software will appear in the next 5 years. The programs will become progressively more user-friendly until eventually they make us redundant at most, if not all, data processing stages. The new software will be easy to use. The operator won't have to be a programmer or a bioinformatician. Basic computer literacy will suffice. But I'm pretty sure a human touch will still be crucial for that small but ultimately important final step. It's what transforms a basic job into quality work, the kind of work that may potentially lead to a scientific discovery. So I don't think we're in any danger of losing our jobs. On the contrary, our work may gain more relevance as it becomes more creative once all technical, routine processes are delegated to the algorithms.

— So, you're saying the routine tasks are all going to be automated?
— I think so. The monotony is on its way out; we can witness this already. The programs released in the past few years analyze the data automatically. Every work stage used to need its own program, which had to be run separately. It had to be customized, and one had to figure out how it worked and how to adjust it. Now, there's just one all-in-one program where you input the results, and the program puts out almost fully processed data.
This frees up time to do more interesting stuff. But there's a downside to it. Thus far, a lot of bioinformaticians have become full-time software developers. Many labs keep making a living by publishing articles about data processing tools. There's nothing wrong with that. It's just that these people are going to fall on hard times. Good tools already exist. What else can you possibly think of to improve on what works well already? Thankfully, we are not in that category. We are the kind of bioinformaticians who like to come up with creative ideas during our final work stage. Looks like we'll stay in high demand for a while.

This interview was first published on Biomolecula.ru on July 6, 2022