LIFE
AND OTHER STORIES
Georgii Bazykin
Decoding the Virus
  • Story

    on the genetics of the coronavirus: where it derives from, whether it could be artificially produced, where its evolution is going, how new variants emerge, why it is important to get vaccinated, and for how long vaccines will be effective
  • Story told by

    Georgii Bazykin, Visiting Scientist in Biomedical Informatics, Harvard Medical School (USA)
  • Story asked by

    Marina Astvatsaturyan, Science journalist
  • Story recorded

    in October 2021
Let's start from the very beginning. What exactly is bioinformatics?
— In the most general sense, bioinformatics is the process of working with biological data using mathematical methods. However, sometimes when people use the term, they mean its narrower application, working with genetic sequences. In this sense, some applications of numerical methods in biology (like analysis of complex images using computational methods) are not considered to be a part of bioinformatics. But genome sequences, protein structures, and phylogenetic trees are the things that make us who we are.

Are phylogenetic trees the kin relationships between closely related species?
— Yes, we will inevitably talk about them a lot. The concept of bioinformatics is closely tied to big data, large arrays of information, which can be difficult to comprehend. It also requires systematic processing if one needs to detect, for instance, some weak signal. It is also inevitably about creating new computational algorithms. Moreover, computational biology not only utilizes numerical methods but also inspires them. Many areas of computer science have advanced due to the biological problems they were used to solve. For instance, methods for constructing trees based on graph data have significantly advanced because of this.

May I clarify another term you used, a signal. You're analyzing a gene sequence, for instance, you observe a smooth sequence, and then something stands out, is that correct?
— Yes, but it describes a variety of phenomena. For instance, my niche specialty is working out which evolutionary events influenced genetic sequences by comparing texts, that is, the sequences. For example, can we somehow figure out whether some variants of a specific gene were under the influence of natural selection? If we use a device that reads the DNA sequence, we get a long text in a four-letter alphabet. And if we read such sequences, for example, from different animal or plant species, or different coronavirus samples, we get sequences that originated from a common ancestor through an evolutionary process, due to the accumulation and further spread of mutations. Mutations can be diverse, though. Some of them, for example, were beneficial for a human, or a plant, or a virus, and were favored by natural selection. Due to natural selection, according to Darwin, such mutations became increasingly common. So, the basic concept was formulated by Darwin a century and a half ago. However, how to actually detect this signal in the sequences is something people are only beginning to understand now. This is one of the things we can study.
Photographer: Evgeny Gurko /
for “Life and Other Stories”
Do these data hold predictive capabilities to understand the direction of the organism's evolution?
— Once again, not all bioinformatics, and not even all evolutionary genetics, deals with this issue. Evolutionary genetics is primarily a descriptive science: it narrates what has occurred in the past. But can we predict the future based on evolutionary-biological reasoning? This is indeed an intriguing question and perhaps the most challenging unresolved issue in evolutionary genetics as of now. I believe that if it is to be solved (and some progress is being made), it will be solved for viruses rather than for more complex organisms. Viruses are relatively simple after all: their genome consists of about 30,000 letters if we're talking about the coronavirus, and around 10,000 for HIV. In contrast, our genome has 3.5 billion letters. Naturally, it's easier to study the patterns of change in a shorter piece of text. We can separately discuss how and to what degree this evolution can be predicted. This is indeed a significant and important task.

Does evolutionary biology intersect with medicine when studying viruses and other pathogens?
— Yes, this is one of the places where evolutionary biology and medicine do in fact meet. I am a biologist by training, not a mathematician like many bioinformaticians are, so I find it easier to judge from a biological standpoint. About a third of deaths in humans are associated with the evolution of certain simple clonal systems. These could be viruses or bacteria that adapt to the antibiotics we use to combat them. However, it could also be cells from our own body, which due to evolution escape the body's control and start living their own life, acquiring mutations that help them spread faster and metastasize. This also gives them the ability to not respond to the signals from surrounding cells telling them to stop dividing and commit suicide. These are evolutionary processes as well, hence evolutionary approaches are widely used in modern cancer research. For instance, in an oncology article on cancer genetics, you can easily encounter evolutionary trees that correspond to the evolutionary origins of different cells within a single patient's organism, which evolved and acquired mutations during this evolution, making the tumor increasingly malignant.

When the coronavirus pandemic started, its genome was decoded within days. What did this information yield?
— Bioinformaticians identified that this virus belonged to the already known group of Sarbecovirus and named it SARS-CoV-2. This virus turned out to be quite closely related to the one that caused SARS in 2002–2003. For me personally, this was an immediate red flag showing that it needed to be taken very seriously. However, in the early months of the epidemic, there were a few misleading paths that led the wrong way. These were in particular related to the origin of this virus. Reliable data on this matter emerged only in the last few weeks, not back then, a year and a half ago. Part of the problem was that the virus was somewhat isolated on the evolutionary trees that people were trying to reconstruct, and there was nothing out there quite similar to it. There was a sequence called RaTG13 that was somewhat close. However, it seemed that if one estimated the evolution rate of this virus and calculated how much time should have passed for the viruses to diverge so much from a common ancestor, the common ancestor of RaTG13 and our coronavirus lived many decades ago.
And that's too far back, there must have been something closer. The fact that nothing closer was found gave rise to various theories that this virus was artificially created. There wasn't any direct proof of this, and the primary argument was exactly the absence of anything similar, a closer “relative”. People said, "If we haven't seen anything like it, it must have been created by someone." However, the natural diversity of these viruses is indeed very poorly studied. Just a few weeks ago, a preprint was published that is currently literally revolutionizing the field. The paper examined a large number of bats from karst limestone caves in Laos, and a variety of different viruses were identified from these bats. Some turned out to be significantly closer to our coronavirus than the original RaTG13. In my opinion, this largely settles the debate on the artificial origin of the virus. Maybe we still can't entirely dismiss the possibility that it was somehow cultivated and studied in laboratories based on bioinformatics discoveries but this is becoming increasingly unlikely.

From a practical perspective, what does the genome offer for therapy and disease progression prediction? Due to its lethality, there's much discussion about the so-called furin cleavage site.
— A recent article in Nature featured experiments on hamsters where this specific furin cleavage site was modified. This is a 12-letter segment in the virus, but depending on the sequence that scientists experimentally modified, the hamsters either fell ill, died, or survived. And this was the same sequence that misled researchers on multiple occasions. Initially, there were studies that were simply trying to determine the origin of this furin cleavage site. It's actually 12 nucleotides, but they encode only 4 amino acids. And it's a spot that can be easily cleaved by a specific enzyme, which allegedly facilitates the virus's entry into cells. This furin site was initially found in snakes, then in fish, but these were all completely misleading studies that were picked up by the media due to a lack of information. It's a nontrivial task: you have only four letters and you're searching for them throughout the entire library. Imagine this: if you're looking for a sequence of four letters in all the texts in the Russian State Library, you're bound to find it numerous times in a variety of different texts. This is roughly the approach that was used, and as a result, many researchers found such sequences. But this simple sequence doesn't tell us a whole lot.
I've already mentioned the Laotian coronaviruses obtained from bats and found to be closest in sequence to our coronavirus. No furin cleavage sites were detected in them! Nevertheless, these viruses have been experimentally studied and it turns out they can bind to the ACE2 receptor on the surface of human cells and penetrate cells as effectively as our coronavirus. So this furin cleavage site isn't absolutely necessary for the virus to be pathogenic in humans.
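The library analogy above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the "proteome" is a synthetic random string, not real data, the motif is a placeholder four-letter string, and the helper names are hypothetical. It estimates how often a fixed four-letter motif turns up purely by chance in a long text written in the 20-letter amino acid alphabet.

```python
import random

def expected_chance_hits(text_length: int, motif_length: int, alphabet_size: int) -> float:
    """Expected number of exact matches of one fixed motif in uniformly random text."""
    positions = max(text_length - motif_length + 1, 0)
    return positions / alphabet_size ** motif_length

def count_hits(text: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in text."""
    return sum(1 for i in range(len(text) - len(motif) + 1) if text.startswith(motif, i))

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
rng = random.Random(0)                # fixed seed so the run is repeatable

# Synthetic random "library" of one million residues (an assumption, not real sequences).
proteome = "".join(rng.choices(AMINO_ACIDS, k=1_000_000))
motif = "PRRA"  # placeholder motif; any fixed four-letter string has the same expectation

expected = expected_chance_hits(len(proteome), len(motif), len(AMINO_ACIDS))
observed = count_hits(proteome, motif)
print(f"expected by chance: {expected:.2f}, observed in random text: {observed}")
```

Real sequence databases are many orders of magnitude larger than this toy string, so chance matches to a four-letter motif are expected in large numbers, which is why finding one says little about origin.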
Then what kind of mutation is required?
— Technically, no one really knows. This is due to the lack of information about all the viruses that might be able to interact with human cells, and about why some can and others cannot. In other words, we can't construct such a function and say that certain nucleotide sequences will behave in certain ways and others will behave differently. To confirm this, we need experiments, and experiments are quite expensive. Bioinformatics helps as we attempt to model and hypothesize. We can propose numerous ideas, hypothesize which viruses can invade human cells, and which of them can do it more easily than others; which molecules can serve as medications, and so on. However, until this is experimentally confirmed, these are still just speculations and, unfortunately, they are essentially worthless. Regrettably, biology isn't an entirely exact science. "Dry" biologists can still hardly manage without "wet" biologists; things are still quite challenging.

Let’s circle back to the coronavirus. Evolutionary biologists claim to have timed the emergence of this new coronavirus to October 2019. What's your take on the theory about Chinese miners who got infected in 2012 while cleaning a mine from the waste of horseshoe bats, which carried several related coronaviruses? These different coronaviruses combined to form a new one, causing the miners to fall severely ill or, I think, even die.
— Let's discuss a bit how we know that this virus jumped to humans specifically at the end of 2019. I've previously mentioned evolutionary trees, a topic we'll keep coming back to. We can examine how different the viruses are from each other (the ones that people are currently infected with during this epidemic) and see how this diversity depends on when the sample was taken. What you'll get is an ascending graph: the later the virus sample was taken, the more diverse the viruses are at that moment. You can draw a straight line through this graph and do two things.
Firstly, estimate the rate at which this virus is evolving. It appears to evolve at roughly the same rate as other coronaviruses, gaining about 1 substitution per 1000 letters (nucleotides) yearly. Secondly, we can observe when this line hits 0, indicating when all the current coronaviruses that we're studying in humans now, had their last common ancestor. Then, on the other axis, we can determine exactly when this common ancestor existed. This is referred to as the "molecular clock model," where accumulating mutations act like the clock ticking.
What we know is that, on average, every two or three weeks each sequence, passed from one person to another, acquires one mutation. Sometimes more, sometimes less, but still. This is a random process, but when you have many sequences and a lot of evolutionary processes happening, these molecular clocks tick quite accurately, allowing you to trace back in time and see exactly when "midnight" was and when this stopwatch started. It appears that it started right at the end of 2019.
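The straight-line reasoning described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular study: the sample dates and mutation counts are invented for the example, and `fit_line` is a hypothetical helper doing an ordinary least-squares fit. The slope plays the role of the evolutionary rate, and the point where the line crosses zero mutations plays the role of the common ancestor's date.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical samples: (sampling date in decimal years, mutations relative to the root).
samples = [(2020.10, 3), (2020.25, 7), (2020.40, 10),
           (2020.60, 15), (2020.80, 20), (2021.00, 25)]
dates = [d for d, _ in samples]
mutations = [m for _, m in samples]

rate_per_year, intercept = fit_line(dates, mutations)
origin = -intercept / rate_per_year  # date at which the fitted line reaches zero mutations

GENOME_LENGTH = 30_000  # approximate genome size quoted in the interview
print(f"rate: {rate_per_year:.1f} mutations per genome per year "
      f"({rate_per_year / GENOME_LENGTH:.1e} per site per year)")
print(f"days per mutation: {365 / rate_per_year:.0f}")
print(f"estimated date of the common ancestor: {origin:.2f}")
```

The invented data were chosen to be roughly consistent with the figures quoted in the interview (about one substitution per 1000 letters per year in a genome of about 30,000 letters); real analyses use thousands of genomes and explicit phylogenetic models rather than a single regression line.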
There could have been some outbreaks before that, it could have happened. Coronavirus is a fairly adaptable virus, it easily jumps from one species to another. This is evident with the current coronavirus. It came to us from bats (though this isn't entirely certain), and now it's infecting domestic and wild animals (jumping from humans), indicating that the interspecies barrier is easily breached. We observe this with other coronaviruses as well, like MERS, which humans have contracted multiple times from camels since the outbreak in 2012.
Returning to the topic of the Chinese miners, there might have been other virus jumps back then, long ago. Someone contracted a variant of coronavirus that had never been seen before. But those cases from long ago had no impact on the current diversity, on the current coronavirus epidemic. I would be very surprised if we suddenly sequenced a virus in someone that is significantly more distant from all other modern coronaviruses according to its molecular clock. It would suggest that a portion of the epidemic had gone unnoticed before. We can now assert that our entire ongoing pandemic inevitably originated in 2019 from a virus that jumped to humans towards the end of 2019.

From my amateur viewpoint, the main danger of the current coronavirus, compared to the first SARS-MERS, is its asymptomatic nature in a certain part of the population.
— Here your amateur viewpoint aligns perfectly with the expert opinions, as this largely accounts for the coronavirus's “viral success”. The first SARS, atypical pneumonia, essentially lacked this asymptomatic phase at the onset. Symptoms were almost immediately apparent. With our current SARS-CoV-2, roughly half of the infections occur before people start showing any symptoms. Meaning, if I fall ill, I infect about half of the people before I start showing any symptoms, and the rest after that. This certainly played a significant role in "launching" the pandemic.
And does people’s behavior affect its evolution? To clarify: the coronavirus needs to quickly jump to continue its existence. So, if a quarantine is imposed and people stop interacting, will its evolution in terms of rapid spreading cease?
— The direction of selection doesn't change when we impose a quarantine. The virus variants that can transmit more easily from one person to another are still being selected. What changes is the intensity of this selection, or more precisely, its effectiveness. It's important to understand that the rate of adaptive evolution, the rate at which the virus gains new beneficial mutations that make it increasingly adaptable to survival and transmission from one person to the next, depends on the power of natural selection, that is, on how beneficial a particular mutation is for the virus. However, this rate is also defined by the amount of the virus that is currently circulating in the world, and how much of it is influenced by natural selection. The more virus there is, the more opportunities it has to adaptively evolve. That means, by allowing the virus to multiply within us and permitting such a large-scale pandemic despite having more effective means to combat it than those currently used, we are giving natural selection more opportunity to hit its target. We're giving it more chances to evolve into variants that are better adapted to us.
This is happening inevitably. Indeed, at the start of the pandemic, I mentioned in this very studio that studying the evolution of the virus was primarily needed to track its spread, but these mutations didn't particularly affect it. What they were is little markers that allowed us to distinguish one variant from another, nothing more. Since then, the situation has changed.

Almost a year ago, I had a conversation with your American colleague, Eugene Koonin. I asked him about the mutation rate. He responded, "Compared to the flu, it's a very slow mutation rate indeed."
— But some of these mutations spread rapidly because they are advantageous to the virus. We now see that the virus is adapting. We could separately discuss why this was less noticeable, and seemingly, this process was less intense at the start of the pandemic and has become more intense now. However, these substitutions, these changes currently occurring in the virus are making it more fit. Primarily, of course, we are referring to the changes that form the basis of the well-known named variants such as "alpha" and "delta."
Are bioinformaticians able to predict upcoming developments?
— We can attempt to predict how prevalent or rare the variants we have now will be in the future, or we can try to predict the emergence of completely new variants. The first task is partially solvable. Currently, one of the objectives of this massive global initiative to read a large number of coronavirus genomes is to detect variants while they are still uncommon. When a variant is still rare, if you have sequenced enough genomes, you can estimate its superiority over its competitors by its growth rate, assess this very strength of selection in the virus's favor (for instance, if it's a new variant, let's say, "delta") and forecast what the epidemic will look like in a particular region, for example, a few weeks later. When "alpha" emerged, it was quickly assessed, while it was still uncommon, that it was approximately one and a half times more transmissible, meaning that selection essentially favors this variant about one and a half times more intensely than its ancestral variants. If you are aware of this, you can simply input this into the relevant equation and determine how much "alpha" you'll have in a week, two weeks, three weeks, a month, or a month and a half. And you'll quickly see that even though it's currently at 5%, in a month and a half there will be nothing else but "alpha". We applied the same method for "delta" in Russia.
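One simple form the "relevant equation" could take is logistic growth of the variant's share, sketched below in Python. This is an assumption-laden illustration rather than the model actually used: it supposes that the variant's odds against its competitors multiply by the transmission advantage once per generation, and the generation time of five days is an assumed value that does not come from the interview.

```python
def variant_frequency(initial_freq: float, advantage: float,
                      days: float, generation_days: float) -> float:
    """Share of a variant whose odds multiply by `advantage` every generation."""
    odds = initial_freq / (1.0 - initial_freq)
    odds *= advantage ** (days / generation_days)
    return odds / (1.0 + odds)

INITIAL_FREQ = 0.05     # 5% starting share, as in the interview
ADVANTAGE = 1.5         # "one and a half times more transmissible"
GENERATION_DAYS = 5.0   # assumed generation time (hypothetical, not from the interview)

for week in range(0, 9):
    freq = variant_frequency(INITIAL_FREQ, ADVANTAGE, 7 * week, GENERATION_DAYS)
    print(f"week {week}: {freq:.0%}")
```

With these particular assumed values the variant goes from a small minority to a clear majority within about a month and a half; how quickly it approaches complete replacement depends strongly on the assumed generation time and on how the advantage is defined.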
Was your prediction accurate?
— Yes, but it doesn't take a genius. You simply draw a curve through these points and typically from that point on you know what will occur. Of course, surprises can happen, new variants may emerge. For instance, we assumed that "alpha" would spread and establish itself in Russia, but this didn't happen because "delta" arrived and wiped out all the diversity of viruses we had observed. Still, this first task is comparatively simple. But there's a more complex task out there, which involves predicting how successful new variants that we haven't seen before will be. This task is significantly more challenging and remains unsolved.
There are several issues. You can estimate how beneficial individual mutations will be for the virus because you've seen them before. For instance, a mutation at position 215 increased the virus spread rate by 10% and at position 584 by another 10%. Then the question arises: if these two mutations occur together, what will the combined effect be? 20%? Or perhaps 50%, or maybe 0%, because these mutations may interfere with each other. This is what is called epistatic interaction. It is the most complex aspect that prevents making such predictions. We can predict the effect of individual mutations, but predicting the effect of a combination of mutations is very challenging. Based on our current knowledge, it seems that epistasis within this coronavirus genome is quite strong. We've observed this previously in other viruses as well: HIV and influenza. However, there is increasingly more data suggesting that it is quite influential in the case of coronavirus as well. In other words, this [genome] text, these mutations cannot be considered separately. They must be considered in combinations, which instantly makes the entire picture significantly more complex.
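The three outcomes in the example above (roughly 20%, 50%, or 0%) can be written as one formula with an interaction term. The Python sketch below is only an illustration: it assumes effects combine on a log-fitness scale, `combined_effect` is a hypothetical helper, and the epistasis values are picked by hand to reproduce the interview's alternatives rather than measured.

```python
import math

def combined_effect(single_effects, epistasis=0.0):
    """Combined relative change in spread rate on a log-fitness scale.

    single_effects: gain of each mutation on its own (0.10 means +10%).
    epistasis: extra log-fitness term for the combination (0 means no interaction).
    """
    log_fitness = sum(math.log1p(e) for e in single_effects) + epistasis
    return math.expm1(log_fitness)

effects = [0.10, 0.10]  # the two +10% mutations from the interview's example

# No interaction: the effects simply multiply (1.1 * 1.1), close to the naive sum of 20%.
independent = combined_effect(effects)
# Hypothetical positive epistasis, chosen so that the pair yields +50%.
synergy = combined_effect(effects, epistasis=math.log(1.5 / 1.21))
# Hypothetical negative epistasis, chosen so that the pair yields no gain at all.
antagonism = combined_effect(effects, epistasis=-math.log(1.21))

print(f"independent: {independent:+.0%}, synergy: {synergy:+.0%}, antagonism: {antagonism:+.0%}")
```

The difficulty described in the interview is that the interaction term is unknown in advance for each pair (and each larger combination) of mutations, so it cannot be read off from the single-mutation effects.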

How do you feel about the theory of the coronavirus being cultivated artificially?
— Let's clarify what exactly could be meant by this statement. There are essentially three different theoretical scenarios of the coronavirus's origin. The first is that someone was involved in creating it. The second is that someone was involved in storing it in a lab, in which case nothing was done with it. The last is a purely natural origin. The first option, in my opinion, is highly unlikely. There are absolutely no signs of genetic manipulation or modification of this virus. There is no direct evidence that it hasn’t happened, but there is a great deal of indirect evidence indicating that nothing of the sort occurred. The second and third scenarios are more difficult to tell apart. Currently, there is quite compelling evidence, in my opinion, that its origin is completely natural, that it was a jump or even multiple jumps directly to humans from animals with which these humans were interacting. Some of this evidence comes from the evolutionary trees we continually refer back to. Here's one example. At the onset of the epidemic, two major evolutionary lineages of this virus were clearly present in the human population. They were found in slightly different locations in Wuhan and were partially linked to several different markets selling a wide range of wild animals. One variant emerged in one market, while a very similar, yet different variant appeared in another market. And all of those events are very hard to imagine in a “leak from the lab” scenario. This would mean that it wasn't just one person who got infected in the lab, but two such individuals who would have been infected in labs and then visited two different markets. Not very plausible. On the other hand, it is entirely natural to imagine such a thing happening under the natural zoonotic hypothesis: in that case the virus simply jumped from certain animals, especially given that these markets share suppliers.
For instance, in Moscow, you can find the same products from the same producer in different markets.

Are we talking about contraband animals, like pangolins?
— Once again, we don't know exactly which species was the intermediate host or from which species the jump occurred. It could have been not bats, but raccoon dogs or some other species. However, the point is that apparently these animals already carried slightly different variants of the virus, hence different variants spread in different locations. These are the results we have now. It's not a 100% proof, but it does support the idea that coronavirus origin was purely natural.
But there have been cases of lab experiments and leaks, haven't there?
— In 2008, Dutch scientist Ron Fouchier did experiments with the newly emerged avian influenza virus, trying to understand which variants could be transmitted in ferrets. Ferrets are a typical model organism for studying human influenza, as we have many similarities in our respiratory systems, as far as I remember. He got a variant of the virus that had five different mutations and was capable of infecting mammals. Scientific publications were released, with no leaks occurring. Despite a significant delay in publishing, because various committees had to be assembled to assess the potential danger of disclosing information about these mutations, the information was eventually made public. This information was published more than a decade ago, and as far as we can tell, no harm has resulted from it.

Only the benefits!
— Indeed. The benefit lies in the fact that knowing the mutations that allow a virus to jump from one species to another helps us understand how that virus works. Research involving viruses gaining new functions, known as gain-of-function research, is an area that is strictly and rigorously regulated.
Was there a leak in 1977?
— Yes, it's possible that there was a lab leak then, although again we don't have any direct evidence supporting that scenario. A variant of influenza virus thought to have been extinct for decades suddenly reappeared in the exact form it was last seen. Conceivably, it was simply not noticed for a long time, being transmitted infrequently in specific locations from person to person, or perhaps among some animals. However, in either of these situations, it would have changed, evolved. But in fact it re-emerged decades later in the exact form as before. This led people to speculate that it might have been stored in a freezer somewhere during that time. But once again, this is just an indirect conclusion…

So, no one was caught at the crime scene?
— No one was, but in any case it would have been a leak, not a modification. Again, it reappeared in the exact form in which it had disappeared.

We've smoothly switched from coronavirus to influenza. When did the influenza virus first appear in humans? Do we know that?
— The influenza virus, as we know it, is a constantly changing notion. It has been with humans for as long as we can trace back. Over the past 100 years, when our joint history has been somewhat traced, we have actually had a number of different influenza viruses. Generally speaking, there is much more diversity among influenza viruses than what we encountered among human infections. We don't know what happened before the 20th century. There is historical evidence that there was some Russian flu in 1889, and some guesses about what it could have been, but we only know its history well starting with the "Spanish flu."
I wanted to ask specifically about it. Four waves were detected, similar to COVID. Can we draw parallels?
— Influenza is the most common and well-studied respiratory disease, so it often serves as an analogy for the coronavirus when there isn't a better one. However, the "Spanish flu" wasn't the last influenza pandemic. There were several others during the 20th century: in 1957, 1968, 1977, and 2009 (the so-called "swine flu"). And each pandemic has been linked to the emergence of a drastically new influenza variant. In this sense, we have several models of what's going on right now with COVID. The variant that appeared in 1918 ("Spanish flu") resembles the virus that still exists today. The mortality from this flu is now much lower than it was back then, as it initially spread in an immunologically naive population, whereas now it circulates in a population that has been partially immunized through previous exposure or vaccination. This is similar to what will happen with COVID: it's hard to imagine us completely getting rid of this virus, it will likely continue to coexist with us…

Will it become seasonal?
— It's hard to judge how pronounced its seasonality may be. It might become seasonal like the flu, or maybe its seasonal fluctuations will be less pronounced, or they might not appear at all. What is clear is that we will continue to coexist with it, so I hope that all reasonable people will get vaccinated. And after that, the virus will have to “deal” with a population that is immunized: either through vaccination or, unfortunately, through infection. Falling ill is far worse than getting vaccinated, because when you fall ill, there's a chance you could die, but if you get vaccinated and still get sick, you almost never die from it. Of course, you can still get sick, as no vaccine offers 100% protection even against the variant it is based on. As a result of viral evolution and the emergence of new variants, post-vaccination illnesses are becoming more common. I myself can serve as an example here. I contracted coronavirus a few months after getting vaccinated. But, I must stress, it's important to keep the numbers in mind. The vaccine's efficacy in preventing infection is relatively low, but its efficacy in preventing severe illness, death, or hospitalization is extremely high. At the population level, both nationally and globally, there will be such cases: not just of infection, but sadly also of severe illness and even death among those vaccinated. And if (ideally) the entire population were vaccinated, then all deaths from the virus would be among the vaccinated, because there would be no one else. However, that doesn't mean the vaccine doesn't work. There is solid data showing that the vaccine remains effective against the currently circulating variants. Theoretically, it is possible that some day, at some point, variants will arise against which the efficacy of available vaccines will be even lower. This will mean that we need to develop a new vaccine. Modern technologies (especially mRNA) allow us to update vaccines quite quickly, though not instantly.
Scientists are thinking one step ahead and planning what they will do when a variant comes along that current vaccines can't handle. But as of now, there's no need for this, as vaccines continue to be effective against new variants and are handling the situation well.
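The remark above about deaths among the vaccinated is a base-rate effect, and a short calculation makes it visible. The Python sketch below is an illustration only: the efficacy value is a made-up number, `vaccinated_share_of_deaths` is a hypothetical helper, and the model assumes that vaccinated and unvaccinated people are otherwise exposed to the same risk.

```python
def vaccinated_share_of_deaths(coverage: float, efficacy_against_death: float) -> float:
    """Fraction of all deaths that occur among vaccinated people.

    Assumes equal exposure in both groups; vaccination scales the
    risk of death by (1 - efficacy_against_death).
    """
    vaccinated = coverage * (1.0 - efficacy_against_death)
    unvaccinated = 1.0 - coverage
    return vaccinated / (vaccinated + unvaccinated)

EFFICACY = 0.90  # hypothetical value for illustration, not a measured figure

for coverage in (0.3, 0.6, 0.9, 1.0):
    share = vaccinated_share_of_deaths(coverage, EFFICACY)
    print(f"coverage {coverage:.0%}: {share:.0%} of deaths are among the vaccinated")
```

As coverage rises, the vaccinated make up a growing share of deaths even though each vaccinated person's risk stays the same, and at full coverage that share is necessarily 100%, which is the point made in the interview.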

Are you currently monitoring the existing variants of influenza? Should we get vaccinated amidst the COVID situation?
— Vaccination is definitely necessary, although in a funny way, COVID has had a positive impact on the fight against the flu as well, since measures to prevent COVID also help prevent the flu. Interestingly, one of the flu variants, B/Yamagata, seems to have fallen victim to our fight against COVID. It has disappeared as of 2020, although it might still be hiding somewhere. However, there are three other influenza variants that are dangerous. I got my flu shot this year, as it remains a threat. Particularly in a scenario where last year's flu outbreak was relatively mild due to COVID measures, many of us have seen our immunity to it weaken, and this flu could hold surprises for us. So, I believe that getting vaccinated against it now is a very good idea.
Can we speculate what life is like after COVID, or at least after the active phase of the pandemic?
— Making clear predictions seems to be a difficult task, but we should already be considering how we will need to coexist with this virus in the future. A few things can already be noted at this point. Firstly, there is already pretty solid evidence that immunity diminishes over time, so you will need to get vaccinated repeatedly. Secondly, we are witnessing the continuous adaptive evolution of this virus, as it gains new mutations that make it increasingly effective. This evolution can be of two kinds. The first kind is mutations that enhance its transmissibility, allowing it to adapt more effectively to living within us. The virus will eventually collect these low-hanging fruits. It will quickly improve its ability to transmit from one person to another. However, its adaptive evolution won't stop there. The second kind is driven by the shifting natural selection related to our immunity: our adaptive immune system will develop defenses against certain variants of this virus, and it will constantly need to evolve, gaining new mutations that bypass existing defenses. Essentially, it will be a continuous arms race.

Isn't there a limit to the number of mutations?
— More likely no than yes. If you have a genome of 30,000 nucleotides, the number of combinations of possible mutations in this genome is vast, astronomical, more than astronomical. Mutations occur in a wide variety of proteins. The spike protein is something that is most often discussed, but it's not the only one. For example, this virus, for reasons not entirely clear, is systematically trying to eliminate the ORF8 gene, which it seems to not “appreciate” much in humans, even though it was apparently beneficial to it before. It likely plays some role in interacting with the immune system. Other mutations improve its polymerase, a protein that allows it to reproduce more intensely, and so on. Virus-adaptive mutations occur throughout the genome, as we see in our practice. In collaboration with colleagues from St. Petersburg, we studied a patient who had been infected with the coronavirus for 11 months, almost a year. This is not what is referred to as long COVID, where people experience lingering effects of the disease for many months but the virus itself is no longer present. In this case it was possible to isolate the virus from the patient during this whole period. Several such cases have been reported.
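The claim that the number of mutation combinations is "more than astronomical" can be checked with a rough count. The Python sketch below is a simplified illustration: it counts only single-letter substitutions (ignoring insertions and deletions), treats the genome as exactly 30,000 letters, and `substitution_combinations` is a hypothetical helper name.

```python
import math

GENOME_LENGTH = 30_000  # approximate genome size quoted in the interview
ALTERNATIVES = 3        # each position can change to one of the three other letters

def substitution_combinations(k: int) -> int:
    """Number of distinct genomes differing from the original at exactly k positions."""
    return math.comb(GENOME_LENGTH, k) * ALTERNATIVES ** k

for k in (1, 2, 5, 10):
    count = substitution_combinations(k)
    # The digit count minus one gives the order of magnitude.
    print(f"{k} substitutions: on the order of 10^{len(str(count)) - 1} combinations")
```

Even for a modest number of simultaneous substitutions the count grows far beyond anything that could be tested experimentally, which is consistent with the answer that there is no practical limit.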

This patient had lymphoma, didn't they?
— All these cases are linked to a suppressed immune system. This also implies that the virus has the capacity to evolve within an individual's body. In this scenario as well, we observe the same adaptive mutations that are present in the broader population. In our research, we have demonstrated that this virus can persist in the body for a long time. So far, our 11-month case is the longest of its kind. Why is this important? Because it's possible that new variants and mutation combinations, which prove to be beneficial for the virus, arise in this very way.

A human incubator basically?
— Yes, if you will. It seems that when such individuals fall ill, special measures should be taken to minimize further transmission of the virus. This isn't the only mechanism of virus evolution, but it seems to be quite significant. With the "alpha" variant, there are substantial reasons to think that this is how it came about, because we weren't able to see any intermediate variants. Typically, with a virus as well studied as this coronavirus, you can observe all the intermediate stages: when exactly one mutation appeared, then another, and then another, and so on. But in "alpha" we suddenly observed a new combination of ten mutations in the genome that no one had detected before, and they had all accumulated in a fairly short period of time. This could have happened in a patient with a suppressed immune system. This is a hypothesis, as no one has seen this exact patient, but it's quite well-substantiated.

What important questions related to the new coronavirus are still unanswered for you as a bioinformatician and evolutionary biologist?
— We have already touched on some of those issues. To me, the most interesting thing is the epistatic interactions between different mutations within the genome. Besides, there's a mechanism of virus evolution that has not manifested itself yet, but I fear it will: recombination. This is a situation where a cell becomes infected with two different viral particles and produces new variants of the virus, with the beginning of the genome taken from one “parent” and the end from the other. We know that this process was important in the origin of our modern coronavirus: it is originally the product of several recombination events that occurred before it jumped to humans. However, so far it hasn't recombined much among humans. Still, as the diversity of this virus in humans continues to increase, I believe this process will become increasingly significant, and recombination with other coronaviruses currently residing in other animals is a possibility. This needs to be closely monitored.
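
The recombination described above can be pictured with a toy sketch: the start of one parental genome is joined to the end of another at a single crossover point. This is a deliberately simplified illustration, not a model of the actual molecular process. The function name, the single crossover point, and the short example sequences are all assumptions made for illustration.

```python
def recombine(parent_a: str, parent_b: str, crossover: int) -> str:
    """Join the start of parent_a to the end of parent_b at one crossover point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this toy model assumes genomes of equal length")
    if not 0 <= crossover <= len(parent_a):
        raise ValueError("crossover point must lie within the genome")
    # Positions before the crossover come from parent_a, the rest from parent_b.
    return parent_a[:crossover] + parent_b[crossover:]


if __name__ == "__main__":
    # Hypothetical 8-letter "genomes" that make the origin of each part visible.
    print(recombine("AAAAAAAA", "CCCCCCCC", 3))
```

In the example, the first three positions of the recombinant come from the first parent and the remaining five from the second.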

If SARS-CoV-2 hadn't emerged, what would you be doing now? What research would you have continued or initiated, particularly here at Skoltech?
— When the coronavirus emerged, we had to redirect part of our team to this new subject and pull them away from their ongoing tasks. But we still have a lot of diverse work on other viruses unrelated to the coronavirus. We're publishing papers on influenza and tick-borne encephalitis virus. Moreover, our research group doesn't focus only on viral evolution: we also study the evolutionary genomics of other organisms and try to understand how natural selection impacts the evolution of higher organisms, like us, for instance. This is a large separate field. A preprint by Anastasia Stolyarova, an exceptional and highly capable PhD student, was published recently. It is dedicated to the evolutionary properties of an organism with extremely high variability, Schizophyllum, a type of wood fungus. We randomly picked two genomes from two fungi gathered from the same tree, and they differ as much as a human differs from a squirrel. It turns out that in such a variable organism, evolutionary forces can be studied with a much higher resolution. This is a fascinating area that we are currently exploring.
The interview was aired by Echo of Moscow on December 7 and 8, 2021