— Can these data be used to predict the direction of an organism's evolution?
— Once again, not all of bioinformatics, and not even all of evolutionary genetics, deals with this issue. Evolutionary genetics is primarily a descriptive science: it narrates what has occurred in the past. But can we predict the future using evolutionary-biological reasoning? This is an intriguing question and perhaps the most challenging unresolved problem in evolutionary genetics today. I believe that if it is to be solved (and some progress is being made), it will be solved with viruses rather than with more complex organisms. Viruses are relatively simple, after all: the genome is about 30,000 letters long for a coronavirus and around 10,000 for HIV. In contrast, our genome has about 3 billion letters. Naturally, it's easier to study the patterns of change in a shorter piece of text. How, and to what degree, this evolution can be predicted is a separate discussion. It is a significant and important task.
— Does evolutionary biology intersect with medicine when studying viruses and other pathogens?
— Yes, this is one of the places where evolutionary biology and medicine do in fact meet. I am a biologist by training, not a mathematician like many bioinformaticians, so I find it easier to judge from a biological standpoint. About a third of deaths in humans are associated with the evolution of certain simple clonal systems. These could be viruses, or bacteria that adapt to the antibiotics we use against them. However, they could also be cells from our own body that, through evolution, escape the body's control and start living their own life, acquiring mutations that help them spread faster and metastasize. These mutations also let them ignore the signals from surrounding cells telling them to stop dividing and commit suicide. These are evolutionary processes as well, which is why evolutionary approaches are widely used in modern cancer research. For instance, in an oncology article on cancer genetics you can easily encounter evolutionary trees that trace the origins of different cells within a single patient's body. Those cells evolved and acquired mutations along the way, making the tumor increasingly malignant.
— When the coronavirus pandemic started, its genome was decoded within days. What did this information yield?
— Bioinformaticians established that this virus belonged to the already known Sarbecovirus group and named it SARS-CoV-2. This virus turned out to be quite closely related to the one that caused SARS in 2002–2003. For me personally, this was an immediate red flag showing that it needed to be taken very seriously. However, in the early months of the epidemic there were a few false leads, in particular concerning the origin of this virus. Reliable data on this matter emerged only in the last few weeks, not back then, a year and a half ago. Part of the problem was that the virus was somewhat isolated on the evolutionary trees that people were trying to reconstruct, and there was nothing out there quite similar to it. There was a sequence called RaTG13 that was somewhat close. However, when one estimated the evolutionary rate of this virus and calculated how much time it would take for the two viruses to diverge this much from a common ancestor, it turned out that the common ancestor of RaTG13 and our coronavirus lived many decades ago.
And that's too far back; there must have been something closer. The fact that nothing closer was found gave rise to various theories that this virus was artificially created. There wasn't any direct proof of this, and the primary argument was exactly the absence of anything similar, a closer "relative". People said, "If we haven't seen anything like it, it must have been created by someone." However, the natural diversity of these viruses is very poorly studied. Just a few weeks ago, a preprint was published that is now reshaping the field. The authors examined a large number of bats from karst limestone caves in Laos and identified a variety of viruses in them. Some turned out to be significantly closer to our coronavirus than the original RaTG13. In my opinion, this largely settles the debate on the artificial origin of the virus. Based on bioinformatics findings, we perhaps still can't entirely dismiss the possibility that it was somehow cultivated and studied in laboratories, but this is becoming increasingly unlikely.
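The divergence-time reasoning described above for RaTG13 and SARS-CoV-2 can be illustrated with a strict molecular clock. The following is only a minimal sketch: the function names, the use of the Jukes-Cantor correction, and the numeric inputs are assumptions made for illustration and are not taken from the interview or from any published estimate. Real analyses also account for recombination and rate variation, which this sketch ignores.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) correction.

    Converts the observed fraction of differing sites p into an estimated
    number of substitutions per site, allowing for repeated hits.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time_years(p: float, rate: float, correct: bool = True) -> float:
    """Years since the common ancestor under a strict molecular clock.

    p    -- observed fraction of differing sites between the two genomes
    rate -- substitutions per site per year along ONE lineage
    The distance is divided by 2 * rate because both lineages accumulate
    changes independently after the split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    distance = jc69_distance(p) if correct else p
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Illustrative placeholder inputs, NOT measured values.
    p_observed = 0.04              # assumed: 4% of sites differ
    rate_per_site_per_year = 5e-4  # assumed clock rate

    raw = divergence_time_years(p_observed, rate_per_site_per_year, correct=False)
    corrected = divergence_time_years(p_observed, rate_per_site_per_year)
    print(f"uncorrected estimate: {raw:.1f} years")
    print(f"JC69-corrected estimate: {corrected:.1f} years")
```

With inputs of this order, the estimate comes out at several decades, which is the kind of gap that made RaTG13 look like too distant a relative.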
— From a practical perspective, what does the genome offer for therapy and for predicting disease progression? The so-called furin cleavage site is much discussed in connection with the virus's lethality.
— A recent article in Nature featured experiments on hamsters in which this specific furin cleavage site was modified. It is only a 12-letter segment of the viral genome, yet depending on how scientists modified it, the hamsters fell ill, died, or survived. And this was the same sequence that misled researchers on multiple occasions. Initially, there were studies that were simply trying to determine the origin of this furin cleavage site. It is 12 nucleotides long, but these encode only 4 amino acids. It is a spot that can be easily cleaved by a specific enzyme, which is thought to facilitate the virus's entry into cells. This furin site was first reported in snakes, then in fish, but these were all completely misleading studies that were picked up by the media for lack of other information. The task is nontrivial: you have only four letters, and you are searching for them throughout an entire library. If you look for a sequence of four letters in all the texts in the Russian State Library, you are bound to find it many times in a variety of different texts. This is roughly the approach that was used, and as a result many researchers found such sequences. But such a short sequence on its own doesn't tell us much.
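The library analogy above can be made quantitative. Below is a minimal sketch, assuming a deliberately simple model in which every letter is equally likely and independent of its neighbours. The function names, the database size, and the example motif string are hypothetical and chosen only for illustration. Real protein sequences are not uniformly random, so the numbers indicate scale only.

```python
def expected_chance_hits(text_length: int, motif_length: int, alphabet_size: int) -> float:
    """Expected number of exact matches to one fixed motif in random text.

    Assumes each letter is drawn uniformly and independently from the
    alphabet, so every start position matches with probability
    1 / alphabet_size ** motif_length.
    """
    if motif_length <= 0 or alphabet_size <= 0:
        raise ValueError("motif_length and alphabet_size must be positive")
    positions = max(text_length - motif_length + 1, 0)
    return positions / alphabet_size ** motif_length


def count_overlapping(text: str, motif: str) -> int:
    """Count occurrences of motif in text, including overlapping ones."""
    if not motif:
        raise ValueError("motif must be non-empty")
    count = 0
    start = text.find(motif)
    while start != -1:
        count += 1
        start = text.find(motif, start + 1)
    return count


if __name__ == "__main__":
    # Hypothetical database size: 100 million amino-acid residues.
    database_residues = 100_000_000

    # A 4-residue motif over the 20 standard amino acids.
    protein_hits = expected_chance_hits(database_residues, 4, 20)
    print(f"expected chance matches for a 4-residue motif: {protein_hits:.0f}")

    # The same site written as 12 nucleotides, searched in a text of the
    # same length over the 4-letter DNA alphabet.
    dna_hits = expected_chance_hits(database_residues, 12, 4)
    print(f"expected chance matches for a 12-nucleotide motif: {dna_hits:.1f}")

    # Counting actual occurrences in a toy sequence (illustrative strings).
    print(count_overlapping("MKPRRASVPRRAG", "PRRA"))
```

Even under this crude model, a four-residue motif is expected to turn up by chance many times in a large sequence collection, so finding it in snakes or fish carries little evidential weight on its own.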
I've already mentioned the Laotian coronaviruses obtained from bats and found to be closest in sequence to our coronavirus. No furin cleavage sites were detected in them! Nevertheless, these viruses have been experimentally studied and it turns out they can bind to the ACE2 receptor on the surface of human cells and penetrate cells as effectively as our coronavirus. So this furin cleavage site isn't absolutely necessary for the virus to be pathogenic in humans.