LIFE
AND OTHER STORIES
Mikhail Belyaev
AI as Doctor’s Ally
  • Story

    on how machine learning can lead from "Formula 1" to medicine, what artificial intelligence can do for patients and doctors today, and what it might achieve tomorrow
  • Story told by

    Mikhail Belyaev, Ph.D. in Physics and Mathematics, Founder of IRA Labs
  • Story asked by

    Kristina Ulasovich, Science journalist
  • Story recorded

    in September 2021
Mikhail, over the course of your professional career, you've tested halos for Formula 1, constructed models for the cooling system of a nuclear power plant, and now you've moved into medical research. How come?
— Probably, the key term here is machine learning. The first part of my career is indeed associated with industrial applications. We developed new machine learning algorithms for engineers and helped solve various problems, primarily at the modeling stage. For example, we worked on the aerodynamics of planes and the structural strength of Formula 1 race cars. At some point, I wanted to change my field and find a direction where, on the one hand, there's a lot of data, and on the other hand, there are many open questions that could potentially be answered with machine learning algorithms.
For a while, we were searching for our niche and initially got engaged with neural interfaces. A brain-computer interface generates a lot of data, since the electrical activity of the brain changes during thinking. However, with non-invasive, cutaneous electrodes (as opposed to Elon Musk's implanted chip), the signals turn out to be too weak, which is a significant limitation when processing the data. Later, we switched to neuroscience, where there is also a lot of complex data: functional and structural MRI, which can help understand the structure of an individual's brain and how it functions when solving a particular modeled task.
But then I realized that data analysis in neuroscience is still too far from real life. These are absolutely fundamental studies, and it's not clear if they can lead to any practical results. Therefore, we started analyzing medical images, but in the context of assisting doctors, meaning we began solving specific problems.

How is your interaction with doctors organized?
— Generally, there are two types of research in this field. The first genre involves solving a purely algorithmic, fundamental problem. For example, we are currently working on such a project within the scope of my grant from the Russian Science Foundation (RSF). Typically, when searching for something in an image, the goal is to highlight a specific area or object of interest. The most common example is facial recognition in photos: many smartphones can outline faces and focus on them. In medicine, however, the task is much more complex. We don't just want to outline an area with a square but to precisely delineate margins on an image (as in the case of a tumor). Moreover, the images we work with, MRIs and CTs, are three-dimensional. They more closely resemble a stack of pictures, because the scanning is done level by level. Solving problems like contour detection or segmentation, such as isolating a metastasis, on these stacks of images is challenging. It turns out that in medicine, people often search not for three-dimensional objects, such as a sphere, but for two-dimensional ones: surfaces or curves. There are currently no adequate methods for highlighting a complex surface on an image given only the original image as input.
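To make the "stack of slices" point concrete, here is a minimal sketch of finding a bright blob in a synthetic 3D volume with a naive intensity threshold plus connected components. It is a toy stand-in for a real segmentation model, not anything the interview describes, and all the data in it is synthetic.

```python
# Toy sketch: a CT/MRI scan as a 3D stack of 2D slices, "segmented" with a
# naive intensity threshold plus connected components. This stands in for a
# real segmentation model; all data here is synthetic.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# Synthetic volume: 64 slices of 128x128 pixels (depth, height, width)
volume = rng.normal(loc=100, scale=10, size=(64, 128, 128))

# Plant a bright spherical "lesion" so there is something to find
zz, yy, xx = np.ogrid[:64, :128, :128]
lesion = (zz - 30) ** 2 + (yy - 60) ** 2 + (xx - 70) ** 2 <= 6 ** 2
volume[lesion] += 80

# Naive segmentation: threshold, then group voxels into connected 3D components
mask = volume > 150
labels, n_components = ndimage.label(mask)
sizes = ndimage.sum(mask, labels, index=range(1, n_components + 1))
print(f"found {n_components} candidate region(s), voxel counts: {sizes}")

# The same 3D object shows up as a 2D circle on several consecutive slices
for z in range(28, 33):
    print(f"slice {z}: {int(mask[z].sum())} flagged pixels")
```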
Another fundamental problem, also very important, is the following. Medical data has a specific feature. Images taken within one hospital are very similar to each other (yes, everyone's anatomy is undoubtedly different, but the style of the images, their brightness and sharpness, typically matches). However, if you go to another facility, everything will look completely different: some areas of the image will now be brighter, and others darker. From a human perspective, this is not a significant problem.
You and I can look at an image and have it explained to us that these white spots are demyelination foci of white matter, possible indicators of multiple sclerosis. Then we go to another hospital, look at their images, and find essentially the same spots. Algorithms can't do this. They break down dramatically when shown images that are fundamentally similar but different in style. This is an important research direction: how to devise algorithms that adapt better between different data sources.
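A common first-aid measure against such style differences is per-scan intensity normalization. The interview doesn't prescribe a specific technique, so the z-scoring below is an illustrative assumption, applied to synthetic arrays standing in for scans from two sites.

```python
# Toy sketch: "scans" of the same anatomy acquired at two sites with different
# brightness and contrast, plus a per-scan z-score normalization that removes
# much of the style gap. All arrays are synthetic stand-ins for real scans.
import numpy as np

rng = np.random.default_rng(1)
anatomy = rng.random((128, 128))  # shared underlying structure

scan_site_a = 100 + 40 * anatomy + rng.normal(0, 2, anatomy.shape)
scan_site_b = 300 + 90 * anatomy + rng.normal(0, 5, anatomy.shape)  # brighter, higher contrast

def zscore(scan: np.ndarray) -> np.ndarray:
    """Rescale a scan to zero mean and unit variance."""
    return (scan - scan.mean()) / scan.std()

print("raw mean gap:       ", abs(scan_site_a.mean() - scan_site_b.mean()))
print("normalized mean gap:", abs(zscore(scan_site_a).mean() - zscore(scan_site_b).mean()))
# After z-scoring, both scans live on a comparable intensity scale, though
# subtler style differences (noise texture, sharpness) still remain.
```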
The second genre of research involves very specific applied studies, the need for which arises daily in doctors' clinical practice, where they would benefit from algorithmic assistance. Here, there are generally two potential positive effects. The first is that the algorithm can measure something automatically, much faster and usually more accurately than it can be done manually. For example, doctors routinely outline tumor margins on MRI images before starting radiation therapy, and this task can be performed with the help of artificial intelligence. The specialist (doctor) then reviews and, if necessary, adjusts the algorithm's results. We conducted such studies together with the Burdenko Neurosurgery Center. A crucial result here is assessing how computer vision algorithms have changed the specialist's work. It turns out that thanks to the algorithms, we significantly improve the level of consistency among different doctors. Evaluating tumor margins is still a subjective procedure: one doctor decides to include a suspicious-looking piece, while another does not. Now doctors have a scenario where they are first given a suggestion, and they can either agree or disagree.
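Consistency between two delineations (doctor vs. doctor, or doctor vs. algorithm) is commonly quantified with overlap metrics such as the Dice coefficient; the metric choice here is an illustrative assumption, not something named in the interview.

```python
# Toy sketch: measuring agreement between two tumor delineations with the
# Dice coefficient (1.0 = identical contours, 0.0 = no overlap). The masks
# are 2D synthetic stand-ins for real 3D delineations.
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity: 2*|A ∩ B| / (|A| + |B|)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

yy, xx = np.ogrid[:100, :100]
doctor_1 = (yy - 50) ** 2 + (xx - 50) ** 2 <= 20 ** 2  # one doctor's contour
doctor_2 = (yy - 52) ** 2 + (xx - 53) ** 2 <= 21 ** 2  # a slightly different reading

print(f"inter-rater Dice: {dice(doctor_1, doctor_2):.3f}")
```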
The other, more apparent benefit is that we help identify associated pathologies, meaning the algorithm can "see" something the specialist may not notice. Here's a very practical example. During the pandemic, chest CT scans were routinely performed at a high rate. Doctors, working under time pressure and with a specific task at hand, naturally focused on the lungs. They simply didn't have time to look at other organs, yet chest CT images also show the heart, spine, major vessels, a piece of the liver, and many other organs where something might be abnormal. An algorithm that notifies the doctor when it notices something suspicious could be useful here.
These are two different branches. One involves fundamental research, where we start with some algorithmic problem. The other has more practical implications: we want to create a specific tool that helps doctors work more effectively.
"I realized that data analysis in neuroscience is still too far from real life."
So, your research is somewhat related to life sciences?
— What I do falls under the computer science category. My entire scientific group consists of people primarily with technical backgrounds. We know a bit about how medicine works, but the main requirement is still computer science. In this sense, our research is far from biology itself. We are rather on the path of finding productive partnerships. We understand the technical side well, and we need doctors to guide our energy in the right direction, explaining what needs to be done, what is crucial to them, and what is not. In a way, doctors are model users. They already know how to solve the problems they’re facing and what they lack.

Algorithms began to develop around the 1960s, and image processing has been around for quite some time. How do things stand now in the field of AI in medicine?
— There was a breakthrough in computer vision, in the broad sense, about 6–7 years ago, which then began to spread into various fields, including medicine. So, at first, there was a lot of enthusiasm: everyone thought we would now solve a huge number of problems that couldn't be solved before. Now there is some disappointment, or rather, a less biased reflection on what is happening. It turned out that the initial wave of scientific papers that gave grounds for believing everything would be just great was flawed in many ways to begin with. In particular, researchers would take data from one hospital and say: "Look, we developed a super algorithm." Then it turned out that in the neighboring hospital, the program didn't work at all. Medicine as an industry long ago developed certain rules for testing efficacy. Multicenter trials, for example, are one form used for testing new drugs, but their importance for artificial intelligence systems has only recently been recognized. And now, it seems, there is a realistic assessment of what algorithms can actually do and what they cannot.
At the same time, specialists are trying to understand more precisely what tasks remain unsolved. It turns out that they are numerous, and there is currently no all-in-one algorithm that could potentially replace a doctor, as some populists like to claim. Artificial intelligence has many limitations. It is still quite naive, and it is unclear where it will be able to progress in the coming years from this standpoint.

Those are the challenges, but what about the biggest advancements in your field in recent years?
— If we talk about computer vision outside of medicine, a breakthrough occurred about 6 years ago in the ImageNet Challenge, one of the most prominent public computer vision competitions. In this competition, participants are given around a million images, each depicting an object. There might be multiple objects, but one is the main focus, for example, a dog. Each breed represents a separate class, and there are over a hundred breeds. The breakthrough happened when the accuracy of classification algorithms increased from about 60–70%, significantly inferior to humans, to 97–98%. Humans, by various estimates, make 3–4% errors. This means that neural networks turned out to handle the task better than a person does.
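For context on how accessible this has since become: running an ImageNet-trained classifier today takes only a few lines. A minimal sketch using torchvision's pretrained ResNet-50 (assumes torchvision ≥ 0.13; "dog.jpg" is a placeholder path):

```python
# Sketch: classifying an image with an ImageNet-pretrained ResNet-50 via
# torchvision (assumes torchvision >= 0.13). "dog.jpg" is a placeholder path.
import torch
from torchvision import models
from torchvision.io import read_image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # same resize/crop/normalize as in training

img = read_image("dog.jpg")  # uint8 tensor of shape (3, H, W)
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
class_id = logits.argmax(dim=1).item()
print(weights.meta["categories"][class_id])  # e.g. a specific dog breed
```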
The next breakthrough was closer to artificial intelligence, but technologically still similar to the previous one. About 5 years ago, AlphaGo, a program created by Google DeepMind, defeated a human in the game of Go. Go has always been considered the most challenging game to construct an algorithm for, much more complex than chess because there are fundamentally more possible scenarios on the board.
In medicine, one of the most notable achievements (also from Google) was detecting diagnostic features of lung cancer on chest CT scans. It turned out that when analyzing a single image (when a person comes into a medical facility, gets a scan, and the data is immediately sent for processing), the algorithm shows even higher accuracy than a doctor. In cases where the same patient had visited the facility before (which allows tracking the dynamics over time), the algorithm and the doctor performed approximately on par.
Last year, an article was published in The Lancet, one of the oldest and most authoritative medical journals, discussing how poorly the validation of artificial intelligence algorithms is done and how poorly it fits clinical requirements. Undoubtedly, there is potential in the technology, but claiming that it can already take on a significant part of the tasks is premature, to say the least.
Medicine is a relatively conservative industry. The validation process required to enter it, whether with a new drug or new software, takes a certain amount of time, which somewhat slows down innovation.

What about interesting examples from Russia of applying AI to solve medical problems?
— Yes, of course there are. In Russia, the most well-known and well-organized experiment in introducing computer vision technologies has been running for the past two years in outpatient clinics and city hospitals in Moscow. The Department of Health proposed that companies with their own computer vision solutions connect to a pool of medical imaging data (CT, MRI, fluorography, mammography) and provide doctors with the results of algorithmic work, either automatically highlighting certain pathologies or automatically conducting measurements that assist doctors.
Initially, it was suggested that artificial intelligence would help find lung and breast cancer. However, then the pandemic happened, and the coronavirus was added to the list. The task was to understand how much the pattern observed in the lungs resembled coronavirus infection, whether there were other pathologies, and what percentage of lung tissue was affected. This yielded quite a curious result.
Of the 15 Russian and foreign companies that participated in the project, 7 did not pass the initial testing, meaning their programs either worked too slowly or produced poor results. From the remaining ones, 3 leaders were selected, which collectively processed data from about a hundred thousand people. I don't know of any project in the world comparable in scale.
Photo: Stas Lyubauskas, for “Life and Other Stories”
So artificial intelligence is not here to replace a doctor yet. But is it a good assistant? To what extent can we rely on algorithms as of now?
— That's a very good question. Currently, there are actually few reliable assessments. There are some in scientific papers, but such assessments usually come with limitations. For example, data were taken from a particular hospital, and the quality of the algorithm's performance was evaluated on that very dataset. However, it's far from guaranteed that when the product gets to market, it will yield the same high results. In this sense, Moscow serves as an independent testing ground for all algorithm suppliers, so it will be interesting to follow the publications.
There is a separate scientific center that analyzes the results, but I have not yet seen official reports. According to preliminary data, image description time has decreased by 20–30%, if I’m not mistaken. So, generally speaking, algorithms have already proven their usefulness. After all, this is quite a complex task: imagine you have 500 lung images, and you need to scrutinize each one for signs of disease such as "ground-glass opacities". Moreover, to make a diagnosis, one needs to assess the proportion of the lungs occupied by these opacities. It's quite a complex process, and it's difficult to remain objective.
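That proportion reduces to a ratio of two segmentation masks: the affected voxels over the lung voxels. A minimal sketch, with synthetic masks standing in for the output of real segmentation models:

```python
# Toy sketch: the percentage of lung tissue affected by "ground-glass
# opacities" as a ratio of two segmentation masks. Both masks are synthetic
# stand-ins for the output of real lung and lesion segmentation models.
import numpy as np

rng = np.random.default_rng(2)
lung_mask = rng.random((64, 256, 256)) < 0.25               # voxels inside the lungs
ggo_mask = lung_mask & (rng.random(lung_mask.shape) < 0.1)  # affected lung voxels

affected_share = ggo_mask.sum() / lung_mask.sum()
print(f"affected lung tissue: {affected_share:.1%}")
```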
With the help of an algorithm, a doctor can simply look at the image, verify that the computer has identified everything correctly, and trust the calculations that have been made; if the program has identified something incorrectly, they can adjust the result. Still, this applies only to a narrow task, one specific pathology. A radiologist describing that same chest CT scan must examine all the organs in it, and there are many, as we have already discussed. They must assess whether there are any potential pathologies in the heart or spine, whether there are enlarged lymph nodes, and what the major vessels look like: the aorta, the main pulmonary artery. In the lungs, there can be a huge variety of changes besides viral pneumonia, which the doctor also needs to document.
The algorithms we have now cover a relatively small spectrum of tasks. Or they cover a relatively large spectrum, but different parts are designed by different manufacturers. Accordingly, there's no unified solution yet, just a diverse set of tools. Therefore, I would say that one of the most important open questions is: what specific benefit does the healthcare system derive from these new technologies? It can be measured in minutes, seconds, or saved lives. So far, the assessments are quite weak; weak not in the sense of algorithms performing unsatisfactorily, but in the sense of the evidence base not being developed enough.

What are the leaps in artificial intelligence development associated with? Why did it suddenly become possible to jump from 60–70% to 97–98%?
— A combination of several factors comes into play. The first is the growth in computational capabilities. All modern algorithms require very powerful hardware to be trained and then applied to specific tasks. Just in the last 10 years, there have been major changes (in particular, powerful modern Nvidia graphics cards have appeared) that fundamentally altered how complex models can be trained. The complexity of the models has increased exponentially, while the training speed remains adequate; it's not as if we need a hundred years to build one algorithm.
The second is that we now have a lot of data. Before the ImageNet Challenge, with its million images, there simply wasn't enough information to train the algorithms in the first place. And modern algorithms, although there is some progress in this area, still require a lot of data to actually learn how to solve a problem.
The third is the emergence of new classes of algorithms that either did not exist before or were unpopular and under-researched. As it turned out, those algorithms can deliver results on a fundamentally different level.

Looking into the future, how do you think the field will develop?
— I think the main task now is achieving a high level of generalizability in the AI tools we build. It's an interesting open issue: how to make the algorithm understand pathology based on a combination of all causes, not just changes in pixel intensity? How to teach it to transition from successful recognition of one lung pathology, of which it has seen hundreds of thousands of examples, to recognizing other diseases based on one or two images? A person can comfortably generalize from previous knowledge and say, "This is tuberculosis, and these are signs of chronic obstructive pulmonary disease," while the algorithm currently solves this task poorly. I believe the main research direction is a quick, easy learning process that lets the algorithm detect new pathologies it hasn't encountered before.
"Firstly, I would like to help doctors. A doctor is a person who can be tired, sleep-deprived, unwell, and therefore prone to making mistakes. Just like all of us, all of us occasionally make mistakes."
If we were to fantasize, what would you like? Do you maybe have an ultimate dream or something of the sort?
— Firstly, I would like to help doctors. A doctor is a person who can be tired, sleep-deprived, unwell, and therefore prone to making mistakes. Just like all of us: we all occasionally make mistakes. An algorithm, even if not the most perfect, never gets tired; it works consistently all the time, day and night.
Secondly, I would like to do more than automate particular tasks for doctors. That is definitely solvable; the question is mainly about the amount of data required. It's fascinating to try to understand whether we can do something beyond human capabilities. For example, in radiology, particularly in oncology, the gold standard of diagnosis is usually not CT or MRI but histologic examination. That is, tissue is removed and looked at under a microscope, and it becomes clear: "Ah, there really is some kind of lesion here." When a radiologist learns to analyze images of potential cancer patients, they have only an image and nothing else.
Into an algorithm, however, we can retrospectively incorporate not only data about the image itself, with all its limitations, but also data about the tissues from a histologic examination. Then we can make it find patterns that a human wouldn't notice. In other words, I would like to train the computer using the gold standard. And it's very intriguing to find tasks in which artificial intelligence, due to its features, could fundamentally change the ways of managing patients, making diagnoses, and taking decisions. That is, not just compensate for a doctor's inattention and overload, but add something extra.

Will we be able to solve some pressing issues in the next 10–20 years? Or is that still farther away?
— I think, in terms of algorithmic issues, meaning how to teach a computer to find new pathologies as quickly as a human or how to train it to generalize data, there will be good progress in 5 years. I assume that these problems will be solved by that time.
What will definitely remain an open question are the more complex medical tasks. Right now, in most cases, it's enough to work only with images. Computer vision is different in other fields: for example, in the automated control of driverless cars or robots, additional challenges arise because one has to interact with a dynamically changing environment. In this sense, medical images tell a static story, so there are no fundamental restrictions that would make us “hit a ceiling”. But things get more complicated as soon as we start talking about more complex medical tasks: those where one needs to analyze not only the image itself but also the clinical record (which may be written as poorly structured text), the results of laboratory tests, and, say, patient video recordings to understand whether they have a tremor, things that a doctor sees with the naked eye and quickly understands. But once again, I don't see fundamental limitations here; I'm sure progress will be achieved.
Photo: Stas Lyubauskas, for “Life and Other Stories”
Would you personally like artificial intelligence to replace a doctor?
— If you're asking whether I would prefer it or not, no, I wouldn't. As someone well acquainted with the limitations of artificial intelligence, I'm not ready to trust it with diagnosis or treatment. Therefore, I see all these tools as assistants, not replacements.
Certainly, there are quite a few routine tasks in the doctor's profession that can be automated. But just to be on the safe side, you need to double-check that the algorithm didn't make a silly mistake somewhere. Here I can tell you a little anecdote. Colleagues from Moscow were testing various algorithms for working with lung cancer: they needed to find a lesion in the lungs, something abnormally enlarged that looked like early-stage lung cancer. They shared a few amusing examples. For instance, "lung cancer" was found in the tomography table and in the clasp of a bra. The funniest case was when lung cancer was found in someone's chin. Imagine: on one level of the scan, the chin appears as a small circle above the body, and the system identifies it as a tumor. These are all amusing examples, but they show that, for now, we need to keep an eye on these technologies.
And even if algorithms suddenly improve significantly, say, if 10 times more data is collected and algorithms are trained on it, some very complex tasks will remain. We talked, for example, about the diagnostics of lung cancer. But that's just the first step, and then comes the treatment! Oncology is probably the hardest part to algorithmize, because there's always a team of doctors involved in the case: not just the radiologist who analyzes images but also the surgeon and the chemotherapist, and together they make decisions about the treatment strategy. And even if we imagine that we've come up with a perfect algorithm, helping the patient will be much more challenging. Errors will inevitably be made.
The interview was published on Biomolecula.ru on July 27, 2022.