Recent advances in algorithms provide physicians with new tools to predict, diagnose, and even treat disease. School of Data Science faculty and students are at the forefront of these advancements.
Algorithms are sets of rules that software follows to sort and analyze data. The algorithms developed for medicine integrate new forms of data, including how patients talk about their symptoms and very high-resolution images that can be magnified down to the level of individual cell nuclei.
Engineering professor Don Brown, founding director of the Data Science Institute, is working on the development of these algorithms and their implications for medicine.
“[The algorithm development] allowed us to examine and better diagnose diseases,” Brown said. “For example, when you look at an image of a biopsy, it’s hard to read that image. So it’s much easier for us to use computers to understand what’s going on in images like that.”
Deep learning models computerize processes that humans do naturally, such as identifying pictures of dogs and cats. Image identification functions can also be applied to analyze medical data. For example, an image of a biopsied cell will have characteristics that will cause a doctor to identify the cell as abnormal or healthy.
These common features, the patterns that distinguish images of healthy cells from images of abnormal cells, serve as guidelines for the algorithm. Once trained on them, the resulting deep learning model can sort new images and label them as healthy or abnormal. The advantage of using a model is that many more images can be analyzed quickly.
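The idea of learning patterns from labeled examples and then labeling new images can be sketched in a few lines of Python. This is only an illustration, not the researchers' method: it swaps the deep learning model for a much simpler nearest-centroid classifier, and the two-number "features" standing in for each image are synthetic and hypothetical.

```python
import random

def centroid(rows):
    """Average each feature column across a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(examples):
    """Learn one average 'pattern' (centroid) per label from (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Assign the label whose learned pattern is closest to the new example."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Synthetic stand-ins for image features (assumption: real systems learn
# these features from the pixels themselves rather than receiving them).
rng = random.Random(0)

def fake_features(center):
    return [rng.gauss(c, 0.5) for c in center]

training = (
    [(fake_features([1.0, 1.0]), "healthy") for _ in range(50)]
    + [(fake_features([4.0, 4.0]), "abnormal") for _ in range(50)]
)

model = train(training)
new_images = [[0.8, 1.2], [4.3, 3.7]]
labels = [predict(model, features) for features in new_images]
print(labels)
```

Once `train` has run, `predict` can label any number of new examples without further human input, which is the speed advantage described above.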
Data Science graduate student Saurav Sengupta collaborated with University peers and others in Zambia, London and Pakistan on a synthesis project that applied these models to the diagnosis of celiac disease.
“We were able to build a model that could predict with high accuracy whether the picture we were seeing was a celiac disease picture, or a normal picture, or environmental enteropathy,” Sengupta said. “We had to categorize each image into the three classes and see if there was any medical information that could be gained when we study these patterns.”
Part of Sengupta’s model classified images of environmental enteropathy, a chronic inflammatory bowel disorder. These algorithms are now used at the School of Data Science to analyze a wide variety of diseases, including Barrett’s disease, Crohn’s disease, and Alzheimer’s disease.
“If you make the prediction that the person has a disease, you need to be very sure of that prediction and you need to be able to explain why you made that decision,” Sengupta said. “A lot of state-of-the-art real-world methods don’t really have these things and the main challenge for us is to make the models more explainable so that they give you a high degree of accuracy.”
The role of the physician in this process also remains important. Dr. Sana Syed, pediatric gastroenterologist at U.Va. Health, uses artificial intelligence for pattern recognition in biopsy images.
“You have to have a human because there are all these bias limitations,” Syed said. “And then the other thing is that an algorithm can’t tell you what to do if something goes wrong. So a human has to be part of it, but it can improve your decision-making.”
Bias, or a model’s tendency to favor certain outcomes, arises when the data set used to train it is too small or not representative, Syed said. ImageNet, a research project created by Professor Fei-Fei Li of Stanford University, allows researchers to train image recognition models and has had a huge impact on the field, according to Syed. The power of ImageNet comes from its use of an extremely large data set consisting of 15 million data points. The larger the data set a model is trained on, the more accurate the model is likely to be when it encounters new data.
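A deliberately extreme toy example shows how an unrepresentative data set can produce this kind of bias. The numbers and the `train_majority` "model" below are hypothetical and are not drawn from the researchers' work: the model simply learns whichever label is most common in its training data and predicts that label for everyone.

```python
from collections import Counter

def train_majority(labels):
    """A trivially biased 'model': always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(prediction, labels):
    """Fraction of cases where the constant prediction matches the true label."""
    return sum(1 for y in labels if y == prediction) / len(labels)

def recall(prediction, labels, target):
    """Fraction of true `target` cases that the constant prediction catches."""
    relevant = [y for y in labels if y == target]
    return sum(1 for y in relevant if y == prediction) / len(relevant)

# Hypothetical training set in which disease cases are badly under-represented.
skewed_training = ["normal"] * 95 + ["disease"] * 5
# Hypothetical evaluation set in which both outcomes are equally common.
balanced_test = ["normal"] * 50 + ["disease"] * 50

prediction = train_majority(skewed_training)
acc_on_training = accuracy(prediction, skewed_training)
acc_on_balanced = accuracy(prediction, balanced_test)
disease_recall = recall(prediction, balanced_test, "disease")

print(prediction, acc_on_training, acc_on_balanced, disease_recall)
```

On its own skewed data the model looks 95 percent accurate, yet on the balanced set it is right only half the time and misses every disease case. Real models fail in subtler ways, but the mechanism is similar.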
The next step in research at the intersection of data science and medicine is to improve the accuracy of these models. Researchers from the School of Data Science and U.Va. Health are working together to improve this technology and continue to apply it in a medical setting.
“There’s a lot of work to be done to improve the algorithms and better understand the characteristics of the algorithms so that we can drive those improvements,” Brown said. “There’s a lot of work to be done to develop these kinds of techniques — these kinds of data science machine learning techniques — that will do an even better job of predicting, diagnosing, and classifying.”