CMU Researchers Build Personalized Models To Advance Precision Cancer Care

Even with access to cutting-edge data — sometimes as detailed as a patient’s entire genome — doctors often lack the insight needed to make the best treatment choices. Determining how a disease will behave in a specific patient remains one of medicine’s toughest challenges. Carnegie Mellon University researchers have developed a new way to help doctors make better, personalized decisions and predict how a disease or treatment might play out in the future.

Researchers from CMU’s School of Computer Science developed a new approach to bridge this gap between available data and actionable insight, creating personalized models to help doctors better understand individual patients and improve their prognosis. Led by graduate student Caleb Ellington and Professor of Computer Science Eric P. Xing, the researchers published their work in the Proceedings of the National Academy of Sciences.

The team introduced contextualized modeling, a family of ultra-personalized machine learning methods, to build individualized gene network models for nearly 8,000 tumors across 25 cancer types simultaneously. These networks helped identify new cancer biology, revealing hidden cancer subtypes and improving survival predictions, especially for rare cancers. This development opens the door to more precise, individualized cancer treatment.

From data overload to personalized cancer models

A contextualized model is built to fit a specific biological or medical situation, factoring in unique patient details like genetic mutations, tumor characteristics and lifestyle. It helps doctors and researchers make decisions based on the circumstances that matter most for each individual case and allows them to generate new models for unseen contexts, simulating future disease developments or treatments. Xing’s group has been laying the groundwork for contextualized models for years, and this is their largest study to date.

Ellington said that a major problem in biology and medicine is that most modeling methods require a large patient population to produce a single model. This means that researchers can only consider a few factors when organizing patients into groups to study, or otherwise risk producing inaccurate models if the groups become too granular.

“This is a big problem because it lumps many patients together into big groups, even though we know diseases and treatments can work very differently in different people, especially for complex diseases like cancer, Alzheimer’s and diabetes,” Ellington said. “This leads to heated arguments about which factors are most important for grouping patients, and the result is still a model that doesn’t work very well anyway because it doesn’t acknowledge many differences between individuals. This is bad for researchers, doctors and patients alike.”

Ellington explained that contextualized models fix this by going beyond group-based analyses, learning to generate individualized models based on each patient’s unique profile, or context.

“This lets us consider many thousands of contextual factors simultaneously, even complex contexts like genetics, because contextualized models learn which ones are important for differentiating patients and understanding diseases, and which ones aren’t. No more arguments, better models, better medicine,” he said.

How contextualized models handle new and unknown diseases

Traditional models also fail to explain emerging or newly discovered disease types because the models are produced for fixed, predetermined patient groups. In contrast, the contextualized modeling approach used by Ellington and Xing is generative, producing models on demand for new contexts. In the study, the researchers applied this to predict gene behavior in types of tumors they had never seen before.
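The idea of producing a model on demand for a new context can be illustrated with a toy example. The sketch below is not the researchers' implementation and does not use the contextualized.ml toolkit's API; it assumes the simplest possible setup, in which each sample's regression coefficients are a linear function of its context, and all names (`fit_contextualized_linear`, `generate_model`, the synthetic data) are hypothetical.

```python
import numpy as np

def fit_contextualized_linear(C, X, y):
    """Fit y ~ x . beta(c), assuming beta(c) = [1, c] @ A (a linear context encoder)."""
    n = C.shape[0]
    C1 = np.hstack([np.ones((n, 1)), C])                 # contexts with intercept, (n, k+1)
    # Interaction features: outer product of [1, c] and x, flattened per sample.
    Z = np.einsum("ni,nj->nij", C1, X).reshape(n, -1)    # (n, (k+1) * p)
    a, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return a.reshape(C1.shape[1], X.shape[1])            # A, shape (k+1, p)

def generate_model(A, c):
    """Generate the coefficient vector for one context c, seen or unseen."""
    c1 = np.concatenate([[1.0], np.atleast_1d(c)])
    return c1 @ A

# Synthetic data (an assumption for illustration): 2 context variables, 3 predictors.
rng = np.random.default_rng(0)
n = 500
C = rng.uniform(-1.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 3))
A_true = rng.normal(size=(3, 3))
betas = np.hstack([np.ones((n, 1)), C]) @ A_true         # one coefficient vector per sample
y = np.sum(X * betas, axis=1) + 0.01 * rng.normal(size=n)

A_hat = fit_contextualized_linear(C, X, y)

# A context that never appeared in training (outside the sampled range).
c_new = np.array([2.0, -1.5])
beta_new = generate_model(A_hat, c_new)
beta_new_true = np.concatenate([[1.0], c_new]) @ A_true
print("generated coefficients:", beta_new)
print("true coefficients:     ", beta_new_true)
```

In this toy setting the model for the unseen context is produced by evaluating the learned context-to-coefficient map rather than by refitting on a new patient group. The study's models are gene networks with far richer contexts, so this only conveys the general shape of the approach.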

“There is a consensus that biology is composed of intricate multi-scale systems, consisting of molecules, cells, organs, individuals and ecosystems, but this intuition has been built case-by-case over decades,” Xing said. “Contextualized modeling is the first general method that allows us to test this intuition rigorously, and the variation we see across individuals is astounding. The insights we gained creating contextualized models are also helping us build GenBio AI, where we are developing multi-scale simulators for all of biology, building up to an AI-driven digital organism or AIDO. Our goal for the AIDO is not just to represent the average person, but to be able to simulate the unique aspects of my biology and your biology in our own contexts.”

Modeling uncommon cancers unlocks broader insights

The team demonstrated their contextualized model’s effectiveness by applying it to thyroid carcinoma, a cancer known to be highly survivable. Because the prognosis for thyroid cancer is usually good, it may be understudied, and the overall survivability can hide rarer types with worse outcomes, Ellington said. By applying contextualized networks to predict patient-specific tumor models for thyroid carcinoma, they were able to identify a new type of the disease with a worse prognosis, which they hope can inspire new therapies.

“The approach is much more than thyroid cancer, though,” Ellington explained. “We learn individualized gene networks for each of the nearly 8,000 cancer patients in our study, which cover 25 cancer types, including lung, brain, stomach and more. Thyroid cancer emerged as an interesting case study, but there is a lot more biology and medical insight to be extracted here.”

By accounting for the similarities and differences between individual cancers in the modeling approach, the team can also learn more about cancer as a whole, Ellington said.

The group has also provided a web tool for exploration and visualization of the pan-cancer dataset.

Embracing complexity yields better results

While the team applied their model to cancer in this study, contextualized networks have the potential to improve decision-making in any field where data is complex, messy or incomplete, which is often the case in health care and biology.

In most experiments, scientists can only improve accuracy in one way: by adding more samples of the same kind (like repeating a test multiple times). This is because traditional models struggle when faced with too many different conditions. In medicine, that limitation could mean missing a critical insight about how a treatment works for certain patients. Contextualized models allow scientists to improve accuracy in an entirely new way: by adding more variety to the conditions they study (different patient types, environments or disease stages).

Contextualized models do this by learning about the similarities and differences between different contexts and using that information to improve models in every context. As a result, contextualized models can handle thousands of different conditions at once, learning which ones matter most for predicting outcomes. The research team showed that contextualized models consistently outperformed traditional approaches, especially when working with messy, varied or limited real-world data.
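A small simulation can show why sharing information across contexts helps when each context has very little data. The sketch below is an illustration under stated assumptions, not the study's method or the toolkit's API: it uses synthetic data, a single scalar context, and a linear context-to-coefficient map, and it compares fitting one separate regression per group against fitting one contextualized regression over all groups. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, per_group, p = 40, 5, 3      # many contexts, only 5 samples in each
sigma = 0.5                            # observation noise (assumed)

# Assumed ground truth: coefficients vary smoothly (linearly) with a scalar context.
contexts = np.linspace(-1.0, 1.0, n_groups)
C1 = np.column_stack([np.ones(n_groups), contexts])      # (n_groups, 2)
A_true = rng.normal(size=(2, p))
true_betas = C1 @ A_true                                  # (n_groups, p)

X = rng.normal(size=(n_groups, per_group, p))
noise = sigma * rng.normal(size=(n_groups, per_group))
y = np.einsum("gnp,gp->gn", X, true_betas) + noise

# Traditional approach: one independent least-squares model per group.
group_betas = np.stack([
    np.linalg.lstsq(X[g], y[g], rcond=None)[0] for g in range(n_groups)
])

# Contextualized approach: one fit over all groups, with coefficients
# generated from context via beta(c) = [1, c] @ A.
Z = np.einsum("gi,gnp->gnip", C1, X).reshape(n_groups * per_group, 2 * p)
a, *_ = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)
A_hat = a.reshape(2, p)
ctx_betas = C1 @ A_hat

group_err = np.mean((group_betas - true_betas) ** 2)
ctx_err = np.mean((ctx_betas - true_betas) ** 2)
print(f"per-group coefficient error:      {group_err:.4f}")
print(f"contextualized coefficient error: {ctx_err:.4f}")
```

Each per-group model must estimate three coefficients from five noisy samples, while the contextualized fit estimates six shared parameters from all 200 samples. In this toy setting, adding more contexts adds evidence for every context's model, which mirrors the article's point about improving accuracy through variety.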

“Scientists often focus on a few key conditions because it’s easier to analyze,” Ellington said. “What we’re showing is that you don’t have to limit yourself, and in fact, you’ll get better, more useful results by embracing many different conditions. It makes your research more flexible, more insightful and ultimately helps more people.”

The future is contextualized

The researchers want to continue to develop the model, eventually using it to personalize treatment plans for patients. To accelerate the path from model to medicine, the researchers have made a toolkit publicly available at contextualized.ml.