by Sara Elhawash
Can reproducibility pave the way for groundbreaking advancements in precision oncology and transform cancer treatment decisions? A resounding answer emerges from a reproducibility project born out of the DSI Student-Led Reproducibility Challenge. The project is led by DSI members and Professor Benjamin Haibe-Kains (University Health Network and Medical Biophysics, Temerty Faculty of Medicine, University of Toronto) and Bo Wang (Department of Laboratory Medicine & Pathobiology, University of Toronto), together with a team of U of T student researchers including Emily So and Grace Fengqing Yu, and is making significant strides in advancing research in the field.
Reproducibility and Reusability in Action
In a recent Reusability Report, “Evaluating reproducibility and reusability of a fine-tuned model to predict drug response in cancer patient samples,” published in Nature Machine Intelligence, the team successfully reproduced a new Artificial Intelligence (AI) method called Transfer of Cellular Response Prediction (TCRP), originally published by the Ideker group at the University of California San Diego in Nature Cancer in 2021, and applied it to clinical trial data.
The project originated from the DSI Thematic Program in Reproducibility, which aims to raise awareness of reproducibility and included a Student-Led Reproducibility Challenge in 2022. As large-scale, intricate datasets and computational methods are increasingly used across disciplines, the challenge of reproducibility has come to the forefront. Establishing reproducibility standards for research has emerged as a foundational aspect of data science. It is therefore essential to clearly articulate and widely integrate standards for open, reproducible research with big data, not only within the University of Toronto but also on an international scale.
Emily So, a master’s student and co-researcher, reflects on the importance of reproducibility and open science principles in the context of groundbreaking methods like AI and machine learning: “In agreement with FAIR (Findability, Accessibility, Interoperability and Reusability) principles well established in research, usually new articles will come with data and computer code available for the scientific community. To fully understand the impact of new innovations and uncover their applications to new scientific problems, it is imperative that available resources are fully reproducible and can produce expected results easily.”
The DSI Student-Led Reproducibility Challenge attracted researchers and trainees dedicated to exploring reproducibility. “DSI support was instrumental in organizing the Challenge where students showcased their efforts in reproducing key papers in the fields of engineering, social and health sciences. Emily So and Grace Yu were part of one of these teams. Their results were so exciting that we decided to push the analysis further and publish it as a Reusability Report in Nature Machine Intelligence,” says Benjamin Haibe-Kains.
“We were able to demonstrate the gaps that exist in open science for computational biology. This outreach made available by the DSI has allowed our group to project our experience to the scientific community as well as provide further rationale for our subsequent documentation about our project,” says Emily So.
The team’s work aims to address two key objectives: confirming the performance of the TCRP model in its published context and expanding its application to a larger compendium of preclinical pharmacogenomic and clinical trial data.
Through extensive evaluation, the researchers found that the TCRP method surpassed established statistical and machine learning approaches in predicting drug response in novel clinical contexts. This remarkable finding highlights the superiority of TCRP in both preclinical and clinical settings.
“Our results highlight the immense potential of the TCRP method and its ability to outperform existing approaches. This opens new avenues for optimizing clinical trial design and improving patient outcomes,” says Haibe-Kains.
In the field of precision oncology, ensuring the reliability and generalizability of new techniques in clinical settings is crucial. Reproducibility studies play a vital role in verifying claims made by predictive models, while reusability studies assess their applicability in diverse contexts. The publication of the Reusability Report in Nature Machine Intelligence marks a significant step forward in promoting reproducibility and reusability in the field.
“Our work emphasizes the importance of reproducibility and reusability, which are essential for advancing precision oncology. By documenting new data contexts and exploring the model’s reusability, we can drive further progress in tailored cancer treatments,” says Haibe-Kains.
“Reproducing the results of this method was no easy task, but it provided a glimpse into the power and impact it could have. It was an exciting endeavor to explore the possibilities of this machine learning approach,” shares Emily So, master’s student and co-researcher.
Collaboration, Transparency, and Future Applications
The impact of this work extends beyond the research community. The study’s reliance on open science principles, where authors share their code and data, highlights the importance of collaboration and transparency. By making their materials publicly available, the researchers contribute to education, enabling the training of future health data scientists, bioinformaticians and computational biologists.
Emily So emphasizes the potential future applications of their models: “This evaluation is timely because there is a potential future application of these models in assisting clinicians in the treatment decision process. Setting a reproducibility standard is crucial to properly evaluate machine learning approaches suitable for preclinical and clinical settings, ultimately optimizing the course of action for patients.”
With the TCRP model successfully reproduced and shown to outperform existing approaches, optimized clinical trial design and improved patient outcomes come within closer reach.