Cormac Rea

Bridging The Gap: CrossTALK Bootcamp Unites Computational And Experimental Scientists For Drug Discovery

By: Sofia Mellou

The buzz around artificial intelligence (AI) in drug discovery is undeniable, but a major bottleneck remains. There are no openly available, high-quality, large-scale datasets needed to train machine learning (ML) models and advance drug discovery efforts. As the Structural Genomics Consortium (SGC) enters its third decade, it is tackling this challenge head-on by generating open science, ML-ready protein-ligand training datasets at an unprecedented scale. To support these efforts, SGC’s research site at the University of Toronto launched CrossTALK Bootcamp, a training program designed to bring together computational scientists and experimental researchers in a unique setting.

Funded by the Data Sciences Institute (DSI) at the University of Toronto as part of the DSI Emergent Data Sciences Program, this innovative program aims to train the next generation of drug discovery experts by providing them with the skills to interpret complex experimental data and harness AI-driven approaches. The program is led by a powerhouse team of professors: Matthieu Schapira, Rachel Harding, Mohamed Moosavi, Chris Maddison, Benjamin Sanchez-Lengeling, Hui Peng, and Benjamin Haibe-Kains, drawing expertise from pharmacology, chemistry, engineering, and AI.

What makes this initiative so exciting? It’s not just another training program; it is a hands-on, interactive experience where computational scientists step into the lab, and experimentalists take a 15-hour dive into the world of data science. The quarterly workshops feature dynamic sessions and lab visits, fostering real collaboration between two fields that often work in silos.

A Look Inside the CrossTALK Bootcamp Launch

The energy at the launch of the first Bootcamp last month was palpable. After an introductory overview of the program by Dr. Matthieu Schapira, Dr. Benjamin Sanchez-Lengeling’s question took the stage and set the tone: Why do we need molecules? From there, he took the audience on a whirlwind tour of the molecular discovery pipeline, where creativity, diversity, and scientific rigor collide to shape the future of early drug discovery. He emphasized that transformative breakthroughs require not just data and cutting-edge tools, but also the right people coming together to innovate.

Dr. Rachel Harding followed with a deep dive into the mechanics of experimental data generation and hit validation. Using the metaphor of a key and a lock, she illustrated the complexity of a molecule binding to its target protein and the crucial role SGC plays in validating hits. “If we combine AI with high-quality experimental validation, we can change the game in drug discovery,” she emphasized.

Excitement from the Experts

The enthusiasm for this initiative was evident when we caught up with two of the program’s leaders after the event.

“It’s thrilling to see machine learning gaining momentum in drug discovery. The response has been phenomenal. Over 140 applicants for our first Bootcamp cohort! We could only take 30 this time, but the demand is clear. There will be many more opportunities to join in following quarters and while this pilot initiative is focused on the University of Toronto at the moment, my dream is to expand it nationally and beyond,” Dr. Matthieu Schapira commented.

“What excites me most is the broadness of backgrounds and different disciplines among the participants. Seeing computational and bench scientists side by side, eager to learn from each other, is exactly what this field needs. Each cohort gets hands-on lab experience at SGC-Toronto, learning how we validate hits, produce proteins, and design assays. This is just the beginning,” Dr. Harding added.

Registration for the second CrossTALK: Cross-Training in AI and Laboratory Knowledge for Drug Discovery is now open! The second 9-week bootcamp, open to students, postdoctoral researchers, and staff with computer or biological science backgrounds, will take place from April to June 2025. Interested individuals are encouraged to submit their applications to secure a spot with complimentary registration.

More information: https://datasciences.utoronto.ca/early-stage-drug-discovery/ 

Emergent Data Solutions: Harnessing Data Science and AI to Revolutionize Aging and Neurodegeneration Research 

By: Cormac Rea

There is an urgent need amongst healthcare researchers for creative solutions to address the challenges of caring for our growing aging population in diverse healthcare settings, including the need to predict disease development and treatment outcomes. Data science, including artificial intelligence (AI), can revolutionize how we understand the human brain, offering more affordable, precise tools for detecting neurodegenerative conditions.

However, AI’s integration into clinical practice faces a major barrier in terms of the multidisciplinary collaboration necessary to design, implement, and refine AI tools effectively. Data scientists working with predictive algorithm development may lack clinical context to tailor these tools for real-world healthcare workflows. At the same time, healthcare leaders must collaborate with both scientists and clinicians to ensure AI-informed decisions are sound and impactful. 

Enter Advancing Aging and Neurodegeneration Research through Data Science — a unique initiative by the Data Sciences Institute’s (DSI) Emergent Data Science Program that aims to bridge these gaps by fostering learning and training opportunities between data scientists, basic scientists, clinicians, and educators. The initiative is led by professors Rosanna Olsen (Rotman Research Institute, Baycrest Academy for Research and Education; Department of Psychology, Faculty of Arts & Science, University of Toronto); Malcolm Binns (Rotman Research Institute, Baycrest Academy for Research and Education; Dalla Lana School of Public Health, University of Toronto);  Bradley Buchsbaum (Rotman Research Institute, Baycrest Academy for Research and Education; Department of Psychology, Faculty of Arts & Science, University of Toronto); Jean Chen (Rotman Research Institute, Baycrest Academy for Research and Education; Department of Medical Biophysics, Temerty Faculty of Medicine, University of Toronto) and Kamil Uludag (Krembil Brain Institute, University Health Network; Department of Medical Biophysics, Temerty Faculty of Medicine, University of Toronto).   

The AI & Aging team will provide training and learning opportunities, bringing together data scientists, clinicians, and educators to explore the development of new areas of research, which may ultimately benefit treatment and healthcare services. Additionally, the initiative will showcase experts in data science and aging, creating a forum to highlight and discuss key emergent issues.

The program will launch this spring with a talk from a renowned researcher in neuroimaging and machine learning. Prof. Christos Davatzikos, Wallace T. Miller Sr. Professor of Radiology at the University of Pennsylvania, and Director of the recently founded AI2D Center for AI and Data Science for Integrated Diagnostics, will give a talk titled “Machine learning in neuroimaging: understanding the heterogeneity of brain aging and neurodegeneration, and building personalized imaging biomarker” on March 27.

Dr. Davatzikos has been the founding Director of the Center for Biomedical Image Computing and Analytics since 2013, and the director of the AI in Biomedical Imaging Lab (AIBIL). He oversees a diverse research program ranging from basic problems of imaging pattern analysis and machine learning to a variety of clinical studies of aging and Alzheimer’s disease, schizophrenia, brain cancer, and brain development. He is an IEEE fellow, and a fellow of the American Institute for Medical and Biological Engineering. 

“I am thrilled to co-lead this exciting new EDSP program, Advancing Aging and Neurodegeneration Research through Data Science, supported by the Data Sciences Institute at the University of Toronto,” said Olsen. “This initiative brings together experts who use cutting-edge AI and data-driven approaches to tackle some of the most pressing challenges in aging and neurodegenerative disease research.”

“We are especially honored to kick off the series with Dr. Christos Davatzikos, a true leader in AI-driven biomedical imaging, whose work is transforming how we understand and detect different types of brain disorders.”

The DSI spoke with Dr. Davatzikos about his background, research focus, and the potential future uses of machine learning in aging.  

Tell us a little bit about yourself. How did you become interested in your area of research (neuroimaging, aging, machine learning)?  

CD: I went through education and training in engineering and computer science but was always interested in biomedical applications of technologies, especially in neuroscience. As machine learning methods were in their infancy in the 90s, I thought that they are the tools needed to help us… to see in the data what we can’t otherwise see. For example, to see brain signatures of neuropsychiatric and neurodegenerative diseases that cannot be detected visually and/or are predictive of clinical outcomes.

Do you have a favorite paper or research finding from your own group or from other researchers that you would like to share with us?  

CD: A recent paper in Nature Medicine, “Brain aging patterns in a large and diverse cohort of 49,482 individuals,” is one of my favorites. It helps us understand the heterogeneity of brain aging trajectories, as well as their genetic, clinical and lifestyle correlates.

What are you most excited about for the future of our field?  Do you anticipate any breakthroughs in the field of aging research in the next five years?  

CD: Among many potentially exciting directions, I am particularly excited about seeing more emphasis on prevention and early detection. Improving our understanding of the role of genetic and lifestyle risk factors, and being able to identify individuals at risk, can inform clinical trials and personal health management.  

Machine learning can play a significant role in this direction in many ways, two of them being the following: 1) it helps us develop endophenotypes, in part by looking at complex patterns of biomarkers of all sorts, and hence identifying individuals who not only have a risk factor, but who also seem to be “expressing” respective endophenotypes/patterns that have been linked with that risk factor; 2) it helps us build predictive models of future brain and clinical trajectories.   

Another exciting direction is that of using machine learning methods for drug repurposing and development, by learning more about genetic correlates of brain aging and associated neuropathologic processes and identifying drugs that can slow down these processes.  

Since the development process for applied machine learning tools requires multidisciplinary input across an array of clinical, measurement and data experts, do you have suggestions for optimizing collaboration and communication across professionals with different immediate goals? 

CD: As with other similar technical fields that have become an integral part of medicine and biomedical research (e.g. medical physics and biostatistics), I think that a new generation of biomedical scientists and clinicians will emerge: people who have cross-training and interests in both data science/AI and biomedical domains.

Do you have any thoughts on sustainable AI, in health research and beyond? 

CD: AI is a technology that will become an integral part of our daily lives, including medicine and biomedical research, pretty much like other technologies, from farming machines and the automobile to the cellphone and the internet. As such, we will have to develop mechanisms that constantly maintain and enhance AI tools. Due to its nature, AI is a technology that continuously adapts and learns from new data and new knowledge: the more we use it, the better it will become.

Emergent Data Sciences Program 
Through the Emergent Data Science Program, DSI funds a broad span of activity that can lead to the development of innovative data science methodologies, deep connections with computation and applied disciplines, new training programs, collaboration, knowledge mobilization, and impact beyond the academy. Applications for the 2025 program are now being accepted. LOI Deadline: March 28   
Learn more about the application process.

Upskill Canada Boosts Investment and Propels Growth for the Data Sciences Institute’s In-Demand Skills Certificates  

By: Cormac Rea

Certificates that are equipping hundreds of professionals with skills in data science and machine learning software have received a vote of confidence with renewed investment from their government partner. 

Following a dynamic launch year offering in-demand skills training and career wayfinding for professionals, Upskill Canada has invested a second wave of funding for the University of Toronto’s Data Sciences Institute’s (DSI) Data Science and Machine Learning Software Foundations certificates, bringing the overall investment to $3.9M by 2026. This key funding will enable 680 total participants to access critical training over two and a half years, preparing them for jobs in key innovation sectors.

“Our ongoing partnership with DSI allows more workers to gain the knowledge and skills necessary for the jobs of tomorrow,” said Ann Buller, Interim CEO of Palette Skills. “The program has received such excellent feedback from employers and students alike — we are thrilled to offer continued support.”

In the first year of this DSI certificate, which pairs technical skills with job-readiness support and strong employer connections, almost half of all graduates found new employment, received promotions, or transitioned into new roles within six months of completion.

“The confidence shown in the DSI through this renewed investment reflects our success at connecting learners with the immense demand for data science literacy and skills at the heart of the Canadian digital economy,” says Lisa Strug, Director of the Data Sciences Institute and Professor in the Departments of Statistical Sciences and Computer Science (Faculty of Arts & Science) and the Division of Biostatistics (Dalla Lana School of Public Health) at the University of Toronto (U of T). Strug is also a Senior Scientist at The Hospital for Sick Children.

The DSI certificates are an initiative of Upskill Canada, powered by Palette Skills and funded by the Government of Canada. Upskill Canada is designed to meet the talent needs of high-growth sectors to increase productivity and innovation in Canada. 

“We are proud to receive further financial support, allowing our work in targeting training in key areas of data science and machine learning to evolve, and increasing the available data science talent pool in Canada across a range of sectors,” added Strug.  

Equipping workers with these skills creates new career pathways for Canadians and better positions Canadian companies to compete both domestically and internationally. The funding will enable the DSI to continue its mission to accelerate the impact of data sciences, leveraging U of T’s global reputation in data science and machine learning.  

“Coming from a non-technical background, this journey has been both challenging and incredibly rewarding,” said David Vaz, who completed the Machine Learning Software Foundations Certificate and started a new job in October 2024 as a Manager of Strategic Initiatives and Partnerships at Skills for Change.  

“The Certificate has equipped me with a comprehensive foundation in data science and machine learning, covering everything from fundamental programming to cutting-edge AI applications.” 

Given the need for data science training across a range of sectors, the certificates are designed to empower participants with the skills needed to succeed in cutting-edge careers.  

“Looking ahead, I’m particularly excited about exploring applications of computer vision and NLP to create human-centered AI solutions,” added Vaz. “My goal is to contribute to the growing field of AI, focusing on developing tools that enhance and support people’s daily lives.” 

“I found the job readiness sessions extremely helpful in updating my LinkedIn profile and resume, allowing me to better highlight my skills, experience, education, and certifications,” said Zarrin Rasizadeh, who completed the Machine Learning Software Foundations Certificate and was recently hired.  

“Additionally, the mock interviews were invaluable in boosting my confidence and preparing me more effectively for actual job interviews.” 

Both DSI certificates offer foundational concepts in data science and machine learning and provide opportunities for practical application through employer case studies. Each certificate also includes sessions dedicated to career advancement, from support for resume writing to networking and interview skills development. 

About the Data Sciences Institute Upskilling Certificates 

The certificate modules and job readiness sessions are offered part-time over 16 weeks, allowing learners time to balance existing commitments and still accomplish their career goals. The training is offered to learners at a substantially reduced rate of $525 (+HST) per certificate, thanks to the support of Upskill Canada. The DSI has also committed to accessibility funding for those with financial need. To learn more about upcoming sessions: https://certificates.datasciences.utoronto.ca/

Tackling Liver Transplant Inequalities: Expanding a Data Sciences Institute Project Nationally

Photo (L-R): Rahul G. Krishnan (Assistant Professor, Computer Science and Laboratory Medicine and Pathobiology, Faculty of Arts & Science, Faculty of Medicine, University of Toronto and Faculty Member, Vector Institute); Mamatha Bhat (Assistant Professor, Division of Gastroenterology, Temerty Faculty of Medicine, University of Toronto and Clinician-Scientist, Multi-Organ Transplant Program, University Health Network)

By: Cormac Rea

Few experiences inspire panic and fear as much as time spent in a hospital waiting to be seen for a serious medical procedure.

Yet, despite ongoing advances in medical science and modelling, patients often remain dependent on limited assessments and data modelling to determine if they even qualify for certain medical interventions.

Liver transplantation is a critical intervention for patients with end-stage liver disease. But current systems for prioritizing patients on the transplant waitlist create inequities, particularly for women, older patients, and those with some advanced conditions like non-alcoholic steatohepatitis (NASH) or cholestatic liver disease.

Supported by a Data Sciences Institute catalyst seed grant and co-led by investigators Rahul G. Krishnan (Assistant Professor, Computer Science and Laboratory Medicine and Pathobiology, Faculty of Arts & Science, Faculty of Medicine, University of Toronto and Faculty Member, Vector Institute) and Mamatha Bhat (Assistant Professor, Division of Gastroenterology, Temerty Faculty of Medicine, University of Toronto and Clinician-Scientist, Multi-Organ Transplant Program, University Health Network), DynaMELD and DynaCOMP is a coordinated effort between clinicians and computer scientists to address specific issues with liver transplant wait-times and patient selection.

“By applying advanced deep learning techniques to large and often complex datasets, the DynaMELD and DynaCOMP models aim to better predict patient outcomes, reducing mortality on the liver transplant waitlist, and using data sciences to offer a more just allocation process for all patients,” said Gary Bader, DSI Associate Director, Research and Software.

The project blends data sciences with health research and modelling as a driver for positive social change, a mandate also at the core of DSI’s funding ethos through catalyst seed grants.

The team has published part of their work on DynaCOMP at the 2024 Machine Learning for Healthcare conference. Leveraging their DSI seed funding, the researchers were awarded Canadian tri-agency funding and are currently in the process of external validation, using new data sets from different hospital systems and provinces.

“As of February this year, we were awarded a five-year CIHR Grant to expand the scope of DynaMELD to collect data from across Canada,” said Krishnan. “It has really launched a pan-Canadian idea to collect data from Alberta, from Quebec, from BC and the Atlantic provinces, in order to see how different risk scores perform on their data as well.”

But how exactly will DynaMELD and DynaCOMP address issues of inequalities in the current system with respect to liver transplant wait-times and patient selections?

“Let’s say you have 50 individuals who are all waiting for a liver,” said Krishnan. “Doctors need some number to guide them as to who should be ranked first or second or third on the transplant wait list. It’s a number that clinicians sat around the table and came up with about two decades ago.”

“So you have this score that’s been developed, and over the course of time, the score has become less calibrated since the population it was originally designed for has changed. It does not assess risk of mortality as well on women as it does on men, or for patients whose clinical condition deteriorates rapidly. We started rethinking how to calculate this score and, using what we know about AI and machine learning, wondered – what would a new score look like?”

The existing metric, known as the Model for End-Stage Liver Disease (MELD)-Na score, can sometimes fail to accurately capture the severity of illness in certain groups, leading to a higher risk of waitlist mortality. Using clinical data from the University Health Network, Krishnan and Bhat used machine learning tools to develop DynaMELD, a more precise and equitable scoring system. The study also developed new data science methodology for incorporating changes in patients’ physiological status into risk scores that predict mortality on the liver transplant waitlist.
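To make the idea of incorporating changes in physiological status more concrete, the toy sketch below contrasts a static score with one that also accounts for the recent trend. This is not DynaMELD, which the team built with machine learning on clinical data. The score histories, the least-squares slope, the function names, and the `trend_weight` value are all hypothetical, chosen only to show why two patients with the same latest score might warrant different priority.

```python
def slope(values):
    """Least-squares slope of equally spaced measurements (units per visit)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def trend_adjusted_score(history, trend_weight=2.0):
    """Latest score plus a penalty for a worsening (rising) trend.

    The additive form and the weight are illustrative assumptions only.
    """
    return history[-1] + trend_weight * max(0.0, slope(history))


# Two hypothetical patients whose most recent score is identical.
stable = [20, 20, 20, 20]
worsening = [14, 16, 18, 20]

print(trend_adjusted_score(stable))     # 20.0
print(trend_adjusted_score(worsening))  # 24.0
```

A static score ranks these two patients equally, while the trend-aware variant places the rapidly deteriorating patient higher.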

“DynaMELD captures not just a patient’s risk of mortality but also their risk of accelerating in terms of likelihood of mortality through changing dynamics over time,” said Krishnan.

“In addition, we wanted to provide clinicians with an early warning system if the subsequent soft tissue graft was not functioning as intended – to create a similar risk score – and that motivated the DynaCOMP part of the project.”

After an individual receives a liver transplant, a common problem that clinicians are often faced with is the likelihood of soft tissue graft failure; DynaCOMP addresses this question.

“We’re very grateful to have received funding from DSI to pursue this project,” Krishnan concluded.

“You need to show evidence that in some sense you put in an effort to de-risk the project before applying for funding and the initial results that we’ve got supported by DSI were very important towards that end.”

Data Sciences Institute Galvanizing Data Science Applications in Early Stage Drug Discovery

By: Cormac Rea

While data science is driving breakthroughs in countless areas, the lack of availability of experimental training data has limited its impact on drug discovery. In particular, there is a need to help data scientists understand experimental drug discovery data, ask the right questions, and decide for themselves on the best answers.  

The Data Sciences Institute (DSI) has awarded the Galvanizing Data Science Applications in Early Stage Drug Discovery proposal as an Emergent Data Science Program, which funds researchers to energize, support, and advance data science.  

The Early Stage Drug Discovery Program will build bridges between data scientists and drug discovery experimentalists – two communities that typically do not speak the same language – by providing training to expose data science trainees to the next frontiers in drug discovery and galvanize a new generation of scientists into a space poised for machine learning-driven transformation. 

The initiative is led by University of Toronto professors: Matthieu Schapira (Department of Pharmacology and Toxicology, Temerty Faculty of Medicine and the Structural Genomics Consortium); Rachel Harding (Leslie Dan Faculty of Pharmacy, and the Structural Genomics Consortium); Mohamed Moosavi (Department of Chemical Engineering & Applied Science, Faculty of Engineering & Applied Science); Chris Maddison (Department of Computer Science and Department of Statistical Sciences, Faculty of Arts & Science and Vector Institute) and Hui Peng (Department of Chemistry, Faculty of Arts & Science).

“Recent advances in machine learning (ML) are poised to have a transformative impact along the drug discovery and development trajectory, including finding the best protein target for a given disease, discovering and optimizing drugs, and selecting patients most likely to respond to a given treatment,” says lead researcher Matthieu Schapira.

The program offers quarterly workshops on data science for hit-finding, with interactive sessions and lab visits where data scientists will learn about data generation and experimentalists will learn about data analysis. It launches on January 31, 2025, with the CrossTALK Bootcamp.

The bootcamp includes workshops that explain the chemical library screening process, along with associated data challenges in which participants will use their ML models to retrospectively retrieve blinded hits.
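As a rough illustration of what retrospectively retrieving blinded hits can look like in practice, the sketch below ranks a screened library by model score and measures how many held-out hits land near the top of the ranking. The function names, the toy scores, and the choice of metrics (top-k recall and enrichment factor) are assumptions made for illustration, not a description of how the bootcamp evaluates submissions.

```python
def _hits_in_top_k(scores, is_hit, k):
    """Count true hits among the k highest-scoring compounds."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_hit[i])


def top_k_recall(scores, is_hit, k):
    """Fraction of all true hits recovered in the top k of the ranking."""
    if len(scores) != len(is_hit):
        raise ValueError("scores and is_hit must have the same length")
    total_hits = sum(is_hit)
    if total_hits == 0 or k <= 0:
        return 0.0
    return _hits_in_top_k(scores, is_hit, k) / total_hits


def enrichment_factor(scores, is_hit, k):
    """Hit rate in the top k divided by the hit rate across the whole library."""
    if len(scores) != len(is_hit):
        raise ValueError("scores and is_hit must have the same length")
    n = len(scores)
    total_hits = sum(is_hit)
    if n == 0 or total_hits == 0 or k <= 0:
        return 0.0
    k = min(k, n)
    return (_hits_in_top_k(scores, is_hit, k) / k) / (total_hits / n)


# Hypothetical model scores for ten compounds; labels stay hidden until scoring.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4, 0.15]
is_hit = [True, False, False, False, True, False, False, False, True, False]

print(top_k_recall(scores, is_hit, k=3))       # 2 of the 3 hits are in the top 3
print(enrichment_factor(scores, is_hit, k=3))  # above 1 means better than random
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 suggest the model concentrates true hits at the top of the list.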

“Supporting emergent areas of data science is a core activity of the Data Sciences Institute that helps to fulfil its mission of bringing people together for collaborative generation and application of new ideas in the data sciences,” says David Lie, DSI Associate Director, Thematic Programming. 

DSI met with Prof. Schapira to learn more about this Emergent Data Science Program:  

From a personal or professional perspective, could you explain what led you and your collaborators to propose this as an emerging data science program to the Data Sciences Institute? 

MS: A challenge for machine learning (ML) in early-stage drug discovery is the lack of publicly accessible, large and consistent data sets to train ML models, but efforts are underway to fill this gap, which will lead to new opportunities for data-science driven drug discovery. A new initiative at The Structural Genomics Consortium (SGC) aims to screen up to 2000 proteins against billions of molecules using two experimental platforms well-established in the pharmaceutical industry: DNA-encoded libraries (DEL) and Affinity Selection Mass Spectrometry (ASMS). A network of AI experts around the world committed to exploiting these data for early-stage drug discovery is rapidly growing at https://aircheck.ai/mainframe. As the SGC, in partnership with our industry partners, is poised to become a leading generator of open-science protein-ligand data, our goal is to ensure that the data science and drug discovery breakthroughs made from our U of T-generated data are not all made elsewhere. Our goal is to position Canada at the forefront of this breakthrough. This grant will enable a pilot project to train the next generation of data scientists at U of T. If successful, we will then expand this program at Universities across Canada.

Our experience with the ML divisions of pharmaceutical companies has revealed that understanding the genesis of the data is critical for developing efficient machine learning strategies, and is also a challenge. Conversely, we believe that it is critical for bench scientists to share a common language with data scientists to better provide guidelines for the reliable interpretation of experimental data.

Our solution is to galvanize Canadian data science trainees around open science data for drug discovery, and pair them with experimentalists. We will organize four bootcamps each year where experimentalists and data scientists team up and learn together how experimental training datasets are generated, how ML models are built and used to predict bioactive molecules, and how predicted molecules are tested experimentally.

What are some of the main challenges to bringing together researchers, trainees and students interested in this computational work? 

MS: Most participants will be graduate students and post-docs, though staff are welcome as well… and many PIs say they are keen to attend, though each bootcamp is ~20 hours, which is a real time commitment! I believe pairing experimentalists and data scientists will have a positive impact on the learning curve. Our first bootcamp starts in February, so we’ll see how things go. 

What would you like to see coming out of the CrossTALK bootcamp?

MS: There is no question that ML will transform the way life sciences are conducted and the speed at which discoveries are made. Canada cannot afford to miss this departing train. U of T is privileged to have a pool of exceptionally talented ML trainees.  

I hope this program will provide some tools for data scientists and experimentalists at U of T and beyond to harness the waves of chemical data that are bound to accelerate early-stage drug discovery. The 2024 Nobel prize in Chemistry highlighted the first steps in this direction.