SUDS Student Call 

May-August 2026

Call for student researchers!

The Data Sciences Institute (DSI) welcomes carefully selected undergraduate students from across Canada for a rich data sciences research experience. Through the SUDS Research Program, undergraduate students, who are interested in exploring data science as a career path, have an exciting opportunity to engage in hands-on research supervised by DSI member researchers across the three UofT campuses.

The DSI is strongly committed to diversity within its community and especially welcomes applications from racialized persons/persons of colour, women, Indigenous/Aboriginal People of North America, persons with disabilities, LGBTQ2S+ persons, and others who may contribute to the further diversification of ideas.

Below are the SUDS research opportunities for May-August 2026. You can apply and rank your top three choices.

See here for information on eligibility, award value and duration, and SUDS programming.

Research Opportunities

Research description:

Type Ia Supernovae are calibrated standard light beacons that enable us to measure distances across cosmic time. These distances encode the expansion history of the Universe; however, one of the biggest challenges is finding a “pure” sample of these supernovae, given that many things explode in the night sky, and only some of those are useful cosmological probes. The Vera C Rubin Observatory is a telescope that takes images of the sky and will find hundreds of thousands of these objects, contaminated by other light sources.  Our group is working on a fully Bayesian supernova cosmology analysis pipeline to process the incoming Rubin data. 
 
There are many aspects to this analysis, including parametrizing supernova rates over time, modelling supernova spectra, and more practical considerations such as optimizing the analytic and numerical runtime, and performing coverage tests. Depending on the SUDS Scholar's interests and strengths, your tasks could include developing statistical tests to determine the accuracy of the Bayesian model, using conformal prediction or similar methods to improve quantified uncertainties, performing an independent analysis on an alternate supernova dataset, or optimizing the code for accuracy or performance.
 

Researcher: Renee Hlozek, University of Toronto, Faculty of Arts and Science, David A. Dunlap Department of Astronomy and Astrophysics

 

Skills required:

  • Python programming and a keen interest in rigorous analysis of real data.  
  • Previous experience with JAX or high performance computing and Bayesian analysis are helpful, but not required.
  • Knowledge of astronomy and supernovae are also helpful, but not necessary.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Concussion affects over 400,000 Canadians annually, with up to 30% experiencing prolonged post-concussion symptoms that disrupt recovery and quality of life. Early follow-up is critical but is frequently delayed by months due to clinician shortages across Canada and limited access to specialized urban centers, resulting in symptom exacerbation, prolonged disability, and greater strain on healthcare systems. AI-driven platforms have the potential to automate triage, summarize clinical information to inform clinicians, and support clinical decision-making, yet current systems lack multimodal sensing, clinical validation, and workflow integration. This project enhances the validated Acute Concussion Triage Agent (ACT-A), a multilingual, privacy-preserving web platform that conducts adaptive interviews, analyzes affective and behavioral cues, and generates structured summaries and recommendations for clinician review. ACT-A integrates retrieval-augmented generation (RAG)-based recommendation agents built on secure Microsoft Azure-hosted large language models to produce evidence-based next-step decisions. These structured summaries and recommendations are designed to reduce clinician workload, enabling more focused, efficient, and higher-quality patient interactions, while allowing clinicians to allocate more time to complex or high-priority cases. Through multimodal data fusion, prompt-engineered summarization, and clinician-in-the-loop validation, ACT-A will reduce triage delays and establish a scalable, agentic-AI framework for equitable, intelligent concussion care.

The SUDS Scholar's responsibilities would primarily involve: Supporting the ongoing development of large language models (LLM) and agentic AI; Deploying LLMs to analyze concussion patients' interview and historical health data, generate structured summaries and next-step care recommendations for clinicians; and, Assisting with the deployment of the developed LLM to the project’s cloud to be tested with real patients.

Researcher: Shehroz Khan, University Health Network, Toronto Rehabilitation Institute (KITE)

Skills required:

  • React JS web application development, Large Language Models, Prompt Engineering, Agentic AI, Retrieval-Augmented Generation, Machine Learning, Deep Learning

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Canadians spend nearly 90% of their time indoors, where they are exposed to various airborne contaminants. Indoor air quality (IAQ) has a significant impact on health and overall quality of life. However, analyzing and understanding IAQ in diverse indoor environments remains challenging due to missing information about key factors such as contaminant generation rates, air mixing, and airflow patterns between spaces. Building on last year’s successful DSI SUDS project, this research continues the development of physics-informed machine learning (ML) methods to better understand IAQ dynamics. This year’s project will extend the previous work by refining and validating probabilistic ML models using data collected from a controlled experiment in the Twin Suites Rooftop Lab, where ground-truth information about the key factors affecting IAQ dynamics is measured. The focus will be on improving the models’ ability to estimate these factors under uncertainty. Probabilistic programming will serve as the overarching framework to integrate data-driven inference with domain knowledge.

The SUDS Scholar will work with Professor Jeffrey Siegel (CIVMIN, IAQ expert) and Professor Seungjae Lee (CIVMIN, ML expert in building science). While the project primarily focuses on the analysis of IAQ data, the SUDS Scholar will also have the opportunity to participate in the IAQ data collection.

Researcher: Seungjae Lee, University of Toronto, Faculty of Applied Science and Engineering, Department of Civil and Mineral Engineering

Skills required:

  • Proficiency in Python, with experience using essential data science libraries (e.g., scikit-learn).
  • Preferred:
    • Experience with PyTorch/Tensorflow.
    • Experience with handling time series data.
    • Foundational understanding of machine learning and probability theories.
    • Experience with high-performance computing.
    • Experience with Git for version control.
    • Interest in building science and IAQ applications.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

This project explores the use of Causal Prior-Fitted Networks (CausalPFNs) and Large Language Models (LLMs) to better understand treatment heterogeneity in clinical trial datasets. CausalPFNs are transformer-based models trained on diverse simulated data-generating processes that can estimate causal effects directly from observational or experimental data without additional tuning. Applying CausalPFNs to clinical trial data enables automatic estimation of conditional average treatment effects (CATEs), revealing patient subgroups that respond differently to interventions. Meanwhile, LLMs can process and interpret unstructured clinical documents, such as trial protocols and patient narratives, to extract relevant covariates and contextualize causal findings. By combining CausalPFNs’ quantitative inference with LLMs’ interpretive capabilities, the project aims to build a unified framework for automated causal analysis and clinical insight generation. The outcomes will include validated pipelines for identifying heterogeneous treatment responses, interpretable summaries of causal results, and guidelines for integrating language-based reasoning with causal machine learning—advancing personalized medicine and evidence synthesis.

The SUDS Scholar would be programming and conducting systematic studies on how combinations of foundation model representations enable decision making with causalpfns. This would require comfort with packages such as PyTorch Jax and concepts in deep learning.

 

Researcher: Rahul Krishnan, University of Toronto, Faculty of Arts and Science, Department of Computer Science

 

Skills required:

  • Strong skills in causal inference, deep learning (PyTorch/JAX), and transformer architectures, with familiarity in clinical data analysis and Bayesian reasoning.
  • Experience with LLMs, text extraction, and data preprocessing is essential, alongside statistical literacy, scientific writing, and the ability to interpret model outputs in biomedical contexts.

Primary research location:

  • University of Toronto, St. George Campus

Research description:

The nervous system is essential for generating and coordinating complex motor behaviors that are critical for animal survival and reproduction across species. We are using C. elegans and mice to study how components of the nervous system, from the molecular to the circuit level, determine its properties and generate the complex behaviors. We have developed strategies to monitor and control the components of the nervous system in real time, both in living, behaving animals and in isolated neuronal tissues. These approaches combine genetic mutants, calcium imaging, electrophysiology, optogenetics, and immunohistochemistry to investigate the structure and function of nervous system. With these tools, we are able to examine how molecular and cellular components of the nervous system affect animal development and behavior. One challenge we face is implementing automated tracking, segmentation, and quantification of specific behaviors of interest. We have developed imaging setups for the behaviors of interest. 
 
The SUDS Scholar will work on developing an automated pipeline for characterizing and quantifying animal behavior based on our imaging setups. The student will collaborate with our team and partners who are currently building machine learning algorithms to address these challenges.
 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

Skills required:

  • Proficient in either image processing, algorithm development, or statistical analyses.
  • Knowledge in programming is essential.
  • Students interested in applied math and physics are strongly encouraged to apply, but the key ingredient is a strong drive to learn and apply all the above to real biological problems.

Primary research location: 

  • University of Toronto St. George Campus and/or remote

Research description:

This project offers an opportunity for a motivated undergraduate student to explore global biodiversity change using established long-term ecological datasets.
 
The SUDS Scholar will work with large-scale biodiversity and environmental data collected from international monitoring programs and open repositories. Using data science tools and methods—including, but not limited to, data cleaning, visualization, and statistical or time series analyses—the student will investigate spatial and temporal trends in species diversity, abundance, and distribution. The student will learn best practices for handling complex ecological data, explore reproducible workflows, and contribute to the development of analytical pipelines that help quantify global biodiversity loss or recovery. The project will encourage critical thinking about data quality, scale, and uncertainty, as well as the broader implications of biodiversity change for ecosystem health and sustainability. This opportunity is ideal for students interested in combining computational skills with environmental science to address urgent global challenges through data-driven research.
 

Researcher: Tianna Peller, University of Toronto, Faculty of Arts and Science, Department of Ecology and Evolutionary Biology

Skills required:

  • Strong analytical, coding, and organizational skills, with an interest in applying data-driven methods to ecological and environmental questions.
  • Familiarity with time-series or spatial data, statistical analysis, and integrating multiple datasets to assess potential drivers of observed patterns are considered valuable assets.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Cities worldwide are investing in green infrastructure to enhance resilience, improve livability, and reduce carbon emissions. However, the financial value of urban greenness, how it translates into tangible economic benefits, remains underexplored. This project will quantify the economic and financial impacts of urban vegetation by integrating satellite-based greenness indices with housing market data and financial modeling techniques across the Greater Toronto Area.
 
Using multi-temporal 10-m Sentinel-2, the SUDS Scholar will calculate the vegetation index a representing vegetation density and distribution. These spatial greenness metrics will be merged with housing price datasets (from CREA, MLS, or municipal open data) to evaluate the relationship between environmental quality and property value. Using spatial regression and hedonic pricing models, the project will estimate the “green premium”, the monetary contribution of vegetation to housing prices after controlling for confounding factors (e.g., proximity to transit, schools, and employment centers). Building on these relationships, the student will apply financial modeling to translate environmental benefits into investment value. Scenarios will simulate how future urban greening initiatives or carbon pricing policies might influence neighbourhood-level property and ecosystem service value. The analysis will culminate in spatial visualizations and financial summaries quantifying how urban greenness contributes to climate resilience and economic prosperity.
 
Researcher: Yuhong He, University of Toronto, University of Toronto Mississauga, Department of Geography, Geomatics, and Environment
 
Skills required:
  • Proficiency in GIS and remote sensing.
  • Experience with satellite imagery analysis.
  • Basic knowledge of statistical modeling
  • Data integration from open housing datasets and census sources.
  • Ability to conduct spatial data cleaning, visualization, and mapping.
  • Interest in urban sustainability, environmental economics, and climate policy.
  • Strong written and analytical skills.

Primary research location:

  • University of Toronto Mississauga Campus and/or remote

Research description:

This project seeks to develop an automated, data-driven workflow for interpreting X-ray photoelectron spectroscopy (XPS) dataset of the solid-electrolyte interphase (SEI) in high-energy-density batteries—often dubbed the “Mona Lisa” of battery interfaces due to its chemical complexity and analytical opacity. Despite XPS being a cornerstone technique for SEI characterization, its interpretation is plagued by overlapping spectral features, mixed oxidation states, and subjective, non-reproducible analysis methods that hinder scientific consensus. By embedding advanced data science techniques—such as automated signal processing, dimensionality reduction, and probabilistic modeling—into the core of the XPS workflow, this project will produce an open-source, Python-based software toolkit that enables interpretable and reproducible spectral analysis. The toolkit will detect anomalous features, suggest candidate species with quantified uncertainty, and facilitate transparent, modular exploration of SEI chemistry. Aligned with the Data Sciences Institute’s mission to promote fair, ethical, and reproducible data practices, this project fosters interdisciplinary collaboration between electrochemistry and data science. By openly disseminating tools and annotated datasets, it democratizes access to advanced analytical capabilities, accelerating innovation in sustainable energy storage and advancing the development of safer, more efficient batteries. 
 
The SUDS Scholar will lead the development of an automated, data-driven workflow for interpreting XPS spectra of battery solid–electrolyte interphases (SEIs). Responsibilities include designing signal-processing and machine-learning pipelines for peak deconvolution, feature extra ction, anomaly detection, and uncertainty quantification; developing a modular, open-source Python toolkit with robust documentation and testing; curating and standardizing XPS datasets using FAIR data principles; validating models against reference standards and expert interpretations; and working closely with electrochemists to ensure chemical and physical relevance. The Scholar will also implement reproducible research practices, maintain transparent version-controlled workflows, and support knowledge dissemination through documentation, tutorials, and publications.
 

Researcher: Weilai Yu, University of Toronto, Faculty of Applied Science and Engineering, Department of Chemical Engineering and Applied Chemistry

 

Skills required:

  • Skilled in Python programming, data analysis, and signal processing, with familiarity in machine learning, dimensionality reduction, and probabilistic modeling.
  • Experience with Git, reproducible workflows, and scientific visualization is valued.
  • Interest in spectroscopy, materials science, or electrochemistry is an asset, alongside strong documentation and interdisciplinary communication abilities.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

In-situ synchrotron X‑ray instruments perform material characterization to determine properties such as phase nucleation and transformation under controlled heating. However, the complexity and amount of data from synchrotron X-ray diffraction (XRD) make the analysis challenging. This project develops a high-throughput computational workflow for automated extraction of key structural features from XRD data, including crystallinity, peak parameters, and phase-transition temperatures. 
 
The SUDS Scholar will apply data science approaches such as distribution modeling and signal processing, as well as supervised/unsupervised machine learning methods, to evaluate physics-based candidate features and indicators. After identifying a workflow, students will work to automate the analysis for compatibility with high-throughput experimentation, identifying the phase evolution processes and corresponding structure information. Students will also practice software engineering skills necessary to document the workflow in an open-science framework. Final outcomes include open, reproducible analysis that accelerates materials discovery and demonstrates core data science competencies: algorithm design, scalable computing, and automated knowledge extraction.
 

Researcher: Jason Hattrick-Simpers, University of Toronto, Faculty of Applied Science and Engineering, Department of Materials Science and Engineering

 

Skills required:

  • Experience with Python, Machine Learning Knowledge, GitHub, Data Visualization

Primary research location:

  • University of Toronto, St. George 

Research description:

Driven by advances in AI, several groups in astronomy are developing large Foundation Models for Astrophysics, large general purpose ML models for performing many tasks. Our group is involved these efforts, in particular connecting these models to natural language models (LLMs). Little work has been done, however, in evaluating the performance of these models. The aim of this problem is to develop a set of benchmarks across a range of astrophysical applications (gravitational lending, galaxy morphology, photometric redshift determination, stellar parameter determination) to test the performance of current and future models. 
 
The SUDS scholar would work on gathering relevant benchmark data sets from the astronomical literature, starting from some that we have already used and then expand to others, write code to run these through existing astronomical foundation models (such as AION-1), and create summary statistics and visualizations of the foundation models' performance on these benchmarks. Finally, the SUDS Scholar will create an easily accessible resource for others to run the benchmarks on their own models (e.g., sharing it on huggingface).
 

Researcher: Jo Bovy, University of Toronto, Faculty of Arts and Science, David A. Dunlap Department of Astronomy and Astrophysics

Skills required:

  • Excitement about how AI can be used for Science!
  • Python programming
  • Familiarity with LLMs (ChatGPT, Claude)
  • Some basic astrophysics background is useful

Primary research location:

  • University of Toronto St. George Campus 

Research description:

In recent years, quantum computing companies, such as IBM, have proposed distributed quantum computing as a strategy to scale current quantum processing unit (QPU) technologies. While companies have moved towards this strategy, the field of quantum computing as a whole tends to favor monolithic algorithms intended to run on single QPUs. Like other subfields in the quantum sciences, quantum machine learning has largely followed this trend. To this end, we propose benchmarking the performance of various types of quantum circuits used for machine learning, including quantum circuit learning, neural networks, support vector machines, and kernel learning. In this study, we will explore the expressibility, entanglement, and magic of partitioned quantum circuits, distributed using local operations via gate and wire cuts. If time permits, we will also study the effects of various communications channels, such as single or bidirectional local operations with classical communications, local operations, and quantum communications.

The SUDS Scholar will first engage in preliminary tasks such as reading literature, performing tutorials and training sessions, and gaining an understanding of the existing workflows and work environment. The student will compile vital literature related to quantum algorithms and distribution and develop an understanding of what has been done in the field to help prepare an overview of which algorithms will be suitable for distribution. The student will then begin distributing suitable algorithms for the benchmarking study. The benchmarking study will require the student to synthesize insights from literature and compile data from internal distribution frameworks. By the end of the project, the student will have developed skills related to software development, version control using Git, and the real-world application of distributing quantum algorithms within a research environment. The final, expected product is a fully functional and open-source codebase for distributed quantum algorithms, achieved by maintaining an up-to-date Github repository with proper version control and documentation. The student will engage in writing a draft from the beginning to have a significant write-up completed before end.

Researcher: Hans-Arno Jacobsen, University of Toronto, Faculty of Applied Science and Engineering, Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Skills required:

  • Python (NumPy, Matplotlib, seaborn, pandas).
  • Preferred:
    • Experience with quantum software, especially Qiskit or PennyLane.
    • Strong writing, reading, and presentation skills.
    • Time management and effective teamwork in a highly collaborative environment.

Primary research location:

  • University of Toronto St. George Campus

Research description:

The feedback between hosts and their microbial communities (microbiomes) is important for both host and microbial fitness. Hosts provide space and metabolites to microbes. In return, microbes can greatly affect host traits and fitness. Microbiomes can benefit plant hosts by enhancing growth, improving resistance to environmental stress, and increasing resilience to pathogens. However, the role of host factors and the degree to which they exert “control” over the functional and taxonomic diversity of microbiomes is not well understood. We designed a 20-strain synthetic community using bacteria isolated from common duckweed (genus Lemna) to investigate the role of host feedback in structuring plant microbiomes. We characterized and sequenced these microbial strains, and collected experimental data on microbe and host performance under different community configurations. Host presence was also manipulated to compare outcomes in the presence and absence of the plant host.

 The SUDS Scholar will characterize microbial metabolic functions using genomic data and model genomes together with experimental data to predict outcomes of microbial species interactions, and investigate the role of plant host feedback in these interactions. In summary, the intern will help build a bioinformatic pipeline to better understand microbiomes and their effects on hosts.   

Researcher: Megan Frederickson, University of Toronto, Faculty of Arts and Science, Department of Ecology and Evolutionary Biology

Skills required:

  • The SUDS student will need to become proficient writing Bash scripts in a Unix shell
  • Previous coding experience in R or Python is required
  • Familiarity with concepts in ecology, evolutionary biology, genomics, bioinformatics, and genetic databases
  • Ability to work independently and effectively communicate
  • Strong organization and data management skills

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

This project's aim is to develop the computational infrastructure required to support distributed quantum algorithms research, in particular implementations of quantum machine learning. 
 
The SUDS Scholar will help design an emulation framework of quantum processing units to manage the large volumes of data produced by distributed quantum algorithms. The framework will integrate GPU acceleration and inter-QPU communication to model how quantum information is shared across a quantum network. By introducing configurable hardware information, the environment aims to substitute physical distribution of quantum systems. The resulting datasets will be used to test and refine distributed QML workflows, focusing on how model information propagates through interconnected systems. During the project, the student will implement and experiment with components of the emulation environment. This involves data analysis of experimental quantum information; logging, visualization, and reproducibility within the environment. This work will support the development of scalable infrastructure for future quantum machine learning and algorithms research.
 

Researcher: Hans-Arno Jacobsen, University of Toronto, Faculty of Applied Science and Engineering, Edward S. Rogers Sr. Department of Electrical and Computer Engineering

 

Skills required:

  • Proficiency in Python and experience with scientific and data-oriented programming.

  • Proficiency in Rust and experience with graph partitioning and transpilation algorithms.

  • Familiarity with GPU or distributed computing is helpful.

  • Interest in quantum computing, data infrastructure, and/or scalable machine learning systems is helpful.

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:
Across the world, including in Canada, many of the thousands of human languages are under threat of falling completely out of use. Communities that speak these languages are under constant pressure to completely switch to languages like English, French, Mandarin, and so on. Speech technology such as speech-to-text and text-to-speech can have a role to play in helping communities maintain their languages, by making it easier to index and search audio recordings for educational and cultural preservation, and by facilitating the use of the language across communities that are becoming increasingly fragmented and dispersed. However, building speech technology requires training data, both in the form of audio recordings and text, and, for most of the world's languages, very little high-quality training data exists. This continues to pose serious problems. 
 
In this project, the SUDS Scholar will work to improve speech-to-text in challenging poorly-resourced languages spoken in Canada, such as Faetar (Franco-Provençal) and Inuttitut (Labrador Inuit), using innovative approaches making critical use of language models both large and small. Approaches will centre around the fact that, to be useful in these communities, automatic transcription needs to work at two levels: lexical (what word was uttered) and phonetic (how exactly it was pronounced).
 
Researcher: Ewan Dunbar, University of Toronto, Faculty of Arts and Science, Department of French
 

Skills required:

  • Experience with machine learning using neural networks in PyTorch
  • Excellent software development skills in Python
  • Experience with model evaluation, data analysis, and interpretation
  • Ideally:
    • Experience with computer speech processing
    • Knowledge of linguistics
 
Primary research location:
University of Toronto St. George Campus and/or Remote
 

Research description:

The province of Ontario in Canada has one of the greatest densities of lakes in the world. Sustaining and managing these important populations is vital for maintaining the ecosystem and allowing species persistence despite harvesting. A key measure that fisheries managers require is species abundance. This allows them to understand how abundances change spatially and temporally in response to various stressors and to implement effective management strategies. Traditionally, fish population abundances are tracked through invasive capture methods which require time, labour, and material investments and result in the mortality of many fishes.Hydroacoustic surveying has become an alternative to invasive capture methodologies and is currently being tested by the Ontario Ministry of Natural Resources as a possible alternative approach. In this, sonar is used to locate organisms and objects in the water and the sound emitted at up to 400 distinct frequencies bounces off organisms back to a receiver. These signals received may act as a species “fingerprint” allowing the classification of species and abundance calculations. Automating species identification from acoustic responses “remains the ‘Holy Grail’ to acoustic researchers”. Achieving species recognition through hydroacoustic processes will revolutionize the monitoring and management of commercially important fish populations in Ontario and beyond.

The SUDS Scholar will attend weekly meetings with the supervisor; Use a GitHub repository to organize code and data; Write code in python to run deep learning and other machine learning models; and, Prepare presentations on the research.

 

Researcher: Vianey Leos Barajas, University of Toronto, Faculty of Arts and Science, Department of Statistical Sciences

 

Skills required:

  • Program in python or willing to learn
  • Taken some courses in statistics and/or machine learning

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Self-driving labs (SDLs) are next generation experimental systems that are fully autonomous and where robots orchestrated by AI agents conduct experiments, analyze the results and accordingly design and execute the next round of experiments. The medicinal chemistry SDL at University of Toronto includes automated chemical synthesis and experimental testing of drug-like molecules within active learning cycles where experimental results inform the design of the next batch of molecules through Bayesian optimization.
 
In this project, the SUDS Scholar will retrospectively use datasets of known active molecules to dry-test and refine a multi-fidelity Bayesian optimization protocol that integrates AI-driven computational chemistry to predict the activity of drug-like molecules before they are synthesized and tested. Multiple forms of data representation, objective and acquisition functions will be tested within a gaussian process and refined. Low affinity oracles will rely on molecular simulations such as binding free energy prediction.
 

Researcher: Matthieu Schapira, University of Toronto, Temerty Faculty of Medicine, Department of Pharmacology and Toxicology

 

Skills required:

  • Proficient in python, have strong bases in machine learning and ideally some knowledge in chemistry.

Primary research location:

  • Structural Genomics Consortium, MaRS Building, 101 College St.

Research description:

Our research focuses on the neural underpinnings of imagining future events. This ability relies on complex knowledge structures called schemas - knowledge about the world that scaffolds imagination. We recently developed a natural language processing (NLP) metric that assesses the schematicity of narratives describing imagined events. We now plan to combine this NLP tool with fMRI collected while participants describe aloud their imagined events.
 
The student on this project will have the opportunity to work with multi-echo MRI data that, when combined with advanced denoising techniques to remove motion artifacts resulting from speaking, enable more sensitive detection of BOLD (blood oxygen level dependent) signal. Processed fMRI timeseries data will be analyzed in conjunction with NLP timeseries data to determine whether the production of highly schematic narrative content modulates activity in brain regions thought to underpin schema, such as medial prefrontal cortex. 
 
The SUDS Scholar will have the opportunity to work with experts in cognitive neuroscience, biomedical imaging and NLP, and will become familiar with neuroimaging methods as well as best practices in neuroinformatics and open science. The analytic approach for combining NLP and fMRI will provide a platform for future neuroscience research on the imaginative brain.

Researcher: Donna Rose Addis, Baycrest

Skills required:

  • Advanced computer programming skills (e.g., Python, shell scripting, Matlab)
  • Data analysis skills including machine learning and time series analysis
  • Usage of Linux operating system
  • Effective oral and written communication skills
  • Ability to work independently and within a team
  • Beneficial to have:
    • Neuroimaging analysis experience

Primary research location:

  • Hybrid

Research description:

Recently, we have obtained single cell transcriptomes from the primitive and definitive neural stem cells (both quiescent and activated) that make the brain. We will do computational analyses of these transcriptomes to determine common and differentially expressed genes among the various states of neural stem cells. Genes that are expressed differentially among the states and fates of neural stem cells will be tested using loss and gain of function manipulations to determine the functional roles of these genes on neural stem cell behaviour. 
 
The SUDS Scholar will be working with our data set using R(Studio) to use the Cogent NGS Discovery Software and programs such as Seurat to find ways to differentially mark the different neural stem cell populations or find transcription factors that are important for stem cell identity.
 

Researcher: Derek Van Der Kooy, University of Toronto, Temerty Faculty of Medicine, Department of Molecular Genetics

Skills required:

  • Proficiency in R

Primary research location:

  • University of Toronto St. George Campus

Research description:

This project offers a hands-on opportunity for an undergraduate student to contribute to an innovative computational social science project examining how popular ideas about love and relationships—such as “love languages,” “soulmates,” and “chemistry”—circulate across digital communities. 
 
The SUDS Scholar will work with Dr. Impett in the Relationships and Well-Being Lab www.emilyimpett.com. The student will help collect and analyze publicly available Reddit data using Communalytic, a no-code platform for social network and text analysis. Under close mentorship, the student will learn how to clean and prepare large-scale social media datasets, conduct basic topic modeling and network mapping, and assist in interpreting visualizations that reveal how popular ideas about relationships spread online. The project bridges psychology and data science, providing training in computational social science methods. The student will also gain experience in open science practices, including reproducible workflows and accessible research communication. This opportunity will suit students interested in social psychology, relationships, data science, or digital culture, and will provide valuable skills for future graduate study in psychology or computational social science.
 

Researcher: Emily Impett, University of Toronto, University of Toronto Mississauga, Department of Psychology

Skills required:

  • Strong writing and analytical skills
  • Curiosity about relationships and online behavior
  • Attention to detail
  • Prior experiences in computer programming (e.g., R or Python) and computational social science methods (e.g., natural language processing) are preferred. 

Primary research location:

  • University of Toronto Mississauga and/or remote

Research description:

Morphogenesis refers to the physical assembly of cells to shape primordial tissues during embryonic development. How tissues and organs form in the embryo will serve as guides to understanding diseases and accelerating regenerative strategies. Although important insights into mechanisms of morphogenesis have been made through experimentation, the process is inefficient and there is much to be learned.Our lab combines students and postdocs from the biological, engineering, and physical sciences to pose and test biophysical hypotheses about morphogenesis. To accelerate our progress, we wish to narrow the parameter space of potential mechanisms in silico and test new ideas with focused empirical measurements and observations. To that end, our lab generated a provisional computational model of morphogenesis. Our model simulates individual cells as composites of finite elements to allow integration mechanical properties and forces that we can measure in vivo. Currently, the model is built on CPUs and is limited to the interaction of about 20 cells. Our goals are to: 1) incorporate a method of simulating epithelial and subsurface tissue layers, and 2) to transpose the model to GPUs to efficiently simulate hundreds or thousands of cells. 
 
The SUDS Scholar would 1) collaborate with a physicist research associate to devise a computational method of simulating apical-basal cell polarity in the finite element model, and 2) transpose the code from CPU to GPU.
 

Researcher:Sevan Hopyan, The Hospital for Sick Children, Developmental, Stem Cell, and Cancer Biology

 

Skills required:

  • Motivated by the problem,
  • Coding skills to advance our finite element model from CPUs to GPUs.
  • Some familiarity with developmental biology and finite element methodology would be beneficial but is not required.

Primary research location:

  • Hybrid

Research description:

This project involves conducting data analytics and designing optimization algorithms to develop and solve a resource allocation problem in the power systems domain. Its deliverables will improve the efficiency and equitability of the process for restoring power in affected areas of an electric power network. We use power systems data to develop optimization models and metrics for a trade-off between efficiency and fairness in power restoration decision-making. While efficiency (e.g., minimizing overall energy loss) is essential, incorporating fairness ensures an equitable distribution of resources across all impacted regions. In this remote project, the SUDS scholar will work under the guidance of data science and optimization faculty experts to complete a series of weekly research tasks.
 
After receiving some training on relevant machine learning and optimization models, the SUDS Scholar will be assigned weekly tasks that may include data analytics, formulating optimization models, implementing machine learning and Gurobi optimization models, conducting computational and data-intensive experiments on real and/or synthetic datasets, and developing exact or heuristic optimization methods to improve solution efficiency and accuracy. The expected outcome of this research is contributing reliable, open-source, and reproducible models and algorithms to the broader research communities in power systems optimization and data science.
 

Researcher: Samin Aref, University of Toronto, Faculty of Applied Science and Engineering, Department of Mechanical and Industrial Engineering

 

Skills required:

  • Fundamental data science knowledge
  • Python and its data science libraries
  • Formulating and solving optimization models
  • GitHub
  • Desired skills (to have or acquire during the project):
    • Discrete and network optimization
    • Gurobi Python libraries for optimization
    • Familiarity with electric power grids and power distribution systems
    • Academic writing and research skills

Primary research location:

  • Remote

Research description:

We aim to use low-cost wearable technologies (consumer electronics technologies) to use data science to improve health and quality-of-life, such as prediction of seizures, managing and mitigating health difficulties, such as ADHD, autism spectrum, visual and cognitive impairment, and spinal injuries, and the like, using wearable technology such as the InteraXon Muse S Athena brain-sensing headband that combines EEG (ElectroEncephaloGram) with fNIRS (Functional Near Infrared Spectroscopy).  We also aim to research wearable technologies in combination with micromobility (e.g. self-driving brain-controlled standing-spinal-support wheelchair) and therapy (e.g. integral kinesiology, water-walking (water-rollator walker), balance exercises, and the like, using EEG; fNIRS; XR (eXtended Reality) biofeedback, fall-sensing, and fall-mitigation.  Our approach will include not just classical brain-sensing (harmonic analysis in phase space) but also more modern methods such as wavelets and chirplets, including the adaptive chirplet transform which embeds machine learning into the mathematical transform rather than merely as a post-processor of transformed brain data.  This approach to data science aims to reach the Heisenberg uncertainty limit in generalized (chirplet) phase-space, and fully utilize fine-grained real-time data science capabilities made possible by wearable always on technology.

The SUDS Scholar will assist with development of the data science analysis system software, being written in Kotlin, as well as the mathematical analysis prototype being written in Octave, and algorithmic development. The student will be responsible for data collection and organization as well as assisting in development of the technological framework and execution for collecting brainwave data and running experiments on the data, as well as scientific analysis of the data. The student is expected to publish the results in ACM and IEEE publications.

Researcher: Steve Mann, University of Toronto, Faculty of Applied Science and Engineering, Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Skills required:

  • Programming expertise to assist in writing mobile apps (e.g. Android, Kotlin, Java, etc.) as well as fundamentals such as harmonic analysis in phase space, wavelets, chirplets, signal processing, and the like, combined with hands-on physical "making" skills ("hacking" wheelchairs, EEG, fNIRS, GNU Linux, BASH, etc.).

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Return-to-work outcomes are critical in helping injured workers re-enter the workforce and avoid long-term health and economic challenges. Medical chart review is an essential step in assessing workplace injuries and rehabilitation outcomes, yet it is time-intensive and prone to variability. Advances in Artificial Intelligence (AI), particularly large language models (LLMs), offer new opportunities to streamline chart review and improve decision-making. However, concerns remain about the accuracy, fairness, and ethical use of AI in healthcare.This project will investigate the performance of AI models in extracting clinical and socio-demographic information from workplace injury medical charts. We will evaluate model sensitivity, specificity, and fairness, with particular attention to biases that may affect equity-seeking worker groups. By comparing AI-driven chart review with human annotation, we aim to identify both the potential and the risks of using AI in occupational rehabilitation. 
 
The SUDS Scholar will play a key role in this project by supporting data annotation, bias assessment, and literature review, while gaining hands-on experience in applying data science tools to real-world healthcare problems. The results will inform future external funding applications focused on ethical, responsible, and equitable integration of AI in Canadian healthcare.
 
Researcher: Behdin Nowrouzi-Kia, University of Toronto, Temerty Faculty of Medicine, Department of Occupational Science and Occupational Therapy
 
Skills required: 
  • Basic programming (Python/R), data cleaning, and literature review skills
  • Attention to detail for chart annotation and bias assessment
  • Strong written communication
  • Interest in AI, healthcare, and equity
  • Willingness to learn and contribute in a multidisciplinary research team.

Primary research location:

  • Remote

Research description:

This project applies data-centric AI to address a key challenge in chemical research: the scarcity of structured, machine-learning-ready data. A significant bottleneck in the field is that vast amounts of chemical information—spanning small molecules, proteins, reactions, and knowledge graphs—remain locked in unstructured formats within literature and databases.
 
The SUDS Scholar will learn and apply state-of-the-art data extraction and curation techniques to create foundational datasets. A core focus will be pioneering modern data documentation practices not yet widely adopted in chemistry. The student will create comprehensive data cards and model cards to ensure transparency and responsible use. Furthermore, they will format these datasets using the Croissant ML metadata framework, a new standard for making datasets discoverable and 'ML-ready.'This work will directly contribute to accelerating AI-driven discovery in chemistry and provide the student with unique, hands-on experience at the intersection of data science and chemical research.
 

Researcher: Benjamin Sanchez-Lengeling, University of Toronto, Faculty of Applied Science and Engineering, Department of Chemical Engineering and Applied Chemistry

Skills required:

  • Proficiency in Python and familiarity with Jupyter Notebooks.
  • A foundational background in either data science or chemistry; expertise in both is not required.
  • Experience with data manipulation libraries (e.g., Pandas) is a plus.
  • Prior machine learning experience is highly desirable.

Primary research location:

  • University of Toronto St.George Campus and/or Remote

Research description:

The human claustrum is a thin subcortical structure often called the brain’s “most mysterious nucleus”. Though typically treated as a single entity, whether the claustrum contains internal subsections remains a matter of longstanding interest and debate. For example, two widely referenced, expert-drawn atlases map the overall claustrum similarly but propose subsections that differ in number, location, and shape. Clarifying the existence and nature of subsections matters because each may have specialised functions or vulnerabilities relevant to brain disorders and neuromodulation.This project seeks to discover novel subsections by leveraging three cutting-edge, super-resolution claustrum segmentations comprised of three-dimensional voxelized intensity data, derived from (i) ex vivo histology at 100µm, (ii) ex vivo MRI at 100µm, and (iii) in vivo MRI at 250µm. References:https://bit.ly/SUDSreferences
 
The SUDS Scholar will lead data-driven exploration by extracting radiomics features, selecting the most informative through dimensionality reduction, performing spatially-constrained clustering, assessing stability, and visualising results. With strong progress, the student may co-author a manuscript presenting the novel subsections and interpreting them in relation to prior atlases, known connectivity patterns, and emerging evidence of claustral subsections in non-human animals.
 

Researcher: Kamil Uludag, University Health Network, Krembil Research Institute

 

Skills required:

  • Interest in in neuroscience and statistics, comfortable writing Python code versioned on GitHub and run on a compute cluster.
  • Tasks include handling NIfTI files (NiBabel), extracting radiomics features (PyRadiomics), performing statistical analysis (SciPy, scikit-learn), and using 3D visualisation software (ITK).

Primary research location:

  • Hybrid

Research description:

Site-specific recombinases are powerful tools for genome engineering, enabling precise DNA rearrangements such as insertions, deletions, and inversions. However, predicting which recombinase variants will act efficiently on new DNA target sites remains a challenge. This project aims to develop a supervised machine learning model that predicts recombinase-DNA compatibility based on sequence input. 
 
Using an existing dataset of experimentally characterized recombinase-DNA pair activities, the SUDS Scholar will design encoding schemes for protein and DNA sequences, evaluate different architectures such as multilayer perceptrons, gradient-boosted classifiers, and simple linear classifiers, and compare their predictive performance using cross-validation and independent test sets. The project will emphasize the programming and optimization of the model rather than experimental data generation. Expected outcomes include a reproducible computational pipeline and a trained model capable of identifying sequence features that predict active recombinase variants for novel DNA target sites.
 

Researcher: Evgueni Ivakine, The Hospital for Sick Children, Genetics and Genome Biology

 

Skills required:

  • Programming experience in Python, close familiarity with machine learning frameworks, and a basic understanding of molecular biology and protein-DNA interactions.
  • Skills in data preprocessing and model evaluation will be essential for building and interpreting predictive models of recombinase activity.

Primary research location:

  • Hybrid

Research description:

Current methods for developing biocatalysts for sustainable applications require multiple rounds of design and experimentation with significant trial and error steps. Bioengineering can be accelerated significantly if systematic design, testing can be used to narrow the range of candidates that require testing. While there are many deterministic methods for modeling metabolic networks they fall short in their ability to predict the cumulative effect of higher order modifications typically required to enhance the production of desired compounds relevant for applications, for example, adipic acid required for bionylon synthesis. Hence, there is a need to develop hybrid methods that combine deterministic methods with data driven methods that can account for incomplete biological knowledge. Here we aim to use multi-modal large language models including DNA, and protein language models that can be used to predict and correct the current shortcomings of deterministic models. We aim to build on several in house datasets including a pandemic dataset of all metabolic reactions and augment this dataset with curated models derived from gold standard databases such as SWISS PROT and BiGG. We will then use these high quality metabolic network datasets to develop hybrid methods that can predict the impact of genomic modifications on physiology.

The SUDS Scholar will work to develop pipelines to process and curate data from different biological domains specific for microbial metabolism. The scholar will then work with other group members to apply existing pipelines for data analysis and modeling and also facilitate the development of new data analysis methods.

 

Researcher: Radhakrishnan Mahadevan, University of Toronto, Faculty of Applied Science and Engineering, Department of Chemical Engineering and Applied Chemistry

 

Skills required:

  • Strong coding skills and be able to query existing genomic databases and be familiar with SQL, Python.
  • Any knowledge of biochemistry and chemistry and familiarity with chemical structures representations such as SMILES will be helpful.
  • Student should be prepared to work in multidisciplinary teams

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Many older adults experience hearing difficulties that impact understanding of everyday speech, yet current AI tools for assessing speech comprehension remain too coarse for clinical applications. This project will explore how multi-agentic AI systems can provide more human-like, reliable, and sensitive evaluations of participants’ responses, such as story recalls or qualitative interviews about listening experiences. 
 
The SDUS Scholar will design and implement a coordinated team of specialized AI agents that collaborate to assess these responses: a Manager Agent will assign subtasks and coordinate workflow; a Semantic Analyst will quantify factual accuracy and gist retention; Critiquer Agents will evaluate coherence, inferencing, and topical relevance; Reliability Agents will cross-check consistency between evaluators; and a Referee Agent will integrate and calibrate the group’s consensus, explaining divergences and producing a final interpretive score benchmarked against human expert ratings. Together, these agents will emulate expert-panel reasoning rather than relying on a single-model output. The student will program and evaluate these agents using Python-based frameworks and large language models, testing their ability to detect subtle comprehension deficits from transcribed speech. This work will lay the groundwork for a new AI-assisted assessment platform that more accurately reflects human understanding, advancing both basic auditory-cognitive research and clinical diagnostics.
 

Researcher: Björn Herrmann, Baycrest

 

Skills required:

  • Advanced computer programming skills (Python or MATLAB)
  • Experience with large language models
  • Effective oral and written communication skills
  • Inter-cultural competence
  • Ability to work independently and within a team
  • Beneficial:
    • Background in artificial intelligence
    • Experience with multi-agentic AI and implementation
    • Interest in auditory research

Primary research location:

  • On-site

Research description:

Voltage imaging is a powerful tool for recording neuronal activity with unprecedented temporal resolution. Despite a need for processing pipelines for voltage imaging datasets, no benchmark toolbox exists. This project involves developing a pipeline for processing raw voltage imaging recordings, through an intuitive custom-made interface. Such much-needed user-friendly toolbox will catalyze voltage imaging experiments and can lead to an impactful publication. This project presents an exciting opportunity for a SUDS student with strong programming skills in Python (and/or Matlab) to contribute to the Taxidis lab's data analysis capabilities. The focus is on upgrading existing software for processing voltage imaging recordings, and building a user-friendly Graphical User Interface (GUI).

The SUDS Scholar responsibilities: (1) Conduct literature review on related data analysis software for neuronal imaging methods; (2) Optimize an existing basic data processing pipeline through efficient coding practices to increase processing speed while maintaining accuracy; (3) Explore AI options for automated detection of Regions of Interest; (4) Expand and improve an intuitive Graphical User Interface (GUI) for users to set analysis parameters and to navigate through the pipeline seamlessly; (5) Implement advanced data visualization tools and interactive plots within the GUI; and, (6) Document the upgraded software, providing instructions and demos for future users.

Researcher: Jiannis Taxidis, The Hospital for Sick Children, Neurosciences and Mental Health

Skills required:

  • Strong programming skills (Python and Matlab)
  • Experience with data handling and plotting
  • Some experience with designing graphical user interfaces  (Python and/or Matlab)

Primary research location:

  • Hybrid

Research description:

Discover and name a novel virus. Data-driven virus discovery is revolutionizing our understanding of virology across Earth's biosphere. In 2020, there were 15,000 known RNA viruses, since then our lab has discovered more new species (375,000+) than everyone else in the world combined, including so called “Dark RNA Viruses”. Our lab explores the evolution, ecology, and molecular interactions of these viruses through state-of-the-art computational analyses. Our focus is on how these viruses intersect human health and disease. Currently we’re searching for viruses which cause disease of unknown etiology (e.g. Alzheimer’s) and human cancers. By finding these causal agents, it creates the possibility of developing an Alzheimer vaccine, or cancer vaccine.

 

Info links- [Entering the Platinum Age of Virus Discovery Talk]- [Quirks and Quarks Podcast on our research]- [Serratus flagship paper]

 

As a SUDS Scholar, you will be involved in identifying a novel RNA virus found in a human tissue of your choice and characterize this virus by any means necessary.

 

Researcher: Artem Babaian, University of Toronto, Temerty Faculty of Medicine, Department of Molecular Genetics

 

Skills required:

  • Take this leap into the unknown and learn the fundamentals of virus discovery, data science, R, genome analysis, as well as ecology, ecoinformatics and medicine.

Primary research location:

  • University of Toronto, St. George Campus

Research description:

In this project, we study how U.S. households refinance their 30-year fixed rate mortgages to respond to lower interest rates or to change the pace at which they pay down their loan. Unlike the Canadian system, which resets at regular intervals and is common in much of the world, U.S. mortgages are locked in at origination and remain fixed for the full 30 years. The only way to adjust the contract is to refinance, so the burden is entirely on the homeowner to recognize opportunities and act on them. While some attentive and financially literate households refinance at the right time to capture gains, others do not adjust at all, even when the potential savings are large. 
 
The SUDS Scholar will be responsible for analyzing micro loan-level data from U.S. regulators covering the last 25 years to document how often households refinance, how sensitive they are to interest rate movements, and how much money is left on the table by those who do not take action. Using this analysis, the SUDS Scholar will quantify household behaviour and identify simple improvements to the mortgage market that could help households across the income distribution save money and lower financial stress.
 

Researcher: Michael Boutros, University of Toronto, University of Toronto Mississauga, Department of Economics

 

Skills required:

  • Working efficiently with large datasets (25 GB+) that span multiple files.
  • Understanding of panel datasets and relevant statistical techniques.
  • Background/interest in finance and economics.

Primary research location:

  • Remote

Research description:

Electron microscopy offers the highest possible imaging resolution. Nanometer-resolution ultrastructural reconstruction of subcellular organelles, including endoplasmic reticulum, mitochondria and synapses, have benefited not only cell biology studies but also informed the process of pathological development in human tissues. To date, volume EM (vEM) reconstruction of subcellular organelles at desired resolution remains too demanding in time and cost. This is unfortunate because routine histological diagnosis carried out by light microscopy of 2D tissue slices remains qualitative and sensitive to late-stage disease progression. Developing a tool that delivers rapid volumetric tissue reconstruction aimed at revealing subcellular pathology, from mitochondrial to synaptic changes that inform early stages of metabolic and brain disorders, would transform the field of both basic and clinical cell biology studies. This proposal harnesses artificial intelligence to accelerate the established FIB-SEM platforms to perform nanometer-scale image acquisition for organelle reconstruction in a fraction of time required with current imaging modalities. Dr. Zhen and colleagues recently achieved automated nanometer resolution vEM reconstruction of human tissue samples. She and her collaborator will develop a SmartEM specialized for ultrafast, nanometer-scale reconstruction of a broad range of biopsy human tissues, which will deliver a powerful new tool for early detection of clinical pathologies. 
 
The SUDS Scholar will be involved in the implementation and optimization of a SMART EM scanning module. A proof-of-principle prototype pipeline has been recently developed for connectomics imaging (https://doi.org/10.1101/2023.10.05.561103). This method employs algorithms that analyze low-resolution electron micrographs and make reliable estimates of salient structures, the pixel locations of neurite membranes, and presynaptic termini. Machine learning and statistical models are then employed to rapidly quantify uncertainty of every pixel in a ROI-detected image to decide what needs to be scanned at higher resolution and with longer dwell time. The student will apply the same principles to develop an optimized ML model that focuses on vEM reconstruction of other cellular organelles that are universal across cells, tissues, and systems. The student will develop and employ human-annotated ground truth, including synapses, mitochondrial cisternae, and the endoplasmic reticulum, to train two models, one to exclude for re-scanning, and one to include for re-scanning.
 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

Skills required:

  • Proficient in either image processing, algorithm development, or statistical analyses.
  • Knowledge in programming is essential.
  • Students interested in applied math and physics are strongly encouraged to apply, but the key ingredient is a strong drive to learn and apply all the above to real biological problems.

Primary research location:

  • Lunenfeld-Tanenbaum Research Institute and/or remote

Research description:

This project aims to develop emotionally aware AI math tutors that can teach children mathematics in a personalized and adaptive way. The system will analyze each learner’s performance and error patterns as well as their expressed and hidden emotions using advanced emotion decoding algorithms to deliver individualized instruction and targeted explanations for specific concepts.In addition to tracking learning progress, the AI tutor will continuously monitor the student’s emotional states (e.g., frustration, confusion, or engagement) using emotion recognition technologies based on facial expressions, voice tone, and behavioral cues and hidden emotions and physiology using transdermal optional imaging. By adapting its responses and instructional strategies according to both cognitive and emotional feedback, the system will enhance not only mathematical understanding but also learners’ motivation and emotional well-being.
 
Training will be provided in affective computing, AI analytics, and AI ethics. The SUDS Scholar responsibilities will include:
  • Attending regular team meetings to discuss project goals, technical requirements, and assigned tasks.
  • Learning and applying programming and AI-related skills through discussions, mentorship, and collaboration with team members.
  • Assisting in implementing and testing modules for emotion recognition and adaptive learning feedback.
  • Supporting data collection and analysis to evaluate students’ learning performance and emotional responses.
  • Refining and improving the system based on test results and user feedback to enhance learning effectiveness and emotional engagement.
  • Developing interdisciplinary experience that combines artificial intelligence, emotion recognition, and educational technology.

Researcher: Kang Lee, University of Toronto, Ontario Institute for Studies in Education, Department of Applied Psychology and Human Development

 

Skills required:

  • Strong Python skills and interest in applying AI to education/child learning.
  • Any of the following is an asset:
    • Experience in machine learning, adaptive learning, or emotion recognition is an asset.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

This project aims to develop an emotionally aware AI psychologist capable of assessing an individual’s psychological state through natural conversation. The AI system will engage users in dialogue to evaluate their emotions, mood, and mental health indicators, while simultaneously analyzing nonverbal cues such as facial expressions, gestures, and vocal tone, and hidden emotions and physiology using transdermal optional imaging. By combining natural language processing, affective computing, and multimodal behavioral analysis, the project seeks to build a system that can detect emotional distress, identify patterns of psychological well-being, and provide meaningful feedback or referrals for further support. The long-term vision is to advance emotionally intelligent AI for mental health assessment and early intervention. 
 
The SUDS Scholar will be part of a team developing an AI system capable of recognizing and responding to users’ emotional and psychological states. Student responsibilities will include:
  • Participating in weekly team meetings to review project objectives and discuss ongoing progress.
  • Learning and applying programming, natural language processing, and affective computing techniques with guidance from team members.
  • Assisting in developing and testing conversational and emotion-recognition modules using multimodal data such as text, speech, and facial expressions.
  • Supporting the integration and evaluation of system components to assess psychological insight and response accuracy.
  • Revising and improving algorithms or dialogue strategies based on feedback and testing outcomes.
  • Gaining interdisciplinary research experience in AI, psychology, and human–computer interaction.

Researcher: Kang Lee, University of Toronto, Ontario Institute for Studies in Education, Department of Applied Psychology and Human Development

Skills required:

  • Strong programming skills in Python and familiarity with machine learning frameworks (e.g., PyTorch, TensorFlow).
  • Experience in natural language processing, emotion recognition, or speech analysis is highly desirable.
  • Interest in psychology, mental health, and interdisciplinary AI research is strongly encouraged

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

As research becomes increasingly reliant on AI, and the reliability of AI models is dependent on the datasets used for training and open data, it’s more important than ever to make sure datasets shared under the open science framework are not only accessible but also AI-ready. 
 
The SUDS Scholar will build a local, open-source AI tool that helps assess and improve the FAIRness (Findable, Accessible, Interoperable, Reusable) of research datasets at the Rotman Research Institute, with the source code made openly available to maximize its benefit. The tool will run entirely within our secure local environment, protecting sensitive participant data and avoiding any reliance on external web services. The student will explore existing open-source solutions for FAIR data assessment and metadata annotation, adapt them to fit researchers’ needs, and develop a practical workflow for making datasets more FAIR using AI-driven methods. They will also ensure the code is reusable and well-documented, so other institutes in the Tanenbaum Open Science Network can adopt and build on it. By the end of the project, we will have stronger data management and sharing capabilities, AI-ready data sets, and a meaningful contribution to the open science community.
 

Researcher: Bradley Buchsbaum, Baycrest

 

Skills required:

  • Python or R, open-source development, and basic AI/ML workflows, with familiarity in data management and FAIR principles; comfortable working with metadata standards, automating workflows, and developing secure local tools. Curiosity, problem-solving ability, and clear documentation practices are essential.

Primary research location:

  • Hybrid

Research description:

Modern mechanical computer-aided design (CAD) software allows engineers to construct detailed 3D models using sequences of parametric modelling commands, but mastering these complex tools often requires years of experience. Given the complexity of geometric data in CAD, existing foundational machine learning (ML) models cannot be readily applied. Inspired by advances in natural language processing, this project explores how CAD modelling processes can be learned similarly to natural language: a complex CAD model can be represented as a sequence of modelling commands. Using a large dataset of publicly available human-generated CAD models, we aim to develop a context-aware recommendation system that predicts the most likely next modelling command based on the current state of design. Similar to sentence autocompletion in text editors, this system will assist designers by suggesting contextually relevant next steps, streamlining repetitive workflows and enhancing design efficiency. 
 
Through this project, the SUDS Scholar will gain valuable experience in developing and training advanced ML models, processing complex geometric data, and eventually integrating the developed AI-agent into a commercial CAD system, if time permits. The student will have the opportunity to contribute to writing and submit their work at a top-tier AI/HCI/CAD venues (e.g., ACM CHI, ACM IUI, NeurIPS, CVPR, etc.).
 

Researcher: Alison Olechowski, University of Toronto, Faculty of Applied Science and Engineering, Department of Mechanical and Industrial Engineering

 

Skills required:

  • A good theoretical understanding of popular neural network architectures (CNN, Transformers, etc.).
  • Fluency in Python programming and experience with training machine learning models using PyTorch.
  • Experience with mechanical CAD software (e.g., SolidWorks, Autodesk Fusion, Onshape) is a big plus. 

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

This project aims to develop and validate an AI-driven data extraction pipeline using Large Language Models (LLMs) to transform unstructured clinical notes into structured data within hospital electronic medical records (EMRs). The student will use a high-performance computing (HPC4Health) environment to locally deploy pre-trained LLMs, build data ingestion and output processes, and use statistical methods to evaluate accuracy. The focus will be on developing quantitative features/variables related to social determinants of health and technology use among hospitalized children. This work will advance the use of AI methods in healthcare data science and inform quality improvement and research in pediatric care.

 
 
The student will be embedded in the lab of Dr. Mahant, SickKids Research Institute , and co-supervised by Professor Nathan Taback, Department of Statistical Sciences, Faculty of Arts & Science University of Toronto.
 
The SUDS Scholarwill gain hands-on experience in: High-performance computing and data science workflows; Deploying and fine-tuning large language models; Clinical informatics and healthcare data systems; Statistical methods to evaluate and compare data accuracy and ethical data stewardship; and, Applying AI and data science techniques to real-world clinical research questions.
 

Researcher: Sanjay Mahant, The Hospital for Sick Children, Child Health Evaluative Sciences

 

Skills required:

  • Proficiency in Python including data processing and machine learning libraries
  • Familiarity with LLM concepts such as fine-tuning, machine learning
  • Familiar with standard statistical models, concepts.
  • Familiar with platforms such as Hugging Face.
  • Familiar with Linux command line
  • Strong analytical and problem-solving skills, with enthusiasm for interdisciplinary collaboration

Primary research location:

  • Hybrid

Research description:

This research project applies computational social science and GeoAI methods to examine how urban heat exposure, green infrastructure distribution, and socioeconomic disparities intersect across Ontario’s cities. Urban heat islands—areas that experience elevated temperatures due to dense built environments and limited vegetation—disproportionately affect low-income and racialized communities, amplifying existing health and social inequities.Using geospatial data science techniques, the project will analyze multi-temporal satellite imagery (1985–2025) and link it with census and health data to map patterns of heat exposure and green space accessibility. Advanced AI-based spatial modeling will help quantify how environmental and socioeconomic factors interact over time to shape urban resilience. By integrating spatial data on urban heat, vegetation, and social vulnerability, the project aims to identify communities most at risk and provide actionable insights for equitable climate adaptation and urban planning. The findings will advance our understanding of how environmental inequities emerge and persist in cities, demonstrating the potential of AI and GeoAI as tools for solving complex urban challenges and promoting environmental justice and social resilience in Canadian urban contexts. 
 
The SUDS Scholar will assist with acquiring, cleaning, and organizing satellite, census, and environmental datasets; conduct exploratory spatial analysis; and support the implementation of GeoAI models used to examine heat exposure and green infrastructure patterns. They will help generate maps and summary outputs, contribute to interpreting results, and prepare materials for internal reports or presentations. The scholar will participate in weekly team meetings, receive training in geospatial and analytic tools, and collaborate closely with the research team in the lab.
 

Researcher: Jue Wang, University of Toronto, University of Toronto Mississauga, Department of Geography, Geomatics, and Environment

Skills required:

  • Proficiency in Geospatial Analysis and Data Science (familiar with Google Earth Engine, Python, JavaScript, Spatial Analysis tools, etc)
  • Experience with Processing Census Data
  • Familiarity with Statistical Modeling to assess correlations
  • Visualization Skills via mapping and data visualizations
  • Ability to work with large, multi-source datasets and perform Spatial-Temporal Analysis

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research Description:

This project aims to apply geometric deep learning to reveal structure–property relationships in physical chemistry. Modern chemistry and materials science generate vast amounts of data—from molecular simulations to spectroscopy and diffraction experiments—that encode physical laws in high-dimensional, structured forms. However, conventional machine learning often overlooks the underlying geometric and physical symmetries that govern these systems.In this project, the student will explore graph neural networks and equivariant models that respect molecular and crystalline symmetries to learn representations of atomic interactions, energy landscapes, and thermodynamic behavior. 
 

By leveraging datasets developed in our lab—such as those unifying experimental and computational knowledge for materials and molecular systems—the SUDS Scholar will train and analyze models that bridge quantum chemistry, thermodynamics, and data science.Beyond technical training, the student will gain experience in Python programming, scientific data analysis, and AI for the natural sciences. The ultimate goal is to advance explainable and physically grounded AI methods capable of reasoning over chemical systems—supporting the broader mission of accelerating materials and molecular discovery through the integration of geometry, physics, and data.

 
Researcher: Seyed Mohamad Moosavi, University of Toronto, Faculty of Applied Science and Engineering, Department of Chemical Engineering and Applied Chemistry
 
 
Skills required:
  • Python programming (NumPy, pandas, PyTorch)
  • Data analysis
  • Basic understanding of machine learning concepts
  • Interest in chemistry or materials science, especially molecular structure and thermodynamics, is desirable.
  • You should be willing to contribute in team efforts and be a team player.
Primary research location:
  • University of Toronto, St George Campus

Research description:

In the last decade, deep learning models have become an integral part of our lives, from image recognition software to large language models such as ChatGPT. These models are often overparameterized, with many (many!) more parameters than training examples. While this naively implies that these models should just memorize their training data, instead, we find that they generalize extremely well, finding solutions that are often even better than more traditional underparameterized models. This almost-magical ability to generalize rather than memorize turns out to be (in part) the result of how we optimize and sample from these overwhelmingly large models' parameters. Motivated by this behaviour, this project will investigate new variants of optimization and/or sampling methods based on ideas from Hamiltonian optimization, dynamics, and optics, to see how well they perform across a wide class of problems (potentially including large language models). This will involve a combination of theoretical work as well as empirical studies of how these methods perform on various methods on benchmarks, their stability and dynamics under various conditions, and the implicit and/or explicit regularization that they provide.This project will be co-supervised with Prof. Ricardo Baptista, Department of Statistical Sciences, Faculty of Arts & Science, University of Toronto. 
 
The SUDS Scholar will be actively involved in pursuing a combination of both (1) theoretical work and literature review, as well as (2) conducting empirical studies of how these methods perform (including their stability, dynamics, and regularization properties) across various benchmarks.
 

Researcher: Joshua Speagle, University of Toronto, Faculty of Arts and Science, Department of Statistical Sciences

 

Skills required:

  • Strong background in Python (especially auto-differentiable frameworks such as Jax, PyTorch, or TensorFlow), statistics (optimization/sampling), and/or physics (mechanics) are especially encouraged to apply.

Primary research location:

  • University of Toronto St. George Campus

Research description:

Incoming sensory information is often processed by separate, parallel pathways that encode increases and decreases in the input signal. The paradigmatic example is the visual system, where inputs separate into light increment (ON) and light decrement (OFF) pathways. What is the utility of this prevalent motif, and how does it facilitate sensory processing? To answer this question, we will use a recently published annotated connectome of the fly’s visual system. The connectome dataset includes input and output synapses for each cell, grouping of neurons into cell types, and neurotransmitter predictions for each cell type. 
 
The SUDS Scholar will use this dataset to study how information from the ON and the OFF pathways reintegrates to facilitate different visual computations in the visual system of the fly. They will survey ON-OFF motion integration patterns in the entire population of output cell types and ask: (1) what is the proportion of ON and OFF motion inputs received by each cell type, (2) what is the spatial distribution pattern of ON and OFF synapses along their dendrites, and (3) what is the cooccurrence of cell types in polyadic synapses (synapses with one presynaptic site and several postsynaptic partners).
 

Researcher: Eyal Gruntman, University of Toronto, University of Toronto Scarborough, Department of Biological Sciences

Skills required:

  • Programming skills in Python (preferred) and/or RBackground in Statistics and Linear Algebra

Primary research location:

  • University of Toronto Scarborough Campus and/or remote

Research description:

Our research group is populated primarily by undergraduates, coming from a range of backgrounds spanning pure math, statistics, bioinformatics, and computational biology. Each student has their own project, while also collaborating with each other as desired. Being part of the Donnelly Centre and Molecular Genetics department, our group also collaborates heavily with many experimental groups spanning tumor biology and regenerative medicine or basic cell or plant biology. In addition to interactions with the research group, the student will interact closely one-on-one with the PI (Shu Wang) on technical details.

The SUDS Scholar project is to develop methods for inferring polynomial dynamical systems from high-throughput/high-dimensional datasets, such as single-cell RNA expression time series. Depending on the student's background, the project can involve various subsets of 1) pure math proofs to determine the requirements on data/experimental setups that allow dynamical systems to be uniquely identifiable from data, 2) development of efficient algorithms to infer dynamical systems from data, or 3) application of inference algorithms to experimental single-cell omics data in the contexts of cancer biology and organismal development to predict dynamical outcomes of biological systems.
 

Researcher: Shu Wang, University of Toronto, Temerty Faculty of Medicine, Terrence Donnelly Centre for Cellular and Biomolecular Research

 

Skills required:

  • Multivariable calculus and Linear Algebra
  • Helpful:
    • Proof-based Math, ODE/PDEs, Differential Geometry, Algebraic Geometry, Combinatorial Geometry, Probability and Statistics, Coding in Python/R/MATLAB, Nonlinear Optimization, Chemical Kinetics, Dynamical Systems Theory, Graph Theory, Markov Fields

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

Dark matter makes up most of the matter in the Universe, yet it cannot be seen directly—it interacts only through gravity. This project uses artificial intelligence and data science techniques to uncover the hidden structure of dark matter by analyzing the motion of stars in dwarf galaxies and stellar streams. These small galaxies and elongated star systems act as “gravitational detectors,” responding to the unseen dark matter around them. 
 
The SUDS Scholar will apply simulation-based inference (SBI), a cutting-edge approach in machine learning that uses simulated data to train neural networks to infer the physical parameters of complex systems. By comparing simulated and real astronomical datasets, the team will learn how to extract the dark matter distribution and test different theories about its properties.This project offers hands-on experience in modern data-driven astrophysics, combining tools from AI, Bayesian statistics, and computational modeling. Students will gain exposure to real astronomical survey data, explore uncertainty estimation, and contribute to developing machine learning models that help us understand one of the Universe’s greatest mysteries.
 

Researcher: Ting Li, University of Toronto, Faculty of Arts and Science, David A. Dunlap Department of Astronomy and Astrophysics

 

Skills required:

  • Basic programming experience (Python preferred)
  • Familiarity with data analysis or machine learning libraries (NumPy, pandas, PyTorch, or TensorFlow) is helpful but not required.
  • Interest in statistics, simulations, or AI for scientific discovery.
  • Curiosity and willingness to learn new data-science techniques for complex real-world problems.

Primary research location:

  • University of Toronto St. George Campus

Research description:

This project involves a new tool we developed to support open science activities and reporting at a Canadian neuroscience institute: the Rotman Research Institute (RRI). The tool retrieves and cleans publication data of RRI scientists from openalex.org, depositing it into a MySQL database. This project will focus on integrating the tool with the RRI data nexus (i.e., existing data management infrastructure and databases at the RRI), enabling seamless generation of institute reports (e.g., lists of open access publication and citation counts; chord diagrams visualizing collaborations) and automatic updating the RRI open science website and scientist webpages with lists of open publications. Another part of this project is to use AI tools to build a cross-reference map between existing RRI publication database entries and openalex publication IDs, and to retrieve data on study preregistrations and open datasets shared in online repositories (e.g., http://osf.io). This project will support our goal to apply for external funding to further develop these informatics and open science tools. 
 
The SUDS Scholar will have the opportunity to learn about institutional data management processes and open science best practices and tools, and to work with both neuroscientists and programmers in Research IT.
 

Researcher: Donna Rose Addis, Baycrest

 

Skills required:

  • Advanced computer programming skills (e.g., Python, shell scripting, HTML, SQL)
  • Effective oral and written communication skills
  • Ability to work independently and within a team
  • Beneficial to have:
    • Familiarity with open science
    • Data visualization skills

Primary research location:

  • Hybrid

Research description:

Proteins are inherently dynamic in nature; proteins can assume different structures that contribute to their function. As it stands, most deep learning-based methods for protein property prediction rely on protein sequences or single structures. There have recently been efforts to consolidate and generate dynamics data in silico. This increasing data has led to the development of models that are capable of efficiently generating protein structural ensembles. In this project, the student will investigate the integration of large-scale dynamics data and predicted protein ensembles with deep learning models to improve protein property prediction. 
 
The SUDS Scholar will test the ability of structure-based embedding models to distinguish between different structural states of input proteins. This will include investigating models that make use of tokenized, all-atom, and surface-based representations of protein structure. The student will evaluate protein property predictions generated by aggregating results with different input structures. The student will benchmark performance using deep mutational scanning data and protein-protein interaction data. This project will provide insight into the capacity of existing models to directly make use of dynamics information and will inform future efforts to develop predictive models that can make better use of protein dynamics. 
 

Researcher: Osama Abdin, The Hospital for Sick Children, Molecular Medicine

Skills required:

  •  Python coding experience
  • An understanding of the basics of machine learning/deep learning
  • Good written and oral communication skills
  • Experience with molecular structure data is an asset 

Primary research location:

  • Hybrid

Research description:

Autism spectrum disorder (ASD) is a neurodevelopmental condition whose core symptoms are communication difficulties, repetitive behaviors, and restricted interests. It affects 1 in 54 individuals and is 4x more common in males than females. Twin studies have estimated the heritability of ASD to be 64-91%; however, rare, high-impact variants in known ASD-risk genes are detected in only ~15% of ASD-affected individuals. Previous studies have examined the contribution to ASD risk of many types of genetic variants, including sequence-level variants, structural variants, and tandem repeat expansions. However, little is known about the role of missense variants that disrupt protein post-translational modification (PTM) sites. PTMs are changes made to proteins after they are synthesized, usually involving the covalent addition of chemical groups to amino acid residues. Catalyzed by protein kinases, phosphorylation is the most common PTM in humans and plays a critical role in nearly all cellular processes, including transcription regulation, chromatin remodeling, and circadian rhythms—all processes disrupted in ASD.
 
In this project, the SUDS Scholar will use statistical and machine learning approaches, in combination with whole-genome sequencing and whole-exome sequencing data from over 50,000 autistic children, to systematically examine the role of PTM site disruption in ASD.
 

Researcher: Brett Trost, The Hospital for Sick Children, Molecular Medicine

 

Skills required:

  • Linux shell
  • R or Python programming
  • The following skills would be assets:
    • Experience using a high-performance computing cluster
    • Experience with genetics/genomics data
    • Knowledge of statistics

Primary research location:

  • Hybrid

Research description:

Spectroscopic surveys are vital to the field of astronomy, especially for detailed measurements of the chemical, physical, and dynamical properties of stars that are used to understand how galaxies form and evolve. However, all spectral observations are a complicated mixture of light from a combination of sources: the target of interest, intervening gas and dust in space, and the Earth's atmosphere (if using ground-based data). Properly removing non-stellar signatures from stellar spectra remains an open and active area of research, with recent work exploring dimension reduction and component separation techniques. This project will focus on building new models of well-known-but-poorly-characterized Earth-based spectral signatures called "telluric absorption", largely caused by atmospheric water, methane, and carbon dioxide. 
 
The SUDS Scholar will use millions of ground-based near-infrared stellar spectra from the Sloan Digital Sky Survey (SDSS) to build data-driven, time-variable models of these tellurics using alternating minimization within a component separation framework. The results of this work will be directly incorporated into the data products of one of the largest and most scientifically impactful collaborations in the astronomical community. Proper removal of Earth's tellurics from SDSS spectra will reveal previously-hidden stellar features in under-explored regions of wavelength, potentially advancing our understanding of stellar astrophysics.
 

Researcher: Joshua Speagle, University of Toronto, Faculty of Arts and Science, Department of Statistical Sciences

 

Skills required:

  • Strong background in scientific programming (e.g. reading in data and then fitting models).
  • A background in linear algebra is helpful.
  • Ideally, the student is familiar with coding in Julia, though Python users with a desire to learn Julia are also encouraged to apply.

Primary research location:

  • University of Toronto St. George Campus

Research description:

This project advances automated behavioral assessment by developing and evaluating large language model (LLM)–based pipelines for coding performance on common psychological and educational tasks. Many behavioral tasks—such as open-ended text responses, conversation-based interactions, and video-recorded activities—require human coders to identify constructs like emotion regulation, empathy, risk-taking, cognitive strategies, or adherence to task instructions. Manual coding is labor-intensive, inconsistent, and difficult to scale. Building on recent progress in generative and multimodal AI, this project investigates whether LLMs (e.g., GPT-5, multimodal vision-language models) can produce reliable, valid, and interpretable codes for these tasks.
 
The SUDS Scholar will help create a structured dataset consisting of (a) short text responses, (b) transcripts, and (c) short video clips with benchmark human-coded scores. We will design and test multiple prompting, in-context learning, and self-critique strategies to improve LLM accuracy. For the video tasks, we will explore vision-language pipelines (OpenAI, Gemini) to extract behavioral features such as facial affect, gesture patterns, and task-relevant actions. Model outputs will be evaluated against expert ratings using reliability, generalizability, and validity metrics.This project contributes to responsible use of generative AI for behavioral science, enabling scalable, transparent measurement tools for research, intervention evaluation, and digital-twin modeling.
 

Researcher: Feng Ji, University of Toronto, Ontario Institute for Studies in Education, Department of Applied Psychology and Human Development

Skills required:

  • Strong interest in AI, psychology, or behavioral science
  • Ability to manage datasets
  • Basic experience with Python or R
  • Familiarity with LLMs, prompting, or NLP is an asset
  • Experience with video annotation or computer vision is optional but beneficial.
  • Curiosity, reliability, and willingness to learn are essential.

Primary research location:

  • Remote

Research description:

Understanding how different nanoparticles interact with biological tissues is essential for developing safe and effective nanomedicines. In our lab, we use organ-on-a-chip models that mimic the human placenta and other organs to study how nanoparticles behave under realistic physiological conditions. This project will apply machine learning to experimental data collected from these models to predict cellular responses to nanoparticles based on their properties and exposure conditions.
 
The student will work with a graduate student to clean, organize, and analyze datasets, build predictive models, and create clear visualizations to interpret findings. This project offers an exciting opportunity to apply data science skills to a real-world biomedical challenge at the intersection of nanotechnology, microfluidics, and computational modeling.
 

Researcher: Hagar Labouta, Unity Health Toronto

 

Skills required:

  • A motivated data science student with experience in Python, R, or related data analysis tools.
  • Familiarity with machine learning packages is an asset but not required.
  • No prior experience in nanomedicine or organ-on-a-chip systems is needed; the project provides an opportunity to learn these concepts in an interdisciplinary environment.

Primary research location:

  • Unity Health Toronto

Research description:

 
Marine microbes are prolific yet underexplored producers of bioactive compounds and the source of new anti-cancer and antimicrobial drugs. A key challenge is mining microbial chemistry at scale to identify and prioritize novel compounds. New ‘omics techniques have created unprecedented amounts of data on the chemistry and genomes of microbes. However, this data remains difficult to interpret. Current computational methods to link biosynthetic gene clusters (BGCs) to their products largely operate in low-data regimes and cannot reliably identify metabolites from genomes. Here, we will use cutting-edge machine learning techniques to mine multi-omics data.  This project will lay the foundations towards building a genomic predictor for chemical potential using protein language models and novel BGC representations and using transformer-based models for metabolomics data and multi-modal contrastive learning to directly link BGCs with chemical features. Our goal is to create new tools to explore and mine ‘omics data to expedite the discovery of new microbial chemistry and therapeutics. This project will be in collaboration with Prof. Ben Sanchez-Lengeling. 
 
The SUDS Scholar will analyze genomics and metabolomics data from a collection of 150 marine bacteria. The goal is to create a pipeline to make ‘omics data AI-ready, though data organization; setting up data schema and data manuals; and performing exploratory data analysis (EDA).
 

Researcher: Rachel Gregor, University of Toronto, Faculty of Applied Science and Engineering, Department of Chemical Engineering and Applied Chemistry

 

Skills required:

  • We are looking for motivated candidates who are interested in drug discovery, environmental sciences, genome mining, and artificial intelligence.
  • Candidates with a background in computer programming, biology, and/or chemistry are especially encouraged to apply. 

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Previous work from the MEDCVR lab has shown strong results in policy learning for surgical robotic manipulation tasks using methods such as Reinforcement Learning (RL) and Imitation Learning (IL), but most approaches rely on single camera 2D images. However, as the task becomes more complex: the robot must interact in a 3D world requiring spatial reasoning and depth understanding and multi-object interaction, on which single-view learning does not scale effectively. This project will develop methods to capture consistent 3D representations across viewpoints. These representations can then be used to train policies that are more robust to camera viewpoints, and capable of handling inherently 3D tasks such as surgical cutting.  This project aims to: Build on previously developed representation learning pipeline;  Train policies for 3D manipulation tasks using multiple cameras in simulation; Transfer trained policies to real robots to evaluate performance on real 3D tasks; Compare baseline single-view RL methods to quantify improvements in sample efficiency and robustness to multiple camera viewpoints; and, Publish the findings.
 
The SUDS Scholar will work closely with the lab's Amey Pore, Schmidt AI in Science Postdoctoral Fellow. The SUDS Scholar responsibilities will include:
1. Data collection on surgical robotic simulators based on Unity, Unreal engine and MuJoCo Playground
2. Data collection on the da Vinci robot and Franka robots
3. Different camera configuration experiments for Imitation Learning (IL) and Reinforcement Learning (RL)
4. Developing novel methods for improving existing IL and RL pipeline
5. Ablation studies and paper writing.
 

Researcher: Lueder Kahrs, University of Toronto, University of Toronto Mississauga, Department of Mathematical and Computational Sciences

 

Skills required:

  • Strong background in Python, machine learning, and knowledge of computer vision
  • Experience with robotics, reinforcement learning, 3D perception (stereo vision, multi-view geometry) and game engines like Unity or Unreal or robot simulators like Mujoco or Isaac Sim is a plus.  

Primary research location:

  • University of Toronto Mississauga Campus and/or remote

Research description:

Alzheimer's disease and related dementias progress slowly at first, then accelerate as individuals move from normal aging to mild cognitive impairment then dementia. Cognitive and biomarker changes are non-linear near clinical conversion. We previously developed Bayesian linear mixed-effects models to track brain changes over time. This project will extend to nonlinear Bayesian mixed-effects modeling that will capture richer temporal dynamics. Under supervision, the student will improve and tune these models on longitudinal T1-weighted MRI. They will also evaluate model performance and help refine the computational pipeline. The analysis pipeline will provide voxelwise estimates of higher-order derivatives of individual brain trajectories. We propose that these features on whether the brain change is accelerating or decelerating will offer earlier and more specific signals of imminent conversion compared to traditional linear measures such as baseline level or slope. Better prediction of time-to-conversion could meaningfully improve clinical trial design and participant selection. 
 
Working with collaborators at UCSF, the SUDS Scholar will gain experience in building Machine Learning tools on the extracted metrics to estimate how early individuals can be identified for targeted trial designs. This project offers hands-on experience at the interface of neuroimaging, Bayesian modeling, and translational machine learning.
 

Researcher: Lei Wang, Baycrest

Skills required:

  • Computational Science and Data Mining, independence, problem solving  

Primary research location:

  • Hybrid

Research description:

Social behavior towards other individuals is central for our well-being and survival. Its impairment, such as difficulty in social communication and avoidance of eye contact, is commonly observed in people with autism spectrum disorders (ASD). Although animal models (especially mouse models) have played critical roles in understanding mechanisms by which mutations in ASD risk genes lead to social and non-social behavioral phenotypes, most of the prior studies have focused on the behavior in adult animals. Therefore, even though ASD clearly affects behavior during childhood, the patterns of social behavior in neonatal and juvenile mice and how those patterns are impaired in ASD model mice are poorly characterized. To overcome this challenge, our lab is starting behavioral recordings of neonatal and juvenile mice with their mother.
 
This SUDS project will focus on the analysis of such short-term and long-term behavior data using wild-type mice and mice with genetic or chemical interventions. By using and potentially modifying machine-learning based tools including DeepLabCut and SLEAP (tracking keypoints in the mouse body such as nose, ears, neck, tail root) and MoSeq (identifying behavioral motifs such as rearing and grooming in an unsupervised manner), we will quantitatively analyze the patterns of mother-pup interactions.

Researcher: Tatsuya Tsukahara, Lunenfeld-Tanenbaum Research Institute

Skills required:

  • Experience in python, especially the machine-learning related tools such as scikit-learn, for analysing the behavior data.
  •  
    Experience with GitHub, as some of the tools we plan to use are available as GitHub repositories.

Primary research location:

  • Lunenfeld-Tanenbaum Research Institute

Research description:

This research project aims to develop a novel physics-informed neural network (PINN) for predicting droplet solidification during the spray freezing (SF) process for renewable mine heating and cooling in northern areas. The objective is to solve a two-phase Stefan problem in the spherical coordinate that predicts the equilibrium freezing process of droplet solidification via PINNs. Existing close-form analytical solutions (e.g., perturbation solutions) with specific assumptions (e.g., low Stefan number) will also be incorporated into the PINN to enhance learning. The PINN framework will allow a high-resolution, high-fidelity SF model that considers the freezing process of each droplet, which improves the predictive capability of mine heating and cooling through SF technology. 
 
The SUDS Scholar tasks: In Month 1, conduct a literature review on PINNs and run the existing PINN framework (developed in my lab) for Stefan problems in Cartesian coordinates; in Month 2, develop a PINN for a spherical two-phase Stefan problem to predict droplet solidification; in Month 3, incorporate perturbation-based analytical solutions from the literature into the PINN to reduce training time and improve accuracy; in Month 4, perform a parametric study on the effects of droplet size and climatic conditions with the preparation of a written report.
 

Researcher: Minghan Xu, University of Toronto, Faculty of Applied Science and Engineering, Department of Civil and Mineral Engineering

 

Skills required:

  • Fluent in Python and familiar with Github
  • Strong background in engineering mathematics, heat transfer and thermodynamics
  • Familiar with machine learning or PINNs or willingness to learn

Primary research location:

  • University of Toronto St. George Campus 

Research description:

An important step in planning a large conference is to divide the accepted papers into groups (sessions) of topically similar ones that will be presented together. Each presenter would like their paper presented with most similar other papers, but it is generally impossible to maximally satisfy all presenters simultaneously. This requires striking a balance between the preferences of different presenters, raising a fairness concern. Today, this step is largely performed in an ad-hoc manner by the conference chairs. The goal of this project is to build an automated tool that combines large language models and clustering algorithms in a novel manner to produce meaningful conference schedules with provable fairness guarantees. The tool will be published in an open source model. The project will involve understanding, adapting, and implementing existing algorithms with theoretical fairness guarantees; running simulations with state-of-the-art large language models; evaluating empirical fairness and accuracy; and contrasting with existing conference scheduling approaches.
 

Researcher: Nisarg Shah, University of Toronto, Faculty of Arts and Science, Department of Computer Science

 

Skills required:

  • Ability to analyze and implement clustering algorithms.
  • Ability to follow mathematical proofs and adapt them to derive fairness or approximation guarantees.
  • Experience building ML pipelines that integrate large language models.
  • Experience designing empirical evaluations and metrics.
  • Strong research-grade software engineering skills, including clean modular code, reproducibility, and (ideally) open-source development.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Arterial dysfunction is a precursor to aging-related cognitive decline and dementia; a clinical biomarker of dysfunction is blood flow velocity. Specifically, the detection of slowed flow in the anterior choroidal arteries is valuable for the early treatment of hippocampal degeneration, the hallmark of Alzheimer’s disease. 3D time-of-flight (TOF) magnetic-resonance angiography data, which can be collected in as little as 4 minutes, is the most promising non-invasive way for arterial imaging. However, 3D TOF does not provide blood-flow velocity information and is thus of limited clinical value. The alternative method, 3D phase-contrast angiography, provides quantitative blood flow velocity but requires 20-30-minute scans and more complex computations, and is not feasible for routine use in patients. In this work, we propose to develop a deep-learning framework for generating quantitative arterial-velocity measurements from 4 minutes of conventional 3D TOF data. To that end, we will first measure blood-flow velocity in the anterior-choroidal arteries using phase-contrast MRA in healthy adults, which we will use to train a network along with 3D TOF data from the same individuals. This project could provide a proof-of-concept that breathes new value into existing 3D TOF data and facilitates the inclusion of TOF in more clinical studies of dementia. 
 
The SUDS Scholar will: Familiarize with existing deep-learning tool for quantitative blood-flow velocity measurement from functional MRI signals; Perform biophysical modeling to understand the relationship between TOF signal intensity and blood-flow velocity; and, Identify ways to adapt existing deep-learning tool for estimating blood-flow velocity from TOF signal.
 

Researcher: Jean Chen, Baycrest

 

Skills required:

  • Usage of the Linux operating system
  • Programming in Python and/or Matlab
  • Basic data-science concepts, e.g. correlation, regression
  • Basic statistical concepts, e.g. t-tests, F-tests, outlier identification
  • Deep-learning methods, including CNN, GAN, autoencoders
  • (optional) Medical imaging analysis experience

Primary research location:

  • Hybrid

Research description:

People often try to trick chatbots into unsafe answers by rephrasing questions or switching languages. A safe healthcare assistant must resist these tactics in realistic conversations. This project will test whether an AI assistant keeps refusing when a user becomes more insistent or changes language during a dialogue.
 
The SUDS Scholar will design simulated conversations where the user asks for things that the AI should not provide, such as risky medical tips or harmful instructions. Over several turns, the tone of the user will move from polite to demanding or distressed while sometimes shifting from English to another language to probe consistency. The responses from the assistant will be logged and reviewed to see if it ever stops refusing. The student will build synthetic dialogues with a language model as the assistant and a scripted or model based agent as the persistent user. The analysis will track how often the assistant maintains a refusal and how the wording of responses shifts with pressure from the user. The project will also compare different prompt styles that shape system instructions for the assistant and will store reusable templates and logs for future safety audits in clinical deployment inside hospitals and other health environments.
 

Researcher: Zahra Shakeri, University of Toronto, Dalla Lana School of Public Health, Institute of Health Policy, Management, and Evaluation

 

Skills required:

  • Proficiency in Python and experience with LLM APIs such as OpenAI, Llama, DeepSeek, and medical language models.
  • Knowledge of transformer architectures and practical fine tuning for health related tasks.
  • Familiarity with reinforcement learning methods for model training and safety evaluation.
  • Written and oral communication skills for diverse audiences in healthcare.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Foundation models trained on text prediction (large language models) have yielded revolutionary advances on natural language processing tasks. Similar models trained on speech audio (speech foundation models) have also led to major advances in speech processing (such as speech-to-text), but performance of these models still has a number of major limitations. Important insights can be gained by building models inspired by human speech perception. Human listeners show tremendous robustness in their ability to identify phonemes, the basic units of speech, in noise, in different accents, and in different linguistic contexts, in spite of the fact that these conditions lead to massive variability in the signal. This project will make use of direct comparisons between the behaviour of speech foundation models and the behaviour of human listeners in a variety of listening experiments, using experimental stimuli based on perception of unfamiliar sounds from other languages (not in the training set of the model/experience of the listener), which are extremely revealing of the underlying mechanisms of human speech perception.
 
The SUDS Scholar will develop and implement architectures and techniques to make the perception behaviour of speech foundation models more robust, drawing on decades of insights from classical (pre-neural) computer speech processing.
 

Researcher: Ewan Dunbar, University of Toronto, Faculty of Arts and Science, Department of French

 

Skills required:

  • Experience with neural network training in PyTorch
  • Basic understanding of supervised learning and model evaluation, ideally some familiarity with self-supervised/unsupervised learning
  • Strong analytical skills and creativity in tackling challenges
  • Experience with data interpretion and analysis in a research context
  • Ideally: knowledge of linguistics and phonetics

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Large language models can write first person stories about gambling and gaming and substance use. Many researchers treat this synthetic text as a proxy for behavioural data, although we lack tests of its realism. This project will compare Gen-AI generated narratives with real forum posts from online gambling and gaming and substance use communities. We will measure how often key themes appear in each source and will track financial harm, platform mechanics, stigma, loneliness, family conflict, relapse, and recovery. 
 
The SUDS Scholar will build a corpus of posts under ethical guidance and will generate synthetic narratives. They will apply topic modelling, sentiment analysis, and simple classifiers to compare the two worlds and to detect stable gaps. The student will also test how these gaps affect research tasks such as risk scoring or message testing for harm reduction campaigns. The project will work well for a student with some experience in Python and interest in gambling harms, NLP, and Gen-AI ethics. The goal is to give behavioural scientists guidance about synthetic data so they can decide when LLM stories act as safe stand ins for people. Findings will show how this evidence shapes digital health policies in real settings.
 

Researcher: Zahra Shakeri, University of Toronto, Dalla Lana School of Public Health, Institute of Health Policy, Management, and Evaluation

 

Skills required:

  • Strong Python programming skills
  • Experience using web and LLM APIs
  • Familiarity with NLP pipelines and transformer models

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

 
The goal of the project is to estimate the cross-sex distribution of allele-specific effects on isoform usage. The student will work existing fruit fly RNAseq data sets designed to estimate allele-specific expression in heterozygotes. We have already developed a pipeline to assign RNAseq reads to each parental haplotype.
 
The SUDS Scholar will use this pipeline to then estimate transcript-level counts (via salmon pipeline) for each haplotype. These counts will be used to estimate the proportional expression of each transcript isoform from a given gene. The student will then develop a likelihood model to parameterize the cross-sex distribution of allele-specific effects on isoform usage.
 

Researcher: Aneil Agrawal, University of Toronto, Faculty of Arts and Science, Department of Ecology and Evolutionary Biology

 

Skills required:

  • Experience with R and unix
  • Willingness to work with RNAseq data, mostly following established pipelines.
  • Familiarity with key concepts of likelihood (and ideally Bayesian statistics) and optimization.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Chromosomes are huge polymers that store and protect our genome sequence. The shape, or conformation, these chromosomes adopt plays a pivotal role in the genome’s functions and in how the sequence of DNA is interpreted by the cell. How one genome generates a large diversity of cell types, each with unique spatial and temporal gene expression patterns and physiological roles, is an enduring fundamental question in cell and developmental biology. We have discovered that chromosome conformation is highly variable between cells, and we are currently investigating the functional significance of this variability.
 
The SUDS Scholar will use clustering techniques to assess the cell to cell variability of chromosome conformation in different cells and tissues of embryos as they develop from a single cell to a complex organism.
 

Researcher: Ahilya Sawh, University of Toronto, Temerty Faculty of Medicine, Department of Biochemistry

 

Skills required:

  • Experience with computational biology, basic statistics, and programming languages (R, Python) are required.
  • Coursework in biology/biochemistry is preferred.

Primary research location:

  • University of Toronto St. George Campus

Research description:

The Morris Lab at the Donnelly Centre develops and applies cutting-edge deep learning models to interpret human cell epigenomic data and predict the effects of genetic variants. Although thousands of variants are linked to human traits and diseases, over 90% fall in noncoding regions of the genome, obscuring their function. By training deep learning models of cell type–specific activity, we are generating a comprehensive catalogue of predicted variant effects.
 
The SUDS Scholar will help train and evaluate models using our high-performance computing cluster, and integrate predictions with in-house experimental datasets (e.g., CRISPR perturbation screens). By comparing variants with strong versus weak predicted effects, the student will assess whether these predictions can guide experimental design and identify likely functional variants. This project offers hands-on experience in computational genomics, machine learning, and collaborative research within a vibrant human genetics community.
 

Researcher: John Morris, University of Toronto, Temerty Faculty of Medicine, Terrence Donnelly Centre for Cellular and Biomolecular Research

Skills required:

  • Proficient in Python or Bash, as these are the primary languages used in the lab for developing genomics data analysis workflows and running computations on our high-performance computing cluster.
  • A basic understanding of human genetics or genomics data is an asset but not required.

Primary research location:

  • University of Toronto St. George Campus

Research description:

Current retrieval-augmented generation (RAG) approaches in which LLMs interact with document libraries rely on search algorithms such as vector search or BM25 full-text search. We propose an indexing and retrieval method that more closely mimics human approaches to information retrieval by intelligently traversing directory structures and document contents.
 
The SUDS Scholar will implement an agentic algorithm to create and traverse a summarization tree structure, and validate this method against current RAG retrieval approaches. Using LLMs to recursively summarize document sections, then pages, then folders and finally their superfolders we create a tree structure of increasingly abstract summaries of the entire document library. Given a question or research topic, an LLM agent can then navigate this tree structure to find the relevant information and synthesize an answer. This will be deployed on our local GPU servers at OICR to improve search while maintaining privacy over our internal-access only genomics Quality Management System.
 

Researcher: Mélanie Courtot, Ontario Institute for Cancer Research

Skills required:

  • Some programming experience in Python
  • Desired:
    • Biological background and experience with neural networks
    • Interested in learning about machine learning and working collaboratively in a small research lab, coupled to a large software engineering team.

Primary research location:

  • Hybrid

Research description:

The student will utilize a large database comprised of decades of data from ecological research studies on the same species (the water strider Aquarius remigis) to answer a core question in ecology: how are individuals and phenotypes distributed across space. These experiments have all involved collecting detailed data on the traits and behaviour of marked individuals in artificial streams. We will utilize this large dataset to identify the factors that determine how individuals are distributed across habitats. Specifically, we will test whether biotic factors including population density, sex ratio, and mating success influence an individual’s decision to leave a habitat and to settle in a habitat.
 
The SUDS Scholar’s task will be to 1) process the data so that many datasets can be combined in one analysis, 2) analyze the data to test the research questions, and 3) document the analysis and write a report of the findings.The SUDS student will work directly with Dr. Baines, who is a spatial ecologist with many years of experience conducting ecological research with water striders and other aquatic invertebrates. Dr. Baines also has expertise in using R to analyze large ecological datasets.
 

Researcher: Celina Baines, University of Toronto, Faculty of Arts and Science, Department of Ecology and Evolutionary Biology

Skills required:

  • Proficiency using R to process, analyze, and visualize data.
  • Advanced statistical knowledge, especially fitting and interpreting linear mixed models and meta-analytical methods.
  • Proficiency using github to collaborate with other researchers.

Primary research location:

  • University of Toronto St. George Campus and/or remote

Research description:

Children with medical complexity (CMC) often experience multiple chronic conditions, functional limitations, and high healthcare use, frequently requiring technologies such as gastrostomy tubes for nutrition. Despite their high impact on pediatric health systems, there is limited evidence to guide feeding management after gastrostomy insertion. This project applies data science and predictive modeling to understand outcomes and trajectories of CMC following primary gastrostomy insertion. Building on existing work that defined the study cohort and variables, the student will analyze longitudinal EMR data to characterize patient phenotypes, outcomes, and risk factors for feeding intolerance and healthcare utilization. Objectives:
  • Conduct descriptive phenotyping of children undergoing primary gastrostomy
  • Describe post-procedure outcomes and care trajectories;
  • Apply predictive modeling to identify children at highest risk of feeding intolerance and high healthcare use.

The project will use data from SeDAR (SickKids Enterprise-wide Data in Azure Repository), a curated, research-ready version of the hospital EMR, accessed through the HPC4Health high-performance computing environment. Analyses will be performed using R and Python, leveraging advanced data science methods.

The SUDS Scholar will be embedded in the lab of Dr. Sanjay Mahant, SickKids Research Institute, and co-supervised by Professor Nathan Taback, Department of Statistical Sciences, University of Toronto.

 

Researcher: Sanjay Mahant, The Hospital for Sick Children, Child Health Evaluative Sciences

 

Skills required:

  • Proficiency using R to process, analyze, and visualize data.
  • Advanced statistical knowledge, especially fitting and interpreting linear mixed models and meta-analytical methods.
  • Proficiency using github to collaborate with other researchers.

Primary research location:

  • Hybrid

Research description:

Scientific agencies like NASA provide vast data about Earth and society, but this information must be delivered in ways that enable public understanding and informed decision-making. This project investigates how to make data science insights digestible, engaging, and informative through two advances: LLM-powered annotation pipeline: Extract relevant annotations to overlay on dashboards, enhancing comprehension and sensemakin; and, Web-based rendering pipeline: Display annotations to support effective data storytelling.
 
The SUDS Scholar will contribute to an existing codebase using secured infrastructure and data sources. They will gain hands-on experience with machine learning and NLP (LLMs, prompt engineering), web development (HTML/CSS, TypeScript, React), and version control (Git/GitHub). Additional exposure includes data visualization, human-computer interaction principles, and collaborative research practices. Selected students will join the DGP lab at the University of Toronto, working with collaborators from Inria (France) and NASA SVS. They will participate in lab meetings, reading groups, and seminars, developing technical expertise while learning to communicate across interdisciplinary teams. This project offers training in cutting-edge computing technologies applied to social good, preparing students for impactful careers at the intersection of AI, visualization, and public engagement.
 

Researcher: Fanny Chevalier, University of Toronto, Faculty of Arts and Science, Department of Computer Science

 

Skills required:

  • ML / NLP: Experience with LLMs, prompt engineering
  • Web Development: HTML/CSS, TypeScript, and frameworks (e.g. React)
  • Version Control: Proficiency with Git/GitHub
  • Problem-Solving & Debugging: Ability to troubleshoot/optimize code
  • Interdisciplinary Communication: Comfortable collaborating in multidisciplinary team
  • Desirable
    • Human-Computer Interaction: Familiarity with user-centered design
    • Data Storytelling: Understanding principles for creating informative visual narratives

Primary research location:

  • University of Toronto St. George Campus and/or remote

For more information

SUDS.dsi@utoronto.ca

News

DSI Celebrates SUDS Cohort of 2025 with Annual Showcase

Read the full story.

Students may also be interested in the Urban Data Science Corps Summer Internships offered by the School of Cities.

Learn more