SUDS Student Call 

May-August 2024

Call for student researchers!

The Data Sciences Institute (DSI) welcomes carefully selected undergraduate students from across Canada for a rich data sciences research experience. Through the SUDS Research Program, undergraduate students who are interested in exploring data science as a career path have an exciting opportunity to engage in hands-on research supervised by DSI member researchers across the three UofT campuses.

The DSI is strongly committed to diversity within its community and especially welcomes applications from racialized persons/persons of colour, women, Indigenous/Aboriginal People of North America, persons with disabilities, LGBTQ2S+ persons, and others who may contribute to the further diversification of ideas.

Below are the SUDS research opportunities for May-August 2024. You can apply and rank your top 3 choices.

See here for information on eligibility, award value and duration, and SUDS programming.

Researcher Opportunities

Research description:

Large-scale data have provided robust evidence that more liberal-leaning communities (e.g., university students, big cities) hold vastly different moral views than more conservative-leaning communities (e.g., working class, rural areas). Liberals and conservatives also differ in their personality, emotional profiles, cognitive styles, attitudes toward science, and lots of other basic psychological characteristics. Why do liberals and conservatives differ in so many ways? Are there deeper mechanisms that can account for all of their differences? This research project seeks to identify the most basic and fundamental psychological ingredients of liberal and conservative ideology. To do so, we need large-scale data collection on numerous variables that predict individual variations in social, moral, and political attitudes.
The SUDS Scholar will develop a website for the general public to complete measures of their moral values, political views, attitudes towards science, and various other psychological characteristics. As an informational reward, survey respondents will receive personalized feedback on their ways of thinking. The Supervisor's 2023 Scholar presented their work (on a different project) at the SUDS 2023 Showcase and won one of the two Best Oral Presentation Awards (https://datasciences.utoronto.ca/suds-scholars-showcase-their-newly-acquired-data-science-skills/).
 

Researcher: Spike W. S. Lee, Rotman School of Management and Department of Psychology, UofT

Skills required:

  • Website development
  • Interest in political, moral, social, behavioral, psychological, or cognitive science
  • Strong writing and presentational skills

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

The rise of the digital age has markedly increased the popularity of online gaming and gambling, providing entertainment and social interaction but also prompting concerns about negative mental and financial impacts. In Ontario, a six-week study during COVID-19 restrictions involving 2,005 gamblers revealed increased online gambling among 1,081 high-risk individuals who showed symptoms of anxiety, depression, and altered work behavior due to the pandemic. Similarly, in Québec, those engaging in both online and offline gambling experienced notable disruptions in work, relationships, health, and finances. As Canada anticipates detailed economic impact data, evidence from Germany underscores the severe effects of online gambling, highlighting the urgency for in-depth research.
This research aims to deepen our understanding of online gambling marketing using computational models and AI. This project, crucial for developing interventions to foster healthier online habits, takes an interdisciplinary approach. A SUDS Scholar joining our team at the Health Informatics, Visualization, and Equity (HIVE) Lab would engage in diverse and challenging tasks. These tasks include collecting text and image data from online gaming and gambling platforms using web scraping and third-party APIs. They also involve applying novel NLP and image processing techniques to analyze complex datasets and contributing to thorough literature reviews that will inform and shape our strategies.
 

Researcher: Zahra Shakeri, Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, UofT

Skills required:

  • Proficiency in data analysis and a keen interest in behavioural research.
  • Prior experience in coding, particularly in Python, and familiarity with natural language and image processing libraries.
  • Experience with machine learning algorithms, particularly using frameworks like Keras or TensorFlow, will be highly regarded.
  • Knowledge of statistical modeling is also advantageous.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description: We (https://www.camlab.ca/) are focused on developing and applying state-of-the-art machine learning and computational biology tools to understand the interplay between cancer evolution and phenotypes. In collaboration with the Bremner Lab, we aim to identify the pivotal genes and regulatory networks that drive lineage switching and drug resistance across cancers, with a focus on AML. Our previous work has highlighted the important role of YAP1 and TAZ in stratifying cancers into binary classes which interchange to drive drug resistance. We would like to expand these findings by mapping these subtypes pan-cancer given the wealth of single-cell data generated across both primary tumours and cell lines.

The SUDS student will be immersed in hands-on computational research, working with state-of-the-art deep learning frameworks to integrate large-scale pan-cancer datasets for in-depth analysis leading to discovery of cancer lineage switch drivers. There is significant freedom in project direction, including integrating perturbational datasets. The student will have the opportunity to join a vibrant computational lab and learn cutting-edge tools and techniques for exploration of high-dimensional data.
 

Researcher: Kieran Campbell, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • Coding experience in R or Python
  • Strong interest in big data analysis and deep learning; related experience is an asset
  • Motivated to learn about cancer biology and single-cell transcriptomic analysis

Primary research location:

  • Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

Join our exciting research project and become a crucial part of our mission to revolutionize healthcare delivery. As a SUDS scholar, you'll collaborate closely with our principal investigator and research assistants (including a former SUDS scholar). Together, we're developing cutting-edge, real-time intelligent video analytics tools that have a direct impact on hospitals and medical imaging departments.
Your role will involve crafting and implementing edge computing prototypes utilizing advanced technology like the NVIDIA Jetson, the world's leading AI computing platform. You'll harness the power of open-source frameworks such as GStreamer and TensorFlow to apply deep learning techniques to image data, unlocking valuable insights. The data you gather will be the key to extracting essential performance metrics and enhancing hospital productivity.
Notably, our previous SUDS scholar successfully deployed vision-based AI tools in our hospital's CT suite last summer. This year, our focus is on aggregating data from multiple image sensors to gain deeper insights across diverse environments. Join us on this exciting journey, where your work will directly influence healthcare access, costs, and quality. If you're passionate about AI, healthcare, and making a real-world impact, this project may be perfect for you!
 

Researcher: Andrew Brown, Unity Health Toronto

Skills required:

  • Passionate about technology, with an interest in learning new areas outside their comfort zone
  • Self-motivated and capable of working independently
  • Strong work ethic, ability to be proactive and responsive in high-stakes situations
  • Experience with the Python programming language and Linux would be an asset

Primary research location:

  • St. Michael's Hospital and/or Remote

Research description:

We (https://zhenlab.com/) combine cutting-edge imaging and computational biology tools to address how a nervous system develops and operates. One approach we use is volume electron microscopy to map the morphology and wiring of all neurons in the nervous system (Witvliet Nature 2021; Mulcahy Current Biology 2022). We are currently developing three machine learning pipelines to improve the accuracy and speed of connectomics: 1) morphological segmentation, 2) chemical synapse segmentation, 3) electrical synapse segmentation.
The student will work with biologists and computer scientists on the machine learning pipeline to identify electrical synapses. This work will result in higher throughput and higher quality connectomics.
 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • Ideal candidates will be proficient in image processing, algorithm development, or statistical analyses. Knowledge of programming is essential.
  • Students interested in applied math and physics are strongly encouraged to apply. The key ingredient is a strong drive to learn and apply all the above to real biological problems.

Primary research location:

  • Zhen Lab at the Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

Using Big Data, text analysis, and machine learning techniques, we will analyze how fake news and real news differ in their moral themes, cognitive styles, antiscience attitudes, emotional valence, and other psychological characteristics. Likewise, we will analyze how ideologically more biased news and less biased news differ on the various dimensions. To answer these questions with rigor and robustness, we will analyze about 7 million news articles from about 500 media outlets. The outlets vary widely in ideological leaning, from far left to far right. They also vary in veracity, from mostly fact-checked to mostly fake, conspiracy, and pseudoscience news. Our team has already completed preprocessing of all the news articles.
The SUDS Scholar will apply automated text analysis and machine learning techniques to these articles in order to identify linguistic patterns and biases depending on how fake or real and how left-leaning or right-leaning the media outlet is.
My last SUDS Scholar (Summer 2023) worked on the first stage of this project, presented our work at the Showcase, and won one of the two Best Oral Presentation Awards (https://datasciences.utoronto.ca/suds-scholars-showcase-their-newly-acquired-data-science-skills/). This year's SUDS Scholar (Summer 2024) will work on the full-fledged implementation of the project.
 

Researcher: Spike W. S. Lee, Rotman School of Management and Department of Psychology, UofT

 

Skills required:

  • Text analysis; natural language processing; machine learning
  • Interest in political, moral, social, behavioral, psychological, or cognitive science
  • Strong writing and presentational skills

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

The overarching research project is to develop machine learning (ML)-assisted insights into radiation treatment quality to ensure patients receive high-quality and timely radiation treatments. The project will develop a quality radiation oncology platform built on novel ML methods with the following objectives:
i) Build and deploy ML to prioritize radiation treatments for review by the expert radiation medicine team based on treatment complexity and highlight treatments with potential errors requiring attention prior to treatment approvals.
ii) Develop ML for real-time quality assessment of radiation treatments adapted based on time-series imaging data and anatomical information acquired over the course of treatment, to ensure updates to treatments are appropriate given the patients’ changing anatomy.
The novel research will include i) generating new outlier detection models with human-understandable output to enable ML-assisted decision support for treatment review, and ii) incorporating time-series data to provide real-time decision support, ensuring that treatments adapted based on anatomy meet the clinical objectives of treatment. The research project will have direct applicability and significance for improving the clinical workflow of quality processes in radiation oncology, enabling improved cancer care and giving patients more timely access to complex treatments.
The student will essentially be building a classifier of treatment complexity. We have collected data (supervised, prospective) with a complexity classification. We would then predict the complexity of a novel patient based on imaging and radiation therapy treatment data. The complexity prediction would be used to prioritize treatments that require additional or more comprehensive review (less complex cases have a similar review, and complex cases will have a more extensive review). The data will be DICOM-RT data that includes computed tomography images, image segmentations of organs, radiation dose, and radiation beam data.
 

Researcher: Tom Purdie, Princess Margaret Cancer Centre, University Health Network

Skills required:

  • The student will be responsible for processing and curating data into relevant patient cohorts, and building machine learning (ML) models predicting radiation treatment quality using clinical data as inputs.
  • The student will have experience with Python and some familiarity with ML and/or data processing.

Primary research location:

  • Princess Margaret Cancer Centre and/or Remote

Research description:

Speech and natural language processing have become major forces in AI, with increasingly perceptible impacts on our daily lives. But very little is really understood about the way that modern speech representations trained with machine learning really work (e.g., the popular wav2vec 2.0 or Whisper features that have revolutionized speech processing in recent years). They are "black boxes". At the same time, phonetics, the scientific study of speech sounds and perception, has not taken advantage of the massive potential of recent speech representations trained with machine learning, which promise to help us better understand how human speech works. This is in part because they are hard to use, and in part because they are opaque and would require in-depth analysis to be able to interpret what they are doing.
This project has two aspects. The student will be engaged in adding features to Speech Features Online, an existing online platform aimed at making speech representations accessible to non-experts. Second, the student will develop approaches for analyzing these representations that help bridge the gap between modern language and speech sciences, and modern machine learning approaches to speech processing. This project is a step towards changing the way we understand human speech.
 

Researcher: Ewan Dunbar, Department of French, Faculty of Arts & Science, UofT

 

Skills required:

  • Background in machine learning and ideally in linguistics, with a strong software development profile.

Primary research location:

  • University of Toronto, St George Campus

Research description:

The sense of smell relies on the detection of diverse chemicals (odors) by odorant receptors (ORs). Although mammals have a huge number of OR genes (> 1,000 in mice and ~400 in humans), each of the olfactory sensory neurons in the nose expresses only one OR and thus serves as a single channel of the chemical world. This feature allows us to identify OR-defined sensory neuron subtypes across experiments and quantify changes in gene expression on a receptor-by-receptor basis from single-cell transcriptome data. We recently found that olfactory sensory neurons reconfigure their transcriptomes based on the history of neural activity across environments. Changes in gene expression can predict acute odor responses, suggesting the possibility that these neurons use transcription to adapt to the environment. To understand the molecular basis of social recognition by the olfactory system, in this project we will characterize how odors from other mice are encoded in transcriptomes of olfactory sensory neurons across >1000 ORs. We will analyze the large-scale single-cell transcriptome dataset from male or female mice that interact with familiar or unfamiliar mice of the same sex, as well as with mice of the opposite sex.
The student will be responsible for being a respectful and collaborative lab citizen; reading research papers, conducting data analysis, and discussing both with lab members; and participating in and presenting at lab meetings and SUDS events, including the SUDS showcase.
 

Researcher: Tatsuya Tsukahara, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • We will primarily use Python to analyze our data, so experience with Python or at least one other programming language, such as R, is required.
  • Experience using a command-line interface and a willingness to learn about the background and experimental components of the project are also strongly encouraged.

Primary research location:

  • Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

This project seeks to characterize the relationship between various environmental exposures and breast cancer risk among BRCA mutation carriers. Previous literature established a relationship between ambient air pollution (nitrogen dioxide (NO2), nitrogen oxides (NOx) levels, and components of particulate matter) and breast cancer risk. To our knowledge, there have been no studies conducted examining this risk for BRCA mutation carriers in Canada. We hope to leverage our existing database of BRCA mutation carriers from across Canada, coupled with the rich environmental data found in the Canadian Urban Environmental Health Research Consortium (CANUE), to better assess and quantify this risk. CANUE is a national initiative that aggregates geospatial environmental data. This consortium generates information on air quality, pollution, weather, climate, greenspace and built-environment characteristics.
 
Responsibilities for SUDS Scholar:
  • Complete all necessary training, including onboarding, ethics, and institutional requirements
  • Conduct a literature review on the topic
  • Formulate an analysis plan and complete project initiation documents
  • Become familiar with and contribute to ongoing epidemiologic research within the team, which may include, but is not limited to, the following tasks:
      • Complete data entry and quality control, ensuring the accuracy and integrity of data collection; this may involve investigating missing or invalid data and preparing data sets
      • Review medical records to collect research data
      • Prepare and mail out follow-up questionnaires to research participants or collaborators
      • Prioritize and monitor study deadlines for their own work
  • Conduct statistical analysis as per the study objectives
  • Contribute to preparing materials for proposals, progress reports, presentations, and publications

Researcher: Joanne Kotsopoulos, Dalla Lana School of Public Health, UofT

Skills required:

  • Dependable, hardworking, detail-oriented, team player, independent, strong communication skills, analytic skills, strong organization skills, prior experience in SAS or R is an asset but not required.

Primary research location:

  • Women's College Hospital/UofT and Remote

Research description: Many citizens struggle to access clean and safe water, and finding a cost-effective solution is a major challenge for the international development community. This proposed project investigates how water cisterns - which harvest and store rainwater - can facilitate climate change adaptation in drought-prone areas. We examine the effects of this technology, employing a randomized controlled trial that built residential water cisterns in rural Northeast Brazil. The current project investigates effects on development outcomes, thereby significantly extending the team’s work, "Vulnerability and Clientelism" (Bobonis et al., American Economic Review 2022). The research team will examine both the short-term and long-term development outcomes of water cisterns using a wide array of measures linked to the team’s experimental sample, including a representative panel survey spanning three years, satellite imagery, and administrative data. This study will provide evidence about how cistern technology can be deployed at scale to heighten climate change adaptation. The intervention was conducted in partnership with Articulação no Semiarido Brasileiro (ASA), a network focused on implementing policies that allow coexistence with the semi-arid region in Brazil. To date, there is no experimental evaluation of the cisterns program; we intend to fill this gap in the climate adaptation literature.

 

Researcher: Gustavo J. Bobonis, Department of Economics, Faculty of Arts & Science, UofT

 

Skills required:

  • Stata, R, Python, coding of satellite image data

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Many machine learning (ML) applications rely on having high-quality, labeled training data that is representative of the type of data the ML model will eventually be applied to. However, in many astronomical applications, we have the exact opposite, with observed training data that is substantially biased (often to the brightest, closest, best-measured objects) relative to the underlying populations of interest (the fainter, faraway, noisier objects). To account for these domain mismatch issues, astronomers often resort to various data augmentation strategies that include making the training data "noisier" and supplementing observed data with simulated data from theoretical models. While these broadly address the fundamental problems, they also tend to degrade the performance of the initial ML model.
This project will explore new approaches to improve on these data augmentation strategies using state-of-the-art data from the DESI and SDSS-V astronomical surveys, with the goal of having a model that does strictly better on both observed (real) and simulated (theoretical) data under almost all circumstances. The main responsibilities of the student will be to review relevant literature, lead coding and data analysis efforts (using simulation studies and/or real data), and meet regularly with me and various collaborators to discuss progress on the project.
 

Researcher: Joshua Speagle, Department of Statistical Sciences, Faculty of Arts & Science, UofT

 

Skills required:

  • Preference will be given to students with some prior experience/background in coding (especially in Python), machine learning (especially neural networks), and statistical inference (especially Bayesian inference).

Primary research location:

  • University of Toronto, St George Campus

Research description:

We (https://zhenlab.com/) combine cutting-edge imaging and computational biology tools to address how a nervous system develops and operates. One approach we use is calcium imaging of the _C. elegans_ nervous system – monitoring the calcium concentration in neurons over time as they freely behave, to look at how specific neurons contribute to specific behaviors. One challenge we face is implementing automated tracking, segmentation, and quantification of calcium signals as neurons move around in 3D.
In this project, the SUDS Scholar will work with team members and collaborators to build on our current work implementing machine learning approaches to overcome these challenges and establish an accessible pipeline to obtain high-quality recordings of calcium activity in moving animals.
 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • Ideal candidates will be proficient in either image processing, algorithm development, or statistical analyses.

  • Knowledge of programming is essential. Students interested in applied math and physics are strongly encouraged to apply. The key ingredient is a strong drive to learn and apply all the above to real biological problems.

Primary research location:

  • Zhen Lab at the Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

Novel statistical and data science methodology developments are often motivated by the increasingly complex data collected for research studies that involve a disease or health outcome. For example, methodological studies motivated by breast cancer or Alzheimer's disease are common. However, whether the effort for methodological development is appropriately being used for diseases that affect the global population the most is unknown.
In 2020, The Lancet published the updated global burden of 369 diseases and injuries in 204 countries and territories. For each of the top 25 diseases, using an automated systematic literature review, we will identify all published methodological research motivated by the disease. Then we will:
  1. Assess the relationship between the common global diseases and the common diseases that motivate methodological research.
  2. Identify global diseases that are being "neglected" by methodologists.
The student will learn how to conduct an automated systematic literature review and text analysis, and how to produce professional tables, figures, and graphics. The student will have the opportunity to be a co-author of a paper in an academic journal. The R statistical programming language will be primarily used, but other programming languages may be considered based on the student's proficiency.
 

Researcher: Aya Mitani, Dalla Lana School of Public Health, UofT

Skills required:

  • Excellent programming skills in R (or Python)
  • Some experience in literature review
  • Strong interest in public health research
  • Excellent oral and written communication skills

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Because we can only observe the Milky Way at the present time, obtaining ages for large numbers of stars is crucial to unraveling our Galaxy's history. However, ages are notoriously difficult to obtain using traditional astronomical techniques. The most robust method for determining ages uses time series of red giant stars; a star's age is directly reflected in the random oscillations that such stars undergo and that we can observe using detailed time series observations. However, these observations are expensive and difficult to model.
 
Obtaining high-resolution spectra using a diffraction grating is much easier and such samples now consist of about a million stars. But while we believe these spectra contain age information, we have no robust theory to extract it. This is where machine learning comes in! In this project, we will use contrastive learning to extract the age information from stellar spectra, using techniques similar to those used to, for example, provide captions for images (see, e.g., OpenAI's CLIP). We will use this to obtain ages for large numbers of stars in the APOGEE and SDSS-V surveys and determine the age distribution of stars across the Milky Way's disk. The student will be responsible for implementing the contrastive learning process in PyTorch using data that we will provide and for evaluating the model's performance using a test set and by comparing to the results from other, previous techniques.
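For applicants unfamiliar with contrastive learning, the following is a minimal sketch of a CLIP-style symmetric contrastive loss. It is an illustration only, not the project's implementation: the project will use PyTorch, whereas this sketch uses plain Python so that it runs without dependencies. The names `emb_a`, `emb_b`, and `temperature` are hypothetical, and what each spectrum embedding is paired with is not specified above, so the two inputs are placeholders for any two sets of paired embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(z - row_max) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    emb_a[i] and emb_b[i] are assumed to describe the same object (a
    hypothetical pairing); every other combination in the batch is treated
    as a negative example.
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    logits_transposed = [list(column) for column in zip(*logits)]
    # Average the a-to-b and b-to-a matching losses.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_transposed))
```

The loss is small when each embedding is most similar to its own partner and large when partners are mismatched, which is what pushes the two encoders toward a shared representation.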
 

Researcher: Jo Bovy, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, UofT

 

Skills required:

  • Preference will be given to students with experience using Python and machine learning (in particular neural networks/deep learning). No prior background in astronomy is necessary!

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Disparities in health outcomes represent one of the most challenging issues our healthcare system is currently facing. The development of an equity dashboard in hospitals has been proposed as a solution to facilitate the identification of variations in outcomes, encourage accountability, and support ongoing monitoring. However, limited data, lack of data (including demographic attributes), and insensitive measures can render the development of equity dashboards challenging. In order to gain insights on potential variations in care, multiple sources of quantitative and qualitative data - including EMR documentation, incident reports, patient feedback, and various outcomes - need to be linked and leveraged to create a broader understanding of clinical systems inequities. Using maternal care as a case study, this project will utilize incident report data, patient experience data, and outcome data to develop an equity dashboard that can be used to inform decision-making.
 
The responsibilities of the student will be as follows:
  • Complete TCPS 2.0 Research Ethics Training
  • Review literature on maternal mortality and disparities
  • Conduct statistical analysis on disaggregated data to identify differences in outcomes and narrow down outcomes of interest - SMM indicators, adverse events, process measures
  • Assist with developing and evaluating predictive models based on patient characteristics and social vulnerability indices
  • Conduct data preprocessing
  • Train and evaluate different models
  • Assess fairness
  • Utilize explainable artificial intelligence techniques
  • Design and test an interactive dashboard of the outcomes (in Python, Tableau, or Power BI)
  • Develop visualizations that meaningfully convey potential disparities in care and outcomes
  • Incorporate XAI explanations as necessary to support transparency
  • Conduct usability testing to iterate design
  • Compose abstract of the findings
  • Present research at UnERD conference

Researcher: Myrtede Alfred, Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, UofT

Skills required:

  • Knowledge of statistical analysis techniques (descriptives, t-tests, ANOVAs)
  • Knowledge of ML techniques (logistic/multinomial regression, random forests, support vector machine, gradient boosting trees)
  • Experience in Python libraries for ML and explainable artificial intelligence tools (e.g. Pandas, scikit-learn, XGBoost, SHAP or LIME)
  • Experience developing data visualizations (in Python, Tableau, or Power BI)

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Recent years have seen growing exploration of mobile health (mHealth) to empower users with better sleep hygiene. The adverse sequelae of poor sleep hygiene can affect a person’s physical fitness and emotional and mental wellness; thus, its significance has spurred much technological advancement, ranging across a broad spectrum of smart and wearable devices to support personalized sleep health. Yet, the continued realization of mHealth sleep solutions that strategically integrate with society and the healthcare environment has seen more fragmentation than coordination. This project tackles a growing concern in developing ML-driven sleep behavioral models: such models often learn biases from the studied population. Indeed, prior work has reported differences in sleep model performance between children and older adults. Transfer learning, in general, has shown feasibility in achieving subject-independent classification using pre-training and fine-tuning paradigms in many activity recognition applications. In the same manner, students will investigate the efficacy of such techniques for predicting sleep measures from sensor-driven mobile data.
Through this project, the SUDS Scholar will:
  • Learn to identify and develop potential ML applications for sleep detection using publicly available and user-study datasets
  • Learn to design and implement human-interpretable results and output created by the ML models.

Researcher: Camellia Zakaria, Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, UofT

 

Skills required:

  • Proficient in programming languages (Python preferred, R)
  • Data manipulation and analysis libraries
  • Familiarity with machine learning libraries (e.g., scikit-learn, PyTorch)
  • Familiarity with Git version control

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

This research aims to construct a Quality by Design (QbD) simulator for autologous cell therapy manufacturing. Employing a digital twin and Design of Experiment (DoE) approach, we will generate and analyze over 1000 digital twins using Monod kinetics for cell proliferation and various Monod parameter distributions. DoE will determine optimal experimental designs based on budget constraints, and in silico experiments will be conducted using digital twin populations to simulate data for all runs. The simulated data will be used to develop Response Surface Models (RSM) and machine learning models. The efficacy of different DoE and modelling approaches will be assessed by establishing a specification target for the response, enabling the prediction of culture conditions that maximize donor proportions meeting the target. This research holds the potential to accelerate the advancement of personalized medicine and improve the efficiency and affordability of autologous cell therapy production.
The student will primarily be involved in running the quality by design simulator using computational approaches and analyzing data with guidance and supervision from a senior graduate student.
The student will attend weekly lab meetings and participate in one-on-one meetings with the Supervisor and the graduate student.
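For applicants unfamiliar with Monod kinetics, the idea of a digital-twin population can be sketched as follows. This is a minimal illustration only, not the lab's simulator: the parameter distributions, initial conditions, yield coefficient, and target value are all hypothetical placeholders.

```python
import random


def simulate_monod(mu_max, k_s, yield_coeff, x0=1.0, s0=10.0, hours=72.0, dt=0.1):
    """Integrate Monod growth with forward Euler and return the final cell density.

    Growth rate: mu = mu_max * S / (K_s + S), where S is the substrate level.
    """
    x, s = x0, s0
    for _ in range(round(hours / dt)):
        mu = mu_max * s / (k_s + s)
        # Cells cannot consume more substrate than is left in the culture.
        growth = min(mu * x * dt, s * yield_coeff)
        x += growth
        s = max(s - growth / yield_coeff, 0.0)
    return x


def fraction_meeting_target(n_twins=1000, target=5.0, seed=0):
    """Simulate a population of digital twins and report how many reach the target."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_twins):
        # Hypothetical donor-to-donor variability in the Monod parameters.
        mu_max = rng.lognormvariate(-3.2, 0.3)  # per hour
        k_s = rng.uniform(0.5, 2.0)
        if simulate_monod(mu_max, k_s, yield_coeff=0.5) >= target:
            hits += 1
    return hits / n_twins
```

Calling `fraction_meeting_target()` under different simulated culture conditions gives the kind of "proportion of donors meeting the specification" response that a DoE or RSM analysis would then model.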
 

Researcher: Sowmya Viswanathan, Krembil Research Institute, UHN

Skills required:

  • Data visualization using tools like Plotly, Seaborn, or JMP
  • Knowledge of machine learning algorithms, such as regression, decision trees, random forests, support vector machines, and neural networks
  • Strong coding skills in languages like Python.
  • Familiarity with libraries like scikit-learn, pandas, and NumPy
  • Understanding of regenerative medicine and related concepts

Primary research location:

  • Krembil Research Institute at University Health Network (UHN) and/or Remote

Research description:

Fast and accurate computational modeling of chemical properties and structures has immense potential to accelerate discovery in various fields, including drug design and catalysis. In this context, predicting transition state (TS) structures of chemical reactions that cannot be obtained experimentally offers a powerful way to gather mechanistic details and generate energy profiles. Unfortunately, most traditional computational chemistry methods for predicting TS structures are still too costly for high-throughput applications. Deep learning (DL) can potentially replace the expensive quantum mechanical (QM) methodologies traditionally used in chemistry to predict these structures, offering fast and accurate mathematical models better suited to everyday computers.
 
In this project, the student will further explore this promising avenue using a chemical reaction data set currently being generated in-house. They will use it to develop a novel graph neural network (GNN) based model that predicts the highly desired transition state structures at an unprecedented speed, demonstrating the acceleration provided by DL. The research aims to provide a novel way to cut down the computational cost and manual intervention associated with TS structure prediction. Such developments hold immense potential to advance computational chemistry and accelerate high-throughput applications like drug discovery and catalysis.
 
The student will assist in solution design, development of research code (with PyTorch, Keras, TensorFlow, and Python), and deployment and running on HPCs. They will help with the generation of reference chemical property datasets. They will then be engaged mainly in designing graph convolutional neural network architectures for structure prediction. The proposed research will be coordinated by the Supervisor and a postdoctoral fellow with experience in running interdisciplinary collaborations. One of their key responsibilities will be to provide one-on-one support to the student, which includes guidance on interdisciplinary method development, data analysis, result interpretation, and effective research communication in the form of published articles and presentations.
 
 

Skills required:

  • Familiarity with PyTorch, Keras, and TensorFlow.
  • Knowledge of active learning, transfer learning, and chemistry is preferred but not required.

Primary research location:

  • University of Toronto, St George Campus 

Research description:

How one genome generates a large diversity of cell types, each with unique spatiotemporal gene expression patterns and physiological roles, is an enduring fundamental question in cell and developmental biology. To understand the genome structure-function relationships, it is not sufficient to know the genome sequence and local epigenetic features - we must also consider the large-scale physical architecture of entire chromosomes and their positioning within the nucleus in space and time (4D). Our broad objective is to understand how the entire genome is organized in complex multicellular systems, and how this organization influences the genome’s functional output (Sawh et al., Mol Cell 2020; Sawh and Mango. Current Opinion in Genetics & Development 2022). In the current opportunity, a SUDS scholar will extend our methods of traditional watershed 3D image segmentation to extract quantitative chromosome conformation information from C. elegans embryo spatial omics data. With a large amount of 3D segmented ground truth data in hand, the applicant will develop a threshold-free deep neural network approach to accurately segment anisotropic cell, nuclear, and chromosome objects in C. elegans embryos over developmental time. The position can be in-person, remote, or hybrid depending on the preference of the candidate.
The student will work closely with graduate students and me to use and refine 3D semantic image segmentation algorithms on fluorescence images. The student will refine a pre-trained neural network model (e.g. Cellpose 2.0, https://doi.org/10.1038/s41592-022-01663-4) using already available ground-truth data to develop custom models for C. elegans cell, nuclei, and chromosome volumes.
 
 

Skills required:

  • Previous experience in deep learning applications, Python (preferred), MATLAB (optional but beneficial)

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Many of us are familiar with in-car alert systems. Alerts that warn the driver early about a potential hazard can speed responses and reduce the risk of a collision. However, alerts that provide redundant information may distract the driver from responding to a dangerous situation on the road. One potential solution is to determine whether a driver is already aware of a dangerous situation, and only provide an alert when they have not noticed the hazard. Eye-tracking measures have previously been used to build a model that predicts whether the driver is looking at a hazard-present or hazard-absent scene (Costela & Castro-Torres, 2020). However, can eye-tracking measures be used to infer whether a driver is aware of a hazard in the scene? In this project, the SUDS Scholar will use machine learning and computational modelling to predict whether a participant correctly located a road hazard using eye-movement data. Importantly, because participants very rarely miss hazards, the student will apply techniques to deal with rare events.
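To illustrate why rare events need special handling, the sketch below shows one common technique, class reweighting, using made-up labels. The posting does not say which rare-event techniques the project will use, so treat this purely as an example; the label names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_accuracy(y_true, y_pred, weights):
    """Accuracy in which every trial counts in proportion to its class weight."""
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total


# Hypothetical data: participants miss the hazard on only 5 of 100 trials.
labels = ["located"] * 95 + ["missed"] * 5
always_located = ["located"] * 100
weights = balanced_class_weights(labels)
score = weighted_accuracy(labels, always_located, weights)
```

A model that always predicts "located" is 95% accurate on these labels, yet its weighted accuracy is only 0.5, which is why plain accuracy is a poor target when misses are rare.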
 
 

Skills required:

  • Programming experience with R, Python, or MATLAB is preferred.
  • Previous experience with machine learning techniques is highly desirable.

Primary research location:

  • University of Toronto Mississauga, and/or Remote

Research description:

Advancements in reinforcement learning have paved the way for more sophisticated policies for decision-making in complex environments. This project delves into the exploration of diffusion policies, a class of algorithms that leverage stochastic processes to model the decision-making process in both discrete and continuous action spaces. The primary objective is to design, implement, and evaluate diffusion policies for a range of applications, showcasing their adaptability and effectiveness in diverse scenarios. The student will be responsible for implementing the method, conducting experiments, and compiling the results into a paper.
 
 

Skills required:

  • Algorithm design and implementation
  • Programming skills: experience with popular frameworks like TensorFlow or PyTorch
  • Mathematics and statistics: a solid background in mathematics and statistics, especially probability theory and linear algebra
  • Collaboration and communication skills
  • Reinforcement learning fundamentals (Optional)

Primary research location:

  • University of Toronto Mississauga, and/or Remote

Research description:

Data-driven virus discovery is revolutionizing our understanding of virology across Earth's biosphere. In 2020 there were 15,000 known RNA viruses; since then, our lab has discovered more new species (currently 375,000+) than everyone else in the world combined, including so-called “Dark RNA Viruses” (see Nature paper).
Our lab explores the evolution, ecology, and molecular interactions of these viruses through state-of-the-art computational analysis. Our focus is on how these viruses intersect with human health and disease. Currently we’re searching for viruses that cause neurodegenerative disease (e.g., Alzheimer’s) and human cancers. Finding such causal agents creates the possibility of developing vaccines or new therapies against devastating diseases.
Your project will be to select/prioritize the thousands of "candidate unknown human viruses" we have identified, and characterize them by any means to identify which viruses are human pathogens.
 
 
 

Skills required:

  • Creativity, and the capacity/desire to characterize unknown unknowns (high probability of failure)
  • Ability to synthesize multiple knowledge domains (e.g., R coding, genetics, statistics, pathophysiology, SQL, virology, ecology, ...)
  • Communication and capacity for collaboration

Primary research location:

  • Terrence Donnelly Centre for Cellular and Biomolecular Research, and/or Remote

Research description:

We are proposing a novel gender-inclusive approach focusing on understanding barriers faced by women, Indigenous people, youth, and other underrepresented groups in order to increase recruitment, improve retention, and expand and stabilize the construction and industrial workforces across Ontario. This proposal builds on our prior research on workplace factors associated with health professions' workplace stressors, injuries and retention, and my former collaborative professional practice with injured miners, employers, and unions on workers' return to work. Our research will develop and implement strategies to increase worker participation and retention in the construction and industrial workforce, based on gender-, age-, and ethnicity-informed systematic analysis of barriers to recruitment and retention.
 
The SUDS scholar will play a crucial role in several aspects of the project, contributing to both data analysis and project development:
  • Data Analysis and Interpretation (Quantitative Data Analysis)
  • Data Analysis and Interpretation (Qualitative Research)
  • Epigenetics analysis
This research provides a unique learning experience for the SUDS scholar, combining data science techniques with insights into social and workplace dynamics. The SUDS scholar will join the ReSTORE lab and be part of a multidisciplinary team. They will gain hands-on experience in advanced analytical methods while contributing to a project with real-world implications.
 

Researcher: Behdin Nowrouzi-Kia, Department of Occupational Science and Occupational Therapy, Temerty Faculty of Medicine, UofT

 

Skills required:

  • Excellent interpersonal skills
  • Strong computer experience including statistical analyses
  • Outstanding organizational skills
  • Demonstrated ability to maintain confidentiality
  • Ability to be a team-player
  • Experience working in a mental health context
  • Detail-oriented and dependable
  • Flexible individual with initiative and the capacity to handle multiple complex tasks simultaneously
  • Interest in health professions

Primary research location:

  • ReSTORE Lab (http://restore.rehab) at the Department of Occupational Science and Occupational Therapy, Temerty Faculty of Medicine, University of Toronto, St George Campus and/or Remote

Research description:

Road traffic collisions pose a significant public health challenge, and controlling vehicle speed is a crucial factor in enhancing road safety. Excessive speed not only endangers drivers but also presents substantial risks to vulnerable road users. Automatic Speed Enforcement (ASE), employing cameras and sensors to detect speeding vehicles, has emerged as an effective strategy. This research focuses on evaluating the impact of ASE in a medium-sized Canadian city, considering the socio-economic diversity across neighborhoods.
The city of Guelph, reflecting varying collision rates and safety infrastructure distributions, serves as a key location for this study. Leveraging diverse data sources, including speed records, offender postal codes, and marginalization indices, the research employs quasi-experimental analyses to assess pre-existing traffic speed differences, explore the impact of ASE across neighborhoods with different levels of marginalization, and determine whether offenders reside in the camera-equipped neighborhoods.
Led by a multidisciplinary team with expertise in public health, epidemiology, and biostatistics, this research aims to contribute valuable insights into ASE deployment, particularly in terms of social equity. Partnering with city authorities ensures the translation of findings into practical policies, making the study a potential nationwide reference for equitable ASE implementation initiatives. The expected responsibilities of the student will be:
  • Data Cleaning: Ensure accuracy and consistency in the dataset by meticulously cleaning and validating the data.
  • Data Integration: Merge data from diverse sources to create a unified dataset, facilitating comprehensive and holistic analysis.
  • Data Visualization: Proficiently employ data visualization techniques to convey insights effectively, making complex information easily understandable.
  • Communication and Collaboration: Facilitate effective collaboration among team members, ensuring seamless information exchange and understanding.

Researcher: Brice Batomen Kuimi, Dalla Lana School of Public Health, UofT

 

Skills required:

Seeking a skilled undergraduate with expertise in:
  • Data cleaning to ensure accuracy and consistency
  • Merging data from diverse sources for comprehensive analysis
  • Proficiency in data visualization to convey insights effectively
  • Strong communication skills for effective collaboration

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

This project will 1) analyze PacBio and Omni-C data to assemble a high-quality reference genome and annotation for a common aquatic plant, the duckweed Lemna minor, and 2) analyze low-pass sequencing data to characterize genomic variants in L. minor samples from across Toronto and environs. The student will gain familiarity with standard bioinformatic pipelines for the analysis of genomic data.
 

Researcher: Megan Frederickson, Department of Ecology and Evolutionary Biology, Faculty of Arts and Science, UofT

Skills required:

  • The SUDS student will need to become proficient in writing Bash scripts in a Unix shell, using Git and GitHub for version control, and running genome assembly and annotation software packages such as Hifiasm, YAHS, juicebox, samtools, etc.
  • Some background in genetics and evolutionary biology is also an asset.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Physical rehabilitation is central to the recovery process after many types of musculoskeletal and neurological injuries. Considering limited resources in the healthcare system, delivering effective rehabilitation at home is crucial to achieving optimal outcomes. For this reason, tools to track rehabilitation activities in different environments are needed to generate data that will assist with individualized treatment planning, performance monitoring, and the development of evidence to support the most effective approaches.
Systems that combine video data with deep learning are achieving impressive performance in tracking posture and recognizing activities. However, individuals with disabilities may perform movement exercises in varied ways. In order to support effective tracking of rehabilitation, systems are needed that recognize activities without requiring large numbers of examples of the same activity being performed in exactly the same way. As an important step towards this goal, the objective of this project will be to develop a video-based deep learning approach that can perform few-shot action recognition.
The student will be responsible for:
  • Developing a deep learning system that integrates existing neural networks for encoding motion data into a few-shot learning architecture.
  • Evaluating the performance of the system on public action recognition datasets, using different motion encoders.
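As a rough picture of what few-shot recognition means, the sketch below classifies a query clip by its distance to per-class "prototype" embeddings, in the style of prototypical networks. This is one common few-shot approach and an assumption here, not the project's stated architecture; the two-dimensional embeddings and activity names are hypothetical stand-ins for the output of a motion encoder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized embedding vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify_few_shot(support, query):
    """Return the label whose prototype (mean support embedding) is nearest to the query.

    support: dict mapping each label to a short list of example embeddings.
    query:   embedding of the clip to classify.
    """
    prototypes = {label: mean_vector(examples) for label, examples in support.items()}
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


# Two labelled examples per activity are enough to build the prototypes.
support = {
    "sit_to_stand": [[1.0, 0.0], [0.8, 0.2]],
    "arm_raise": [[0.0, 1.0], [0.2, 0.9]],
}
prediction = classify_few_shot(support, [0.9, 0.1])
```

Because new activities only require a handful of support examples, this style of classifier does not need many repetitions of a movement performed in exactly the same way.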

Researcher: Jose Zariffa, The Kite Research Institute, University Health Network

Skills required:

  • Experience designing and evaluating deep neural networks.
  • Previous experience applying deep learning methods to video data is preferred.

Primary research location:

  • Kite Research Institute - Toronto Rehab - UHN and/or Remote

Research description:

The growth and assembly of galaxies involves many complex processes, which culminate in the diverse collection of galaxies we observe today. One of the best ways of understanding these processes is through "Galactic Paleontology", which tries to reconstruct the assembly history of nearby galaxies through their surviving "fossils" (which are their present-day surviving stars!). Using this data, we simulate the birth, evolution, and death of many thousands/millions of stars, compare the end result with the stars we observe today, and repeat this process many times for many different evolutionary pathways to see which ones match the observed data better.
For the past few years, astronomers have largely relied on simulation studies and more "ad hoc" approaches to try to compare simulated data with real data, often involving "binning" the data into larger groups. In this project, co-supervised with Prof. Ting Li, we will develop a new, more principled approach based on Inhomogeneous Poisson Point Processes (IPPP) that will allow us to utilize all of the available data. If time/interest permits, we will also try to compare these results with traditional approaches and potentially explore new probabilistic machine learning-driven methods. The main responsibilities of the student will be to review relevant literature, lead coding and data analysis efforts (using simulation studies and/or real data), and meet regularly with me and various collaborators to discuss progress on the project.
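For readers new to point processes, the log-likelihood of an inhomogeneous Poisson point process with intensity λ(x) over a window [a, b] is the sum of log λ(x_i) over the observed points minus the integral of λ over the window. The one-dimensional sketch below evaluates it numerically; the function names and the toy intensity are illustrative assumptions, not the project's model.

```python
import math


def ippp_log_likelihood(points, intensity, lower, upper, n_grid=10_000):
    """Log-likelihood of points under an inhomogeneous Poisson point process.

    log L = sum_i log(intensity(x_i)) - integral of intensity over [lower, upper],
    with the integral approximated by the midpoint rule on n_grid cells.
    """
    width = (upper - lower) / n_grid
    integral = width * sum(
        intensity(lower + (i + 0.5) * width) for i in range(n_grid)
    )
    return sum(math.log(intensity(x)) for x in points) - integral


# Toy check with a constant intensity of 2 events per unit length on [0, 5].
observed = [0.7, 2.1, 4.4]
loglik = ippp_log_likelihood(observed, lambda x: 2.0, 0.0, 5.0)
```

Every observed point enters the likelihood individually, which is what lets this kind of model use all of the available data instead of binned counts.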
 

Researcher: Joshua Speagle, Department of Statistical Sciences, Faculty of Arts and Science, UofT

Skills required:

  • All applicants are welcome, but preference will be given to those with background/experience with statistical inference (especially Bayesian inference) and coding experience (especially Python).

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The project examines trends and outcomes among U.S. private higher education institutions implementing a tuition reset. Tuition resets are cuts in published tuition prices by at least five percent and are a growing trend in the private higher education sector. The SUDS Scholar will assist with data management, data analysis, and preparing papers for conference submission. The project will 1) examine recent trends in institutions adopting these policies, and 2) estimate the impact of the policies on student outcomes. Specifically, the SUDS Scholar will produce descriptive statistics of institutions that have adopted such policies and develop models aimed at understanding the effect of the policies on key student outcomes. This project will use publicly available data from the Integrated Postsecondary Education Data System and the College Scorecard.
 

Researcher: Daniel Corral, Department of Leadership, Higher, and Adult Education, Ontario Institute for Studies in Education, UofT

Skills required:

  • Previous experience conducting descriptive (e.g., summary statistics and data visualizations) and quasi-experimental (e.g., difference-in-differences and synthetic control) analysis using panel (longitudinal) data
  • Strong data management and analysis skills in Stata (e.g., knows how to use merge, append, and egen commands)
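For applicants gauging their fit, the simplest difference-in-differences estimate compares the before/after change at adopting institutions with the change at non-adopting ones. The sketch below uses Python for illustration, although the project itself works in Stata, and every number in it is made up.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    (change in the treated group) minus (change in the control group).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical retention rates (%) before and after a tuition reset.
effect = did_estimate(
    treated_pre=[60.0, 62.0],
    treated_post=[66.0, 68.0],
    control_pre=[58.0, 60.0],
    control_post=[61.0, 63.0],
)
```

The estimate is only credible if the two groups would have followed parallel trends without the policy, an assumption that a real analysis has to examine with panel data.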

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Virtual rehabilitation programs provide health outcomes comparable to those of traditional in-person programs. Virtual rehabilitation can reduce the high dropout rates prevalent in traditional in-person programs due to transportation, financial constraints, staff shortages, and other barriers. Automation is essential for the successful delivery of virtual rehabilitation, enabling patients to perform exercises at home, at their convenience, without constant supervision from clinicians. We have developed a cloud-enabled, AI-driven virtual rehabilitation assistant (AVA), an intelligent avatar that can be accessed through a web browser on any device with a webcam. AVA monitors the body joint movements of a patient performing exercises and evaluates their correctness using spatiotemporal graph convolutional networks. In this project, we aim to integrate pre-trained large language models (LLMs) into AVA to provide real-time, useful feedback to patients. Patients' movements are translated into action tokens and paired with the spatiotemporal graph convolutional network's output. This information will be input into an LLM fine-tuned for our application, and the resulting feedback will be delivered through AVA as text or audio cues and instructions to patients on adjusting movements and correcting exercise technique. This interactive and engaging feedback encourages independent exercise at home, improving adherence to virtual rehabilitation programs and boosting patients' health outcomes.
 

Researcher: Shehroz Khan, Toronto Rehabilitation Institute (KITE), UHN

 

Skills required:

  • Deep learning, Transfer Learning, Natural Language Processing, Large Language Models, and Transformer Neural Networks.

Primary research location:

  • Kite Research Institute - Toronto Rehab - UHN and/or Remote

Research description:

Automatic source code summarization is the task of generating a readable summary that describes the functionality of the code in natural language. In recent years, the use of deep learning-based approaches has led to significant improvement in the performance of automatic code summarization, e.g., using Transformers and Graph Neural Networks. However, the performance is still far from optimal and developers that are unsatisfied with a given summary are not able to provide feedback or additional information that can be used to refine the output.
In this research project, the goal is to investigate ways in which additional input from the developer can further improve the performance of automatic code summarization. Specifically, the main tasks in the project are:
  1. Investigating existing failures of state-of-the-art source code summarization solutions
  2. Developing new computational approaches and interactive schemes for incorporating developer input and feedback in order to improve the performance of deep learning-based approaches for source code summarization
  3. Evaluating the impact of the new approaches on existing large code summarization datasets.
The responsibilities of the SUDS student will be:
  • Read about, implement, and empirically evaluate state-of-the-art models for automatic source code summarization.
  • Investigate existing failures of state-of-the-art source code summarization solutions and develop interactive schemes for incorporating developer input and feedback in order to improve their performance.
  • Evaluate the impact of the new approaches on existing large code summarization datasets.

Researcher: Eldan Cohen, Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Knowledge in deep learning (relevant topics: RNNs, Transformers, deep generative models VAE/GAN)
  • Experience coding in a deep learning framework (e.g., PyTorch, TensorFlow, Keras, MXNet, etc.)
  • Experience coding in Python and working with data-related libraries (Pandas, Scikit-learn)

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Cancer is a genetic disease caused by small mutations in DNA that occur in an individual’s cells over time. Most mutations are harmless “passenger” mutations, while a small minority, termed “driver” mutations, unlock the features of cells that lead to cancer. Passenger mutations tell us about the history of the cancer and how mutations arise due to age, carcinogens, or deficient DNA repair processes in cells. Thousands of cancer genomes with millions of mutations are now available. These datasets show that mutations do not occur randomly but instead have characteristic nucleotide patterns (such as C>T mutations correlated with patient age vs. C>A mutations associated with tobacco smoking). However, these “mutational signatures” are based on very limited DNA context, usually just the two nucleotides around the mutated position. The objective of this research project is to develop sequence-based machine learning models that classify or generate cancer mutations based on the mutational process that caused them or the cancer type they occur in. In addition to developing accurate models, we aim to enhance model interpretation and decipher the sequence features contributing most to model performance, allowing us to better understand how mutations contribute to cancer development and molecular complexity. The student is expected to develop and test ML models using R or Python, interpret data from computational and biological angles, visualize data, prepare documentation, and present at lab meetings. We will fine-tune the project based on the computational and/or biological or disease research interests of the student.
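To make the "two nucleotides around the mutated position" concrete, the sketch below labels a single-base substitution with its trinucleotide context, using the common convention of reporting the mutated base as a pyrimidine (C or T). The function names and the toy reference sequence are illustrative assumptions, not the lab's pipeline.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def trinucleotide_context(reference, position, alt):
    """Label a substitution as 5'-flank[ref>alt]3'-flank, e.g. "A[C>T]G".

    position is a 0-based index into reference and must have a neighbor on
    both sides. Purine reference bases (A, G) are reported on the opposite
    strand so that the mutated base is always C or T.
    """
    if not 1 <= position <= len(reference) - 2:
        raise ValueError("position needs one flanking base on each side")
    triplet = reference[position - 1 : position + 2]
    if triplet[1] in "AG":
        triplet = reverse_complement(triplet)
        alt = alt.translate(COMPLEMENT)
    return f"{triplet[0]}[{triplet[1]}>{alt}]{triplet[2]}"


# The same mutation seen from either strand receives the same label.
forward = trinucleotide_context("TACGT", 2, "T")
reverse = trinucleotide_context("TACGT", 3, "A")
```

Models that look at longer stretches of sequence around each mutation, as proposed in this project, go beyond this three-base summary.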
 

Researcher: Juri Reimand, Ontario Institute for Cancer Research

Skills required:

  • Data science, machine learning; R or Python coding; genomics and/or cancer research experience is a plus

Primary research location:

  • Ontario Institute for Cancer Research 

Research description:

The Simons Observatory (SO) is a new, multi-telescope experiment to study the origin and evolution of the cosmos by measuring the cosmic microwave background (CMB), the oldest light in the Universe. Raw data consist of TBs of timestreams of measured sky brightness recorded each day—adding up to several PB over several years—that need to be reconstructed into 2D maps. However, before this can happen, the timestreams need to be automatically processed to remove noise contaminants and foreground galaxies/stars that block the main signal. In this project, you will work with a small team of researchers in Toronto that is developing machine learning methods to identify and classify these objects. Some development may use existing data from the Atacama Cosmology Telescope (ACT), a precursor to SO. Possible avenues of research include developing ways of retraining our classification algorithms on-the-fly and figuring out how to propagate uncertainties in classification into errors in the final maps. An exciting aspect of this project is that our classification will help enable the search for astrophysical transients, such as flaring stars and gamma ray bursts. The successful candidate will:
  • Write and document code in coordination with the research team led by Profs. Hincks & Hložek. This may include researching suitable methods/algorithms for the code.
  • Participate in regular meetings (~weekly) with team members, with flexibility regarding in-person or remote attendance.
  • Possibly participate in ~weekly telecons with other SO researchers.
  • Optionally attend training sessions and seminars for undergraduate researchers offered in the Department of Astronomy & Astrophysics.
  • This is a full time position, but apart from meetings (schedules TBD), work hours are flexible.

Researcher: Adam Hincks, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts and Science, UofT

Skills required:

  • Required: Python coding
  • Highly desirable: understanding of machine learning concepts (e.g., active learning), experience with scikit-learn/sklearn, familiarity with collaborative coding workflows with GitHub
  • Helpful assets: web development (e.g., CSS, JS, Vue, React), database development (e.g., SQL)

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Animal species exhibit characteristic diurnal or nocturnal activity patterns as a result of adaptations to the daily light cycle. However, we do not fully understand the evolutionary causes or consequences of these activity patterns. We have previously investigated temporal activity patterns across ~4000 species of fish through meta-analysis of the literature, and compared these to the activity patterns of ~5000 species of tetrapods. We demonstrated that nocturnality conferred an evolutionary advantage during mass extinctions, and that frequent nocturnal-to-diurnal transitions facilitated post-extinction diversification across vertebrates (Shafer et al., bioRxiv, 2023). However, almost nothing is known about the evolution of this behaviour across the most diverse animal phyla, the invertebrates (insects, mollusks, and cnidaria), of which there may be as many as 10 million species worldwide.
The SUDS scholar will extend our macro-analyses, and reconstruct the tempo and evolution of nocturnality and diurnality across invertebrates. Using machine learning assisted text mining, they will perform a systematic literature survey to identify the temporal activity patterns for thousands of species, and use statistical phylogenetic modeling and ancestral reconstruction to compare the evolution of nocturnality and diurnality across the vertebrate and invertebrate animal kingdoms.
 

Researcher: Maxwell Shafer, Department of Cell and Systems Biology, Faculty of Arts and Science, UofT

 

Skills required:

  • Experience with bioinformatics, text mining, machine learning / artificial intelligence, or programming languages (R, Python) is beneficial.
  • Coursework in evolution or evolutionary modeling preferred (but not required)

Primary research location:

  • University of Toronto, St George Campus 

Research description:

To develop safe nanoparticles for use during pregnancy, we first need to understand the cross-talk (communication) between cells of the placenta (the barrier between the mother and the baby) and other cells from the mother under different pathological conditions, e.g. cancer. We developed an organ-on-a-chip model to mimic this environment in the lab and investigate the cross-talk between cells. We used this model to generate proteomic and transcriptomic data.
A data science student will work with a graduate student to help analyze this large dataset and explore different approaches to visualizing it. This is a great opportunity for the student to work in an interdisciplinary team at the intersection of nanotechnology and microfluidics, learn new wet-lab techniques, and apply their knowledge of data science to solve real-world problems.
 

Researcher: Hagar Labouta, Unity Health Toronto

 

Skills required:

  • A motivated data science student with expertise in R, Python and/or other data packages.
  • Prior experience on omics projects is advantageous.
  • No prior knowledge in nanomedicine or organ-on-a-chip technology is required; this will be a learning opportunity for the student as well.

Primary research location:

  • St. Michael's Hospital 

Research description:

Galactic science has reached the threshold where 3D maps of the radiative properties of the galactic medium can be made, which play a role in many fields including star formation, dark matter, and cosmology. Previous work has combined existing 3D stellar-derived star and dust data with emission observed by the Planck and IRAS satellites to create the first 3D map of interstellar dust temperature at a resolution of half a degree.
This project, co-supervised with Dr. Ioana Zelko at the Canadian Institute for Theoretical Astrophysics, will make use of novel statistical inference techniques to increase both the angular and distance resolutions of the maps. To increase the resolution in distance, we will explore the use of Bayesian blocks, a nonparametric method for grouping data with various underlying properties, in contrast to fitting parametric models or using fixed-resolution approaches. To increase the angular resolution, we will explore running Bayesian inference analyses that group together different tessellations of the sky.
The main responsibilities of the student will be to review relevant literature, lead coding and data analysis efforts (using simulation studies and/or real data), and meet regularly with me and various collaborators to discuss progress on the project.
 

Researcher: Joshua Speagle, Department of Statistical Sciences, Faculty of Arts & Science, UofT

 

Skills required:

  • All applicants are welcome, but preference will be given to students with some prior experience/background in coding in Python as well as with statistical inference (especially Bayesian inference).

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Insects cause untold damage to agricultural and horticultural crops every year, despite our best efforts to control them with pesticides and management practices. And, because plants are the base of terrestrial food chains, damage caused by insects to plants has the potential to have ripple effects on other species. Agricultural and greenhouse pests, in particular, cause large economic losses every year. The genetic basis of how these pests thrive, or fail to thrive, on their host plants remains poorly understood. We are in the unique position of having a large sample of genomic sequence data for an insect amidst a population outbreak, when their numbers were growing exponentially. The research is an integrative mixture of plant biology, genetics and genomics, population genetics, and bioinformatics.
The essence of the project is for student(s) to align next generation sequence data to a reference genome for the pest, identify polymorphic sites in the population, and then link allele frequency differences in the insects with phenotypic differences in their host plants.
 

Researcher: John Stinchcombe, Department of Ecology and Evolutionary Biology, Faculty of Arts & Science, UofT

 

Skills required:

  • Data analysis experience, or a keen interest to learn.
  • Statistics course work.
  • Past experience, or keen interest, in quantitative methods.
  • Experience working with data in a server environment, programming, or a strong desire to learn.
  • An intense desire to apply quantitative reasoning to biology.

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Pose estimation is, traditionally, the problem of converting an image of a human figure to a skeleton-like set of connected vertices (the hand, elbow, shoulder, etc.). Interest in applications like mixed reality video filters, sports match analysis, and crowd tracking has led to recent breakthroughs in inferring 3D poses from 2D images. This is possible because only some theoretically possible poses are plausible given the image (for example, we rarely throw our hands out sideways when seated). Modified transformer architectures (e.g. this paper) have been shown to learn the most plausible pose.
The student will explore repurposing these pose estimation methods to summarize the 3D ‘pose’ of galaxies. Like humans, galaxies have connected structures that follow predictable patterns. A human pose is determined by social context while the pose of a galaxy is determined by the physics of invisible dark matter. By measuring the 3D poses of a large set of galaxies, we can make new otherwise-impossible measurements of that dark matter.
This project fits into a broader program of work developing models and image annotation techniques in astronomy; the student would ideally have or be looking to gain experience working as part of a software development team. The project will be co-supervised by Dunlap Postdoctoral Fellow Mike Walmsley. The student will be responsible for reviewing published pose estimation literature/code and applying their preferred approach to galaxy images. We will provide the images along with pose-like (keypoint) labels to learn from.
 

Researcher: Jo Bovy, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, UofT

 

Skills required:

  • Preference will be given to students with a background in Python and deep learning.
  • This project fits into a broader program of work developing models and image annotation techniques in astronomy; the student would ideally have or be looking to gain experience working as part of a software development team.

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Ensuring student success at UofT is a critical objective for the institution. Student attrition and delayed graduations not only affect individual students but also have broader societal and economic implications. The Student Academic Analytics project is a multi-year collaboration across the University and has resulted in the development of a series of data tools focused on elements of undergraduate student success. This project will develop predictive models to help understand factors impacting student retention, graduation, and time to graduation. The ultimate goal is improving understanding of barriers to success and considering support systems and strategies to enhance student outcomes.
The successful candidate will use a variety of curated datasets relating to student success to generate and test models. The datasets include many student characteristics (e.g., gender, legal status, high school GPA, course load, course performance), though not EDI data, and environmental characteristics (e.g., academic program design, living in residence). These will be used to examine vital questions in three broad areas:
  1. What characteristics are most associated with being retained from year 1 to year 2?
  2. What characteristics are most associated with greater graduation rates?
  3. What characteristics are most associated with shorter times to graduation?

Researcher: Susan McCahan, Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Experience in data preparation and analysis using R, Python, or Stata, particularly with methods for inference, logistic and linear regression, and predictive models
  • Interest in effective data visualization and storytelling techniques with data
  • Strong communication skills and interest in understanding how data can be used to support student success

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The ability to quickly and accurately identify covariate shift at test time is a critical and often overlooked component of safe machine learning systems deployed in high-risk domains. While methods exist for detecting when predictions should not be made on out-of-distribution test examples, identifying distribution-level differences between training and test time can help determine when a model should be removed from the deployment setting and retrained. This project will evaluate the Detectron model (https://github.com/rgklab/detectron) on a wide variety of datasets from the WILDS benchmark (https://wilds.stanford.edu/) and the SubpopBench benchmark (https://github.com/YyzHarry/SubpopBench). This will enable ML researchers to identify promising next steps to build guardrails that protect against distribution shift. The student will be responsible for coding, designing, and running experiments using PyTorch on a large-scale GPU cluster to study, compare, and contrast different methods for detecting when a machine learning model might fail on publicly available datasets. This project will introduce the student to Slurm, PyTorch, and empirical research in machine learning.
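To make the idea of a distribution-level check concrete, here is a minimal sketch of a classical baseline: a two-sample Kolmogorov-Smirnov statistic with a permutation test on a single feature. This is not the Detectron method and is not taken from the project; the function names and the synthetic Gaussian data are assumptions chosen purely for illustration.

```python
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def permutation_p_value(train, test, n_perm=500, seed=0):
    """Return (observed KS statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = ks_statistic(train, test)
    pooled = list(train) + list(test)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(train)], pooled[len(train):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic, illustrative data: one feature at training time and at test time.
rng = random.Random(42)
train = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]  # mean shifted by 1

d_same, p_same = permutation_p_value(train, test_same)
d_shift, p_shift = permutation_p_value(train, test_shifted)
print("no shift:   D =", round(d_same, 3), "p =", round(p_same, 3))
print("mean shift: D =", round(d_shift, 3), "p =", round(p_shift, 3))
```

A univariate test like this only sees shifts in the marginal distribution of one feature, which is one reason learned detectors are studied for high-dimensional data.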
 

Researcher: Rahul Krishnan, Department of Computer Science, Faculty of Arts and Science, UofT

Skills required:

  • Knowledge of machine learning models (undergraduate class experience at a minimum)
  • Familiarity with and interest in concepts in causal inference, robustness, and distribution shift is a plus
  • Knowledge of programming deep learning models in PyTorch
  • Interest in implementing distributed systems for training machine learning models (e.g. training ML models across multiple GPUs)

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Our galaxy, the Milky Way, is surrounded by numerous small galaxies and star clusters that can be influenced by its gravitational forces, leading to the formation of stellar streams— celestial "rivers" orbiting around our galaxy. These streams offer a unique opportunity for astronomers to delve into the mysteries of galaxy formation and the elusive nature of dark matter. (For an intriguing example, check out our feature in The Globe & Mail: https://www.theglobeandmail.com/canada/article-star-streams-reveal-milky-ways-ravenous-history/)
Thanks to cutting-edge cosmic surveys, we now have access to comprehensive data on millions of stars in our universe, including their full 6D information (position and velocity). The SUDS Scholar will be at the forefront of developing a Bayesian framework to assess the membership probability of each star in potential streams and to characterize the properties of these stellar streams. This involves leveraging vast astronomical datasets, totaling several gigabytes of data, obtained from one of the largest spectroscopic surveys, the Dark Energy Spectroscopic Instrument (DESI, https://www.desi.lbl.gov/).
In this research project, the SUDS Scholar will explore the development and application of innovative statistical and computational techniques. These methodologies are crucial not only for unraveling the secrets hidden within stellar streams but also for paving the way for future astronomical surveys.
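As a rough illustration of what a membership probability means, the sketch below applies Bayes' rule to a two-component mixture (stream versus background) using a single observable, line-of-sight velocity. It is an assumption-laden toy, not the project's framework: the Gaussian components, the mixing fraction, and all numerical values are hypothetical, and a real analysis would use the full 6D information and infer these parameters rather than fix them.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def membership_probability(velocity, stream_fraction, stream, background):
    """Posterior probability that a star belongs to the stream component.

    stream and background are (mean, sigma) tuples in km/s;
    stream_fraction is the prior fraction of stars in the stream.
    """
    weighted_stream = stream_fraction * gaussian_pdf(velocity, *stream)
    weighted_background = (1.0 - stream_fraction) * gaussian_pdf(velocity, *background)
    return weighted_stream / (weighted_stream + weighted_background)


# Hypothetical parameters: a kinematically cold stream on a broad background.
STREAM = (-50.0, 5.0)       # mean, sigma in km/s (assumed)
BACKGROUND = (0.0, 100.0)   # mean, sigma in km/s (assumed)
STREAM_FRACTION = 0.1       # prior fraction of stream stars (assumed)

for v in (-52.0, -40.0, 30.0):
    p = membership_probability(v, STREAM_FRACTION, STREAM, BACKGROUND)
    print(f"v = {v:6.1f} km/s -> P(member) = {p:.3f}")
```

Stars whose velocity sits close to the assumed stream mean receive a high probability even though the prior fraction is small, because the narrow stream component concentrates its density there.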
 

Researcher: Ting Li, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts and Science, UofT

Skills required:

  • Strong skills in Python programming for code development and troubleshooting.
  • Keen interest in Bayesian statistics, nested sampling algorithms, and model comparison.
  • Effective written and oral communication, especially in paper writing and presentations.
  • Demonstrated ability to work well in teams and engage in collaborative efforts.

Primary research location:

  • University of Toronto, St George Campus 

Research description:

Solid organ transplantation is a life-changing intervention for individuals with end-stage kidney, liver, pancreas, lung, or heart failure. Numerous studies have shown that transplantation improves survival and quality of life for recipients. However, as the number of transplants performed in Canada continues to increase, greater attention is needed towards understanding other outcomes important to patients, such as ability to work and earn income. There is currently little known about the labour market implications of solid organ transplantation. Previous studies in this area have been survey-based which can be biased due to small sample sizes and self-reported data. We propose to conduct a population-based retrospective cohort study examining income and employment of individuals who have undergone solid organ transplantation. We will leverage a unique dataset that our team has created called the Canadian Hospitalization and Taxation Database. Income, employment, and health data of all solid organ transplant patients in Canada will be derived from a linkage between the Canadian Institute for Health Information Discharge Abstract Database and the T1 Family File. Analysis will be conducted using advanced econometric and epidemiological methods. This study will be the first of its kind and have a significant impact on the field of transplant medicine.
The student will be expected to conduct a background literature search, develop a protocol, analyze data, and prepare a manuscript for publication in a peer-reviewed journal, on which they will be the first author. The student will also attend weekly meetings with the research team and provide regular updates on the progress of the project.

Researcher: Karim Ladha, Unity Health Toronto

 

Skills required:

  • Familiarity with a statistical software package such as R and working with large datasets.
  • Students with an interest in clinical medicine will also have the opportunity to shadow physicians in the operating room and clinic to gain a better understanding of the project.

Primary research location:

  • St. Michael's Hospital and/or Remote

Research description:

With a lifetime prevalence of 12% in Canada, major depression is the third-highest cause of disability worldwide. Mid-to-late-life depression (MLD) confers a 2- to 5-fold increase in dementia risk, including for Alzheimer’s disease. It has also become increasingly clear that depression is strongly linked to neuroinflammation. We therefore ground the study in the relationship between a neurological signature of depression and inflammation. Furthermore, there are sex differences in the immune system, and men and women may also differ in the types of inflammatory markers they produce as they age. The premier approach to studying the neurological markers of depression is neuroimaging, predominantly magnetic resonance imaging (MRI). MRI has uncovered structural changes, functional alterations, and connectivity abnormalities in specific brain regions or networks in patients with depression. Thus, this project will focus on a retrospective analysis of the Canadian Biomarker Integration Network in Depression dataset, which provides inflammatory and MRI assessments in MLD patients. We aim to (1) synthesize a multimodal neuroimaging signature of MLD severity; and (2) assess the associations between this signature and systemic inflammation.
 
 

Skills required:

  • Usage of the Linux operating system
  • Programming in Python and/or Matlab
  • Basic data-science concepts, e.g. correlation, regression
  • Basic statistical concepts, e.g. t-tests, F-tests, outlier identification
  • (Asset) Advanced data-science methods, e.g. principal-component analysis, weighted gene co-expression network analysis (WGCNA)
  • (Asset) Medical imaging analysis experience

Primary research location:

  • Baycrest Centre and/or Remote

Research description:

Ovarian cancer diagnosed at its earliest stage, often at oophorectomy, is associated with a very favorable prognosis. The ability to incorporate a serum-based biomarker that correlates with the presence of precursor lesions (or at minimum occult cancer) may translate to a delay in age at surgery, an earlier detection of preclinical disease, or the stratification of those at the highest risk of dying. Our team recently showed that abnormally high platelet counts (i.e., thrombocytosis) are associated with a significantly increased risk of developing and dying of the disease. We propose to leverage data and samples from the largest study of BRCA mutation carriers to determine if a high platelet count correlates with evidence of disease or (pre)invasive cancer. Furthermore, we will perform a time-to-event analysis to evaluate the association between platelet counts and the outcome of interest. Importantly, we will conduct the first systematic evaluation of medications and other exposures with pro-/anti-inflammatory or pro-/anti-platelet properties and cancer risk. Findings from this study will have the potential to transform the management of high-risk women across the globe and improve outcomes from this deadly cancer. It is timely that we develop a more personalized approach to the prevention of a fatal disease for high-risk women.

Summary of responsibilities:
  • Complete all necessary training, including onboarding, ethics, and institutional requirements
  • Conduct a literature review on topic
  • Formulate an analysis plan and complete project initiation documents
  • Become familiar with and contribute to ongoing epidemiologic research within the team, which may include, but is not limited to, the following tasks:
  • Completes data entry and quality control ensuring the accuracy and integrity of data collection; may investigate missing or invalid data and prepare data sets
  • Completes review of medical records to collect research data
  • May prepare and mail out follow-up questionnaires to research participants or collaborators
  • Prioritizes and monitors various study deadlines for their own work
  • Conduct statistical analysis as per the study objectives
  • Contribute to preparing materials for proposals, progress reports, presentations, and publications

Researcher: Joanne Kotsopoulos, Dalla Lana School of Public Health, UofT

Skills required:

  • Dependable
  • Hardworking
  • Detail-oriented
  • Team player
  • Independent
  • Strong communication skills
  • Analytic skills
  • Strong organization skills
  • Prior experience in SAS or R is an asset but not required

Primary research location:

  • Women's College Hospital/ UofT and Remote

Research description:

Every month, 83,000 articles are published on Medline. Less than 1% of these will change medical practice, and nearly all of the articles that do will be randomized controlled trials (RCTs). Our lab has created a software tool called PaperScrape, which monitors Medline and identifies RCTs relevant to internal medicine. PaperScrape then retrieves the abstract, identifies additional information using ClinicalTrials.gov, and makes a call to OpenAI’s davinci API, which generates a 3-sentence summary. The summaries are disseminated via a twice-monthly newsletter (Trial Files). The student’s role on this project will include enhancing prompt engineering, boosting accuracy, broadening the scope of Trial Files to additional medical fields, and reducing hallucinations in our model. To reduce hallucinations, we first need to benchmark how often they occur. The student will review a random sample of 400 large language model (LLM) outputs and compare them to the published abstract (ground truth). Thereafter, the student will collaborate with the supervisor and related study team members to identify and develop an ideal approach to reduce hallucinations and inaccuracy in the model (e.g., including a second LLM that focuses on comparing the first LLM's output to the published abstract).
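The benchmarking step amounts to estimating a proportion from a sample of 400 reviewed outputs. A minimal sketch of how that estimate and its uncertainty could be reported is shown below, using a Wilson score interval. The count of 28 flagged summaries is a made-up placeholder, not a result from the project, and the function name is hypothetical.

```python
import math


def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = flagged / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Placeholder numbers for illustration only: 28 of 400 summaries flagged.
SAMPLE_SIZE = 400
FLAGGED = 28

low, high = wilson_interval(FLAGGED, SAMPLE_SIZE)
print(f"observed rate: {FLAGGED / SAMPLE_SIZE:.3f}")
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better when the observed rate is close to zero, which is plausible for a hallucination rate.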

Researcher: Michael Fralick, Lunenfeld-Tanenbaum Research Institute

Skills required:

  • Requirements: completed at least one year of medical school, self-motivated, strong critical thinking skills
  • Not required but an asset: familiar with natural language processing and/or machine learning

Primary research location:

  • Mount Sinai Hospital and/or Remote

Research description:

This project aims to develop a machine learning (ML) model for predicting children's attention ability using features extracted from over 10,000 MRI images from the Adolescent Brain Cognitive Development (ABCD) study. The project will cover diverse data science topics, including big data in neuroimaging, network analysis, complex systems, feature selection, visualization, and cloud computing. The proposed features encompass those derived from T1 and T2 MRI scans, structural connectivity via diffusion, functional connectivity via resting and task-based fMRI, and non-linear metrics like fractal dimensions and Lyapunov exponents. The ABCD database provides predictive labels, including self-report surveys, clinical assessments (e.g., NIH Toolbox Flanker Inhibitory Control), and ADHD-related diagnosis and symptoms. Model interpretability is a priority. Feature selection should be transparent, and their respective contributions should be reportable and visualized. This project is part of a broader study on brain-computer interfaces and neural plasticity.
 
The student will have the opportunity to work with large-scale neuroimaging data, MRI/fMRI preprocessing, experimentation with feature selection methods and ML and/or deep learning models. They will have access to cloud computing resources and Google Vertex AI tools. The student will be supported by doctoral trainees and staff engineers in the lab. The expected deliverables will be:
  • A deep learning model trained on 10,000 MRI images from the ABCD study to predict children’s attention ability.
  • A feature visualization tool to qualitatively and/or quantitatively describe what brain areas and measures are physiologically relevant to attention in children.

Researcher: Tom Chau, Holland Bloorview Kids Rehabilitation Hospital

Skills required:

  • Required:
    • Advanced programming in Python
    • Linear algebra at the undergrad engineering level
    • Neuroscience basics
  • Preferred
    • Previous experience with machine learning preferred
    • Previous experience with querying a database preferred

Primary research location:

  • Holland Bloorview Kids Rehabilitation Hospital 

Research description:

In this project, the student will use advanced machine learning approaches to analyze large-scale datasets of psychological tests, with responses collected from participants all over the world. The goal of this project is to analyze response patterns from participants, train computational models to optimize the assessment of participants' psychological traits (e.g., personality) and abilities (e.g., IQ) in an effective and efficient manner, and implement computational models in an app for use by real-world users.
The student will be responsible for data cleaning, data analysis, using machine learning techniques to optimize the models, implementing the models on a website for use by users, and writing a paper for publication.
 

Researcher: Kang Lee, Department of Applied Psychology and Human Development, Ontario Institute for Studies in Education, UofT

 

Skills required:

  • Advanced Python programming skills and experience
  • Experience with machine learning is an asset but not required
  • Machine learning training will be provided

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

Many adults over 60 years live with some form of hearing impairment that makes speech comprehension difficult. However, progress in predicting speech-comprehension difficulties in everyday life has been limited, in part, because hearing-science research has mainly focused on speech comprehension of short, disconnected sentences that lack a topical thread and are not relevant to the listener. New approaches to understanding naturalistic speech listening are thus critical to gaining insight into impaired speech processing.
The student will be involved in research that leverages novel natural language processing (NLP) approaches (e.g., sentence embeddings) with graph theoretic approaches (e.g., network centrality) to better understand how individuals listen to naturalistic speech. The student will analyze transcripts of spoken stories and transcripts of individuals recalling these stories after listening to them (using NLP), and will integrate relevant information from these analyses to capture the structure in which individuals comprehend speech (using graph theory). The student will program the analyses and visualize the results using Python/MATLAB. The student will work with the supervisor and a graduate student with biophysics and psychology background. The lab provides ample opportunities to learn how sophisticated data-analysis tools can be used to facilitate research in basic science with clinical applicability.
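The combination of sentence embeddings and network centrality described above can be sketched in a few lines. The example below is a toy under stated assumptions: word-count vectors stand in for learned sentence embeddings, the similarity threshold of 0.5 is arbitrary, and the four sentences are invented. It only shows the shape of the pipeline (embed, compare, build a graph, score nodes), not the lab's actual analysis.

```python
import math
from collections import Counter


def embed(sentence):
    """Stand-in embedding: a bag-of-words count vector (assumption)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def degree_centrality(sentences, threshold=0.5):
    """Link sentences whose similarity meets the threshold, then return
    each sentence's degree divided by the maximum possible degree."""
    vectors = [embed(s) for s in sentences]
    n = len(vectors)
    if n < 2:
        return [0.0] * n
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    return [d / (n - 1) for d in degree]


# Invented story sentences, for illustration only.
story = [
    "the dog chased the ball",
    "the dog found the ball in the park",
    "the park was quiet",
    "rain started falling",
]

centrality = degree_centrality(story)
for sentence, score in zip(story, centrality):
    print(f"{score:.2f}  {sentence}")
```

In this toy graph the second sentence links to both the first and the third, so it receives the highest centrality, while the unrelated fourth sentence is isolated.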
 

Researcher: Bjorn Herrmann, Baycrest

 

Skills required:

  • Must: Advanced computer programming skills (Python or MATLAB); effective oral and written communication skills; inter-cultural competence; ability to work independently and within a team
  • Beneficial: background in artificial intelligence; experience with natural language processing; knowledge of graph theory; interest in auditory research

Primary research location:

  • Rotman Research Institute at Baycrest Health Sciences

Research description:

Invariant natural killer T (iNKT) cells are unconventional T-cells that are ubiquitously found in mammals and provide immunity to pathogens and against tumours. Through their T-cell receptor (TCR), iNKT cells respond to glycolipid antigens, a class of antigens that is invisible to conventional CD4 or CD8 T-cells. Furthermore, these cells are heterogeneous and can differentiate into discrete effector subsets. Using advanced functional genomics, we identified a bona fide cytotoxic iNKT cell subset, which is functionally equivalent to cytotoxic CD8 T-cells. These cytotoxic iNKT cells efficiently kill tumour cells in vitro/in vivo and provide several advantages over their CD8 T-cell counterparts. Interestingly, cytotoxic iNKT cells recognize and kill tumour cells through several modalities that are both TCR-dependent and TCR-independent. Together, our findings indicate that cytotoxic iNKT cells could be used to develop novel cancer immunotherapies with a lower risk of tumour evasion, although a mechanistic understanding of their function remains to be established. The goal of this project is to leverage available single cell RNA sequencing datasets to identify immune receptors expressed by cytotoxic iNKT cells that may be involved in recognition of or response against tumour cells. Results from this work will inform the rational design of iNKT-targeted cancer immunotherapies. This project is co-supervised with Dr. Thierry Mallevaey (Department of Immunology, Temerty Faculty of Medicine, University of Toronto).
 
The responsibilities of the student will include, although are not limited to:
  • Review the primary literature to find important immune receptors expressed by cytotoxic iNKT cells
  • Assess the quality of available scRNA seq datasets, with consideration of experimental design/statistics
  • Establish a scRNA seq data processing workflow
  • Perform data reduction and clustering of available scRNA seq datasets using R software
  • Identify genes driving cluster formation from scRNA seq data and extract biological knowledge from cluster-specific biomarkers
  • Attend weekly lab meetings (~1 hour/week) and present research updates biweekly
  • If time: validate identified biomarkers on iNKT cell subsets in the laboratory setting via flow cytometry or through performing gene expression analyses using qRT-PCR

Researcher: Jastaran Singh, Department of Immunology, Temerty Faculty of Medicine, University of Toronto

Skills required:

  • Familiar with R and/or Python
  • Understand the basics of next-generation sequencing (NGS)
  • Basic statistical analysis skills (200-level)
  • Data management/organization

Primary research location:

  • University of Toronto, St George Campus and/or Remote

For more information

SUDS.dsi@utoronto.ca

SUDS Info Session Slides are available now.

News

2023 SUDS Scholars showcase their newly acquired data science skills.

Read the full story.

Students may also be interested in the Urban Data Science Corps Summer Internships offered by the School of Cities.

Learn more