SUDS Student Call 

May-August 2023

Call for student researchers!

The Data Sciences Institute (DSI) welcomes carefully selected undergraduate students from across Canada for a rich data sciences research experience. Through the SUDS Research Program, undergraduate students who are interested in exploring data science as a career path have an exciting opportunity to engage in hands-on research supervised by DSI member researchers across the three University of Toronto (UofT) campuses.

The DSI is strongly committed to diversity within its community and especially welcomes applications from racialized persons/persons of colour, women, Indigenous/Aboriginal People of North America, persons with disabilities, LGBTQ2S+ persons, and others who may contribute to the further diversification of ideas.

Below are the SUDS research opportunities for May-August 2023. When you apply, you can rank your top three choices.

See here for information on eligibility, award value and duration, and SUDS programming.

Researcher Opportunities

Research description:

Built and energy infrastructure are sensitive to the impacts of global warming and climate change, including increased cooling loads, dangerous heat waves, damaging urban flooding, permafrost degradation, and other impacts. Modelling, quantifying, and predicting these impacts for engineering analysis requires “downscaling”, which relates available climate information (e.g., weather station data, model output) to the requirements of engineering (site-specific information, design requirements), while accounting for incompatible sampling, errors, and uncertainty. Downscaling is a workflow of data processing and model calibration against observations that increasingly uses machine learning and modern data science. We are developing tools to accelerate research and applications that use downscaling for engineering. These tools are packaged in the UofT Climate Downscaling Workflow (UTCDW), a set of guides, software, and visuals to help integrate downscaling into engineering research and design.
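One calibration step that downscaling workflows commonly include is bias correction of model output against observations. The sketch below shows empirical quantile mapping, a widely used bias-correction technique, on synthetic data. It is an illustration only and is not claimed to be the method implemented in the UTCDW; the function name `quantile_map` and all data values are hypothetical.

```python
import numpy as np

def quantile_map(model_hist, obs, model_future, n_quantiles=100):
    """Empirical quantile mapping (illustrative sketch, not the UTCDW code).

    Each model value is located in the historical model distribution and
    replaced by the observed value at the same quantile.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model_hist, probs)  # historical model quantiles
    obs_q = np.quantile(obs, probs)           # observed quantiles
    # np.interp maps model values onto observed values quantile by quantile;
    # values outside the historical model range are clamped to the end points.
    return np.interp(model_future, model_q, obs_q)

rng = np.random.default_rng(0)
obs = rng.normal(20.0, 3.0, 5000)           # hypothetical station temperatures (deg C)
model_hist = rng.normal(18.0, 4.0, 5000)    # hypothetical biased historical model output
model_future = rng.normal(19.0, 4.0, 5000)  # hypothetical future model output

corrected = quantile_map(model_hist, obs, model_future)
print(f"raw future mean: {model_future.mean():.2f}, corrected: {corrected.mean():.2f}")
```

Because the mapping is learned on the historical period and then applied to future output, the model's projected warming is carried through, rescaled roughly by the ratio of observed to modelled spread.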


The SUDS Scholar will test, apply, and extend the data-science methods in the UTCDW for engineering problems related to structural wind loads arising from extreme wind events, as well as the impacts of climate change on battery electric vehicle range and on biodiesel viability. The project will demonstrate how the UTCDW can effectively translate climate science knowledge into actionable information.

 

Researcher: Paul Kushner, Department of Physics, Faculty of Arts and Science, UofT

Skills required:

  • Grounding in data science methods (programming in R/Python, data QC and organization, modelling and machine learning, etc.) as well as classical statistics (ideally including multivariate statistics).
  • An interest in either or both of climate/atmospheric science or civil/environmental engineering is desirable, but no prior experience in these areas is required.

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

In randomised controlled trials (RCTs), patients may tolerate the treatment poorly or experience adverse events, discontinue the treatment, and drop out of the study, so that their study outcomes are never measured. Consequently, patients with measured data may no longer be representative of the initial study population, and this post-randomisation selection bias can partially invalidate randomisation. Ignoring or inadequately handling missing data can lead to biased estimates and loss of statistical power. Although a substantial literature on missing data has developed, strategies for handling missing data still vary in clinical research practice, and there is no consensus on which approach is preferable under different primary statistical analyses and missing data mechanisms.

The objectives of this study are to compare common approaches to the analysis of missing data in RCTs through a series of statistical simulation studies and to provide recommendations on how to handle missing data in RCTs. In our simulation study, we will consider continuous, binary, and repeatedly measured study outcomes, and compare missing data methods including complete-case analysis, imputation, and methods that do not require a complete data set. The DSI-SUDS trainee working on this project will gain knowledge and expertise in missing data, the design and analysis of standard RCTs, and conducting statistical simulation studies in the R statistical software.
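The kind of simulation comparison described above can be illustrated with a small sketch. It is written in Python rather than R for brevity, and every setting in it (sample size, true effect of 1.0, the baseline covariate `x`, and a dropout model in which treated patients with low `x` are more likely to drop out) is a hypothetical assumption. It compares the complete-case difference in means with a simple regression imputation under a missing-at-random (MAR) mechanism.

```python
import numpy as np

def simulate_trial(rng, n=20000, effect=1.0):
    """One simulated two-arm RCT with outcome dropout (hypothetical setup)."""
    treat = rng.integers(0, 2, n)                       # 1:1 randomisation
    x = rng.normal(0.0, 1.0, n)                         # baseline covariate
    y = effect * treat + 2.0 * x + rng.normal(0.0, 1.0, n)
    # MAR dropout: treated patients with low x drop out more often;
    # control patients drop out at a constant 10% rate.
    p_miss = np.where(treat == 1, 1.0 / (1.0 + np.exp(2.0 * x)), 0.1)
    observed = rng.random(n) >= p_miss
    return treat, x, y, observed

def complete_case(treat, y, observed):
    """Difference in arm means using only patients with a measured outcome."""
    t, yo = treat[observed], y[observed]
    return yo[t == 1].mean() - yo[t == 0].mean()

def regression_imputation(treat, x, y, observed):
    """Fit y ~ 1 + treat + x on complete cases, fill in missing outcomes."""
    design = np.column_stack([np.ones_like(x), treat, x])
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    y_filled = np.where(observed, y, design @ coef)
    return y_filled[treat == 1].mean() - y_filled[treat == 0].mean()

rng = np.random.default_rng(2023)
cc, ri = [], []
for _ in range(50):                                     # 50 simulation replicates
    treat, x, y, observed = simulate_trial(rng)
    cc.append(complete_case(treat, y, observed))
    ri.append(regression_imputation(treat, x, y, observed))
print(f"complete-case bias:        {np.mean(cc) - 1.0:+.3f}")
print(f"regression imputation bias: {np.mean(ri) - 1.0:+.3f}")
```

Under this assumed mechanism the complete-case estimate is biased upward, because the treated patients who remain have systematically higher `x`, while imputation conditioned on `x` recovers the true effect on average. Single regression imputation understates uncertainty, which is one reason multiple imputation is usually preferred when standard errors matter.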

 

Researcher: Kuan Liu, Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, U of T

Skills required: 

  • A good foundation in applied statistics and familiarity with concepts and methods to handle missing data. Familiarity with clinical research would be a great asset.
  • Previous programming experience with R is highly desirable.

Primary research location:

  • Dalla Lana School of Public Health, University of Toronto, St George Campus and/or Remote

Research description:

Large-scale data have provided robust evidence that more liberal-leaning communities (e.g., university students, big cities) hold vastly different moral views than more conservative-leaning communities (e.g., working class, rural areas). Liberals and conservatives also differ in their personality, emotional profiles, cognitive styles, attitudes toward science, and many other basic psychological characteristics. Why exactly do liberals and conservatives differ in so many ways? Is there a deeper mechanism that can account for all of their differences? This research project seeks to identify the most basic and fundamental psychological ingredients of liberal and conservative ideology. To do so, we need large-scale data collection on numerous variables underlying individual variations in social, moral, and political attitudes. The SUDS Scholar will develop a website for the general public to complete measures of their moral values, political views, attitudes towards science, and many other psychological characteristics. As an informational reward, survey respondents will receive personalized feedback on their ways of thinking.

 

Researcher: Spike W. S. Lee, Rotman School of Management and Department of Psychology, UofT

 

Skills required:

  • Website development
  • Interest in political, moral, social, behavioral, psychological, or cognitive science

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Alzheimer's disease (AD) slowly destroys memory and places tremendous emotional and financial burdens on family and caregivers. Thanks to past research, we now have a good understanding of how harmful proteins build up in brains with AD. However, we still know little about why this protein buildup causes memory problems. To address this question, we investigate how these proteins disrupt brain activity patterns in a mouse model of AD. Our experiments produce vast quantities of time-series data recorded in freely behaving mice.
The SUDS Scholar will work with two types of data: local field potentials (LFPs) and neuronal spiking activity. LFPs are rhythmic oscillatory activity patterns that reflect the summed activity of many neurons, while spiking activity represents the action potentials of individual neurons. We will use several dimension reduction and machine learning approaches to isolate specific abnormalities in the oscillatory and spiking activity of the mouse AD model. We will also build a statistical model to infer cause-effect relationships between memory impairments and the detected abnormal patterns. The detected patterns could be developed into a novel biomarker for AD, of the kind that is indispensable for early and accurate diagnosis.
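As one illustration of how oscillatory LFP activity can be summarized before dimension reduction, the sketch below computes spectral power in frequency bands from a periodogram. The sampling rate, the synthetic 8 Hz signal, and the band edges are assumptions (band definitions vary across studies); this is not the lab's analysis pipeline.

```python
import numpy as np

def band_power(signal, fs, low_hz, high_hz):
    """Mean periodogram power of `signal` in the band [low_hz, high_hz)."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    in_band = (freqs >= low_hz) & (freqs < high_hz)
    return power[in_band].mean()

fs = 1000.0                                   # hypothetical sampling rate (Hz)
t = np.arange(0.0, 10.0, 1.0 / fs)            # 10 s of synthetic signal
rng = np.random.default_rng(1)
lfp = np.sin(2 * np.pi * 8.0 * t) + 0.5 * rng.normal(size=t.size)  # 8 Hz rhythm plus noise

bands = {"theta (4-12 Hz)": (4.0, 12.0), "gamma (30-80 Hz)": (30.0, 80.0)}
features = {name: band_power(lfp, fs, lo, hi) for name, (lo, hi) in bands.items()}
print(features)
```

Band powers computed for each recording window form a feature vector that could then feed dimension reduction (e.g., PCA) or a classifier.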

 

Researcher: Kaori Takehara, Department of Psychology, Faculty of Arts & Science, U of T

 

Skills required: 

  • We welcome students with a strong interest in applying data sciences to real-world brain cell activity.
  • The SUDS Scholar is expected to be able to code in MATLAB and Python and to have some familiarity with statistical and machine learning techniques.
  • Some background knowledge of neuroscience and neurophysiology will be an asset.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The Ontario Laboratory Information System (OLIS) contains lab test results for Ontario residents and is used for a wide range of surveillance, research, and health decision-making purposes. The data is collected and maintained by Ontario Health (OH) and shared with the Ministry of Health for analytic and accountability purposes. Several fields are submitted as string variables containing text that is irrelevant for analytic purposes, so a new process must be developed to ensure consistent and standardized information. Moreover, the Reference Range variable needs to be split into at least two fields (sometimes more, to reflect ranges for different sexes and/or age groups) that contain numeric values for the lower and upper limits of the range. Currently, project groups carry out this data workflow separately for individual tests of interest rather than as a standardized process. Developing a flexible model that handles all tests during the extract, transform, load (ETL) process would help standardize the entire process. The SUDS Scholar will work with content experts and other data scientists to develop and validate a model to process and standardize this large and complex lab database. The model will also need to be retrainable when new lab tests are added or when test kits report results in different units or test values; therefore, performance measures will need to be established to monitor model sensitivity and specificity and to allow timely retraining when required. The SUDS Scholar will contribute to a manuscript summarizing the process and outcomes and will present the findings to various academic and government stakeholders.
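To make the target output of the Reference Range step concrete, the sketch below parses a few plausible string formats into numeric lower and upper limits, including sex-specific ranges. The input formats are hypothetical, since actual OLIS strings are not shown here, and this rule-based parser is only a baseline illustration; the project itself calls for a retrainable model whose sensitivity and specificity can be monitored.

```python
import re

# Hypothetical formats; real OLIS Reference Range strings may differ.
NUMBER = r"(-?\d+(?:\.\d+)?)"
RANGE = re.compile(rf"^\s*{NUMBER}\s*(?:-|to)\s*{NUMBER}")   # "3.5-5.0", "3.5 to 5.0"
UPPER_ONLY = re.compile(rf"^\s*(?:<=?|≤)\s*{NUMBER}")        # "<5.2", "<= 5.2"
LOWER_ONLY = re.compile(rf"^\s*(?:>=?|≥)\s*{NUMBER}")        # ">60", ">= 60"

def split_reference_range(text):
    """Return (lower, upper) as floats, using None for a missing bound."""
    m = RANGE.match(text)
    if m:
        return (float(m.group(1)), float(m.group(2)))
    m = UPPER_ONLY.match(text)
    if m:
        return (None, float(m.group(1)))
    m = LOWER_ONLY.match(text)
    if m:
        return (float(m.group(1)), None)
    return (None, None)  # unparsed: a candidate for review or model-based handling

def split_by_group(text):
    """Split strings like 'M: 130-170; F: 120-150' into per-group bounds."""
    bounds = {}
    for part in text.split(";"):
        label, sep, value = part.partition(":")
        key = label.strip() if sep else "all"
        bounds[key] = split_reference_range(value if sep else part)
    return bounds

print(split_reference_range("3.5-5.0 mmol/L"))
print(split_by_group("M: 130-170; F: 120-150"))
```

Strings that fall through to `(None, None)` are exactly the cases where a learned model, rather than hand-written patterns, would need to take over.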

 

Researcher: Laura Rosella, Dalla Lana School of Public Health, U of T

 

Skills required:

  • Knowledge of Natural Language Processing and Python programming.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Machine learning (ML) has profound potential to make healthcare more accurate, accessible, and personalized. Today, ML models developed with medical images are being validated in clinical settings to expedite diagnosis in ophthalmology and radiology. Electronic health records (EHRs) are also being used to build ML systems for more inclusive clinical trial design, so that study findings translate to clinical practice.
Notwithstanding this progress, ML in healthcare faces significant ethical and methodological barriers. Ethically, it is evident that ML systems can reinforce inequity by perpetuating biases intrinsic to clinical data or arising from poor design choices. Substantial differences in the accuracy of ML models across protected attributes such as race have been observed in numerous applications, including, recently, a state-of-the-art radiograph diagnostic model. To prevent ML from exacerbating inequity in healthcare, there is an urgent need to precisely measure and subsequently mitigate such algorithmic bias.
This project will examine the state of algorithmic (un)fairness within biomedical informatics. The SUDS Scholar will participate in a thorough literature review as well as several real data analyses (e.g., EHRs, medical images) that illustrate the nuances of both identifying and measuring algorithmic bias in healthcare applications. The student will contribute to a manuscript for publication as well as to open-source software development.
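One basic way to measure the accuracy disparities described above is to compute performance metrics separately for each group and report the largest gap between groups. The sketch below does this for accuracy and for the true-positive rate (a gap in the latter corresponds to the "equal opportunity" criterion, one of several fairness definitions). The toy labels, predictions, and group names are hypothetical.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Accuracy and true-positive rate (sensitivity) for each group."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(yt == yp)
        if yt == 1:                      # only true positives count toward TPR
            s["pos"] += 1
            s["tp"] += int(yp == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }

def max_gap(rates, metric):
    """Largest difference in `metric` between any two groups."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)

# Hypothetical toy data: group A is classified perfectly, group B is not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_rates(y_true, y_pred, groups)
print(rates, max_gap(rates, "accuracy"), max_gap(rates, "tpr"))
```

Different fairness criteria can disagree on the same model, which is part of the nuance in measuring algorithmic bias that the project aims to illustrate.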

Researcher: Jesse Gronsbell, Department of Statistical Sciences, Faculty of Arts & Science, U of T

Skills required:

  •  R programming and a course in statistical learning (e.g., STA314)

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The papers at https://tiny.cc/iadaptexpt give a good sense of this work; an excerpt is quoted below. The SUDS Scholar will start by replicating this work, and will then be involved in modifying or developing such algorithms for adaptive experimentation, and/or generalizing statistical hypothesis tests (or Bayesian analyses) for analyzing such data.
"Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments, using data to progressively assign more participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one. We explore algorithms that combine the benefits of uniform randomization (UR) for statistical analysis with the benefits of reward maximization achieved by Thompson Sampling (TS). TS PostDiff (Posterior Probability of Difference) takes a Bayesian approach to mixing TS and UR: the probability that a participant is assigned using UR allocation is the posterior probability that the difference between two arms is 'small' (below a certain threshold), allowing for more UR exploration when there is little or no reward to be gained. We find that the TS PostDiff method performs well across multiple effect sizes, and thus does not require tuning based on a guess for the true effect size."

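The TS PostDiff rule described in the excerpt can be sketched for two arms with binary outcomes. This is our reading of the description, not the authors' code: the Beta(1, 1) priors, the Bernoulli reward model, the Monte Carlo estimate of the posterior probability, and the names `ts_postdiff_assign` and `run_experiment` are all assumptions.

```python
import numpy as np

def ts_postdiff_assign(successes, failures, threshold, rng, n_draws=1000):
    """Choose an arm (0 or 1) for the next participant using TS PostDiff.

    Assumes Beta(1, 1) priors on each arm's success probability.
    """
    a = 1 + np.asarray(successes)
    b = 1 + np.asarray(failures)
    # Monte Carlo estimate of P(|p0 - p1| < threshold) under the posterior.
    draws = rng.beta(a, b, size=(n_draws, 2))
    p_small = np.mean(np.abs(draws[:, 0] - draws[:, 1]) < threshold)
    if rng.random() < p_small:
        return int(rng.integers(2))        # uniform randomization (UR)
    sample = rng.beta(a, b)                # one Thompson draw per arm
    return int(np.argmax(sample))          # Thompson Sampling (TS)

def run_experiment(true_rates, n_participants, threshold, seed=0):
    """Simulate an adaptive experiment; return participants per arm."""
    rng = np.random.default_rng(seed)
    successes = np.zeros(2, dtype=int)
    failures = np.zeros(2, dtype=int)
    for _ in range(n_participants):
        arm = ts_postdiff_assign(successes, failures, threshold, rng)
        reward = int(rng.random() < true_rates[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes + failures

counts = run_experiment((0.2, 0.6), n_participants=500, threshold=0.1)
print(counts)
```

With a large true difference, the posterior probability of a "small" difference falls quickly, so assignment behaves like Thompson Sampling; when the arms are similar it stays high and assignment stays close to uniform randomization.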
 

Researcher: Joseph Jay Williams, Department of Computer Science, Faculty of Arts & Science, U of T 

 

Skills required:

  • We can teach most skills to students who excel at communication, documentation, and proactivity.
  • It is helpful if students have a background in hypothesis testing (e.g., t-tests, testing coefficients in a regression model), have analyzed real datasets, are meticulous in using checklists and triple-checking analyses, can explain their process to others, and have run simulations in R/Python before.

Primary research location:

  • On campus, Bahen Centre

Research description:

Patients with critical illness often receive invasive mechanical ventilation, either to treat respiratory failure or to facilitate recovery after major surgery. Sedation is often administered to treat agitation or distress during the provision of invasive ventilation. However, multiple studies have shown that minimizing sedation has many benefits, including faster weaning from the ventilator, less delirium, and lower risk of mortality. Sedation is often prescribed on an “as-needed” basis, and even standardized protocols require some degree of interpretation by the bedside clinical team. This type of decision-making is vulnerable to implicit bias. However, it is unknown whether patient race/ethnicity is associated with the use of sedation. We have two aims: 1) measure the association between patient race/ethnicity and the use of sedation during mechanical ventilation, and 2) estimate the effect on ventilator duration and mortality mediated by any differences in use of sedation. This will require both clinical and methodological innovation, but the findings will have immediate relevance for the practice of critical care medicine.
With careful supervision and support, the SUDS Scholar will lead this retrospective cohort study. The SUDS Scholar will be primarily responsible for generating the cohort from the data, implementing the analysis, and interpreting the results. Note that the data (MIMIC-IV and eICU databases) will be available from day 1 of the summer research period. Study design will be done in collaboration with the co-supervisors. Throughout the research, the co-supervisory team will be closely involved and provide hands-on support to help the scholar accomplish these tasks, including meetings at least once per week. The anticipated output by the end of the summer is an abstract suitable for presentation at national or international conferences and a manuscript in preparation for journal submission.

Researcher: George Tomlinson, Toronto General Hospital Research Institute, UHN

Skills required:

  • Familiarity with mathematical modeling
  • Knowledge of the following is helpful: GitHub, SQL / BigQuery, Python, R, Stan
  • Experience with literature review and scientific writing
  • Independent problem-solving and lateral thinking to overcome programming, logistical, and conceptual challenges
  • Clinical background or interest is an asset
  • Familiarity with research in the social determinants of health is an asset

Primary research location:

  • Toronto General Hospital / ICU offices in MaRS and/or Remote

Research description:

This project involves evaluating and improving the performance of an existing algorithm for a computationally intensive graph optimization task.
Community detection is the process of inductively identifying groups within a networked system. The Bayan algorithm has been developed in a recent project supported by the DSI. It currently outperforms all existing methods in accurate retrieval of communities. Bayan is publicly accessible (github.com/saref/bayan) as a part of the open-source Python library, CDlib.
In this project, the SUDS Scholar will work alongside a senior researcher from Huawei and the project supervisor to complete a series of weekly tasks. After receiving training on mathematical optimization and the design of a branch-and-cut algorithm, the Scholar will be assigned weekly tasks involving data analyses, computational experiments, and implementing and testing new algorithm speed-ups in a GitHub environment.
This project leverages state-of-the-art methods in computing and optimization to push the limits of solving a fundamental NP-complete problem exactly and efficiently. The output of this summer research project will contribute to the development of a reliable, open-source, and reproducible algorithm for robust and theoretically grounded community detection, thereby improving upon a widely used computational tool for data science.
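Community detection methods need an objective to score a partition of a network. The description above does not name Bayan's objective; the sketch below assumes it is modularity, a standard quality function for community detection, and simply evaluates it for a given partition of a small hypothetical graph. It is not an optimizer and does not reproduce Bayan's branch-and-cut implementation; finding the partition that maximizes such an objective is the hard combinatorial problem the project targets.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of a simple undirected graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where m is the
    number of edges, L_c the number of edges inside c, and d_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)     # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)   # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}
print(modularity(edges, two_triangles), modularity(edges, one_block))
```

Splitting the graph at the bridge scores higher (5/14) than keeping it whole (0), which is the kind of comparison an exact method must make over an exponentially large set of partitions.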

 

Researcher: Samin Aref, Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Background and experience in: Python, Python libraries for data analysis (pandas, numpy, seaborn, scikit-learn)
  • Other desired skills (to have or acquire during the project): Python libraries for large-scale/network data analysis and optimization (NetworkX, CDlib, igraph, Gurobi, joblib), collaborative software development, discrete optimization, data science methods, graph theory and network science, open science, reproducible research

Primary research location:

  • University of Toronto St. George Campus and/or Remote

Research description:

We (https://zhenlab.com/) combine cutting-edge imaging and computational biology tools to address how a nervous system operates. We have developed advanced live imaging tools that allow us to visualize pan-neural network activity in real time in animals performing complex behaviors (Susoy Cell 2021). In this project, we are investigating the computational principles that underlie the dynamic process and effectiveness of an animal’s ingestion system. We have developed a new live imaging pipeline that allows us to visualize individual steps of the motor patterns, and we are currently using these data to build a mathematical model that describes food transport and explores the parameters that improve its efficiency. The SUDS Scholar will work with our team to analyze data generated from the imaging pipeline.

 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • Ideal candidates will be proficient in image processing, algorithm development, or statistical analysis. Knowledge of programming is essential.
  • Students with backgrounds in engineering, computer science, applied math, or biophysics are encouraged to apply, but the only key ingredient is a strong drive to learn and apply all of the above to real biological problems.

Primary research location:

  • Zhen Lab at the Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

We (https://zhenlab.com/) combine cutting-edge imaging and computational biology tools to address how a nervous system develops and operates. In collaboration with the Center for Brain Science, Harvard University (https://cbs.fas.harvard.edu/directory/jeff-lichtman/) and a state-of-the-art imaging facility (https://lab.research.sickkids.ca/nbif/about/), we are pioneering the field of connectomics, where serial ultrathin electron microscopy (EM) images are used to map the wiring of whole nervous systems. Using datasets from animals at different developmental ages and exposed to different environments, we can glean insights into the general principles of network assembly and plasticity (Witvliet Nature 2021; Mulcahy Current Biology 2022). In this project, research students will work on the development of automated image processing, segmentation, and volumetric reconstruction of circuits from EM image stacks, working with our team and collaborators who are currently building machine learning algorithms to address these challenges.

 

Researcher: Mei Zhen, Lunenfeld-Tanenbaum Research Institute

 

Skills required:

  • Ideal candidates will be proficient in image processing, algorithm development, or statistical analysis. Knowledge of programming is essential.
  • Students with backgrounds in engineering, computer science, applied math, or biophysics are encouraged to apply, but the only key ingredient is a strong drive to learn and apply all of the above to real biological problems.

Primary research location:

  • Zhen Lab at the Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

Using Big Data, text analysis, and machine learning techniques, we will examine how liberals and conservatives differ in their moral values, cognitive styles, antiscience attitudes, and other psychological characteristics. Specifically, we have access to more than 5 million news articles from about 500 media outlets. The outlets vary widely in ideological leaning, from far left to far right. They also vary in veracity, from mostly fact-checked to mostly fake, conspiracy, and pseudoscience news. Our team has already completed preprocessing of all the news articles. The SUDS Scholar will apply automated text analysis and machine learning techniques to these articles in order to identify linguistic patterns and biases that vary with how liberal or conservative the media outlet is.

 

Researcher: Spike W. S. Lee, Rotman School of Management and Department of Psychology, UofT

 

Skills required:

  • Text analysis; natural language processing; machine learning

  • Interest in political, moral, social, behavioral, psychological, or cognitive science

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Deep learning (DL) could potentially replace the expensive quantum mechanical (QM) methods traditionally used in chemistry to predict various chemical properties, offering fast and accurate models that are better suited to everyday computers. However, training DL models such as neural networks requires hundreds of thousands to millions of data points for a particular chemical property to attain good generalization. This requirement currently hinders the development of DL models for chemistry, because generating large training datasets with accurate QM methods has an infeasible computational cost.
The SUDS Scholar will tackle this problem and utilize a novel quantum mechanics-based approach to efficiently yet accurately generate large QM training data (hundreds of thousands to millions of data points) for a chosen chemical property. Once the QM training data becomes available, they will utilize it to generate new DL models for the chosen property to demonstrate the acceleration in DL for chemistry provided by applying the novel quantum mechanics-based approach.

 

Researcher: Hans-Arno Jacobson, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Familiarity with PyTorch, Keras, and TensorFlow.
  • Knowledge of active learning, transfer learning, and chemistry is preferred but not required.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

In the Salmena lab, we commonly mine numerous cancer databases to identify genetic and transcriptional alterations that may be driving disease or may serve as the basis for novel therapeutic strategies in cancer. Furthermore, we routinely perform RNA sequencing to identify transcriptional alterations following genetic or pharmacological interventions. The SUDS Scholar will be involved in projects mining the TCGA database to identify genetic and transcriptional alterations associated with the development and progression of leukemia and pancreatic cancer. They will also assist in the analysis of RNA sequencing data from genetic knockout mouse models to aid in the discovery of novel mechanisms of disease.

 

Researcher: Leonardo Salmena, Department of Pharmacology and Toxicology, Temerty Faculty of Medicine, UofT

 

Skills required:

  • Motivated to learn about cancer biology and able to perform various computational methods, including DNA and RNA sequencing analysis, gene ontology analysis, patient data mining, and clinical correlations.

Primary research location:

  • Department of Pharmacology and Toxicology, on-campus

Research description:

People adjust the way they speak depending on the context. What factors influence this kind of style shifting? For example, past work has suggested that women adjust their language more than men in male-dominated social settings. We use data science methods on large-scale social media text (from Reddit) to test such hypotheses at scale, and increase our understanding of the relationship between language expression and social constructs like gender and power.
This research contributes to a growing scientific literature using computational models and statistical analyses to study how people use language in social media interaction, and what that reveals about social attitudes and norms. Advances in these topics are needed to inform the design and moderation of social media platforms, and of artificial intelligence systems that can effectively interact with people.
This project will take a truly interdisciplinary data sciences approach, with possible SUDS Scholar tasks including: on the computational side, learning to write programs to process social media data, using current methods from natural language processing; on the linguistics and cognitive science side, designing and running an online survey; on the statistics side, deploying various data analysis methods to analyze both social media data and human survey data.

 

Researcher: Suzanne Stevenson, Department of Computer Science, Faculty of Arts & Science, UofT

 

Skills required:

  • Strong interest in languages and/or linguistics; computer programming experience (university-level course(s) and/or practical experience); and demonstrated quantitative competence (e.g., stats or quantitative methods course(s) and/or project experience).
  • Familiarity with or interest in computational linguistics and/or psycholinguistics and/or cognitive science is desirable. 

Primary research location:

  • University of Toronto, St George Campus and Remote

Research description:

The delivery of medical imaging services involves significant resources, including equipment, materials and labor. Proper coordination of these resources positively impacts the productivity of the radiology department, which, in turn, influences access to care, costs, and quality. Currently, the most commonly used techniques to measure work task productivity in radiology departments are manual and have not changed in over 40 years. This research project focuses on computer vision-based approaches to capturing workflow activity data related to the delivery of medical imaging services.
The SUDS Scholar will work closely with the principal investigator and students in the lab to design and implement edge computing prototypes using devices such as the Raspberry Pi and Google Coral Dev Board to collect video data. The SUDS Scholar will use open-source frameworks such as GStreamer for image handling and TensorFlow to apply deep learning to the image data. The data collected from these approaches will be used to extract metrics important to department performance, such as cycle time, flow rate, capacity, and utilization. These video-based approaches may provide a nonintrusive, easy, inexpensive, and rapid mechanism for generating operational information and knowledge on the productivity of the medical imaging department.
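Once a video pipeline has detected when each exam starts and ends, the metrics named above reduce to simple arithmetic. The sketch assumes a hypothetical `Exam` record with start and end times inside one observation window and uses simple textbook definitions (utilization as busy time over available time, capacity as the maximum hourly throughput at the observed mean cycle time); the department's actual definitions may differ.

```python
from dataclasses import dataclass

@dataclass
class Exam:
    """One imaging exam as a video pipeline might log it (hypothetical schema)."""
    start_min: float  # minutes since the start of the observation window
    end_min: float

def workflow_metrics(exams, window_min, rooms=1):
    """Simple operational metrics for one observation window."""
    cycle_times = [e.end_min - e.start_min for e in exams]
    busy_min = sum(cycle_times)
    mean_cycle = busy_min / len(exams)
    return {
        "mean_cycle_time_min": mean_cycle,                       # time per exam
        "flow_rate_per_hour": len(exams) / (window_min / 60.0),  # exams completed per hour
        "capacity_per_hour": rooms * 60.0 / mean_cycle,          # max exams/hour if never idle
        "utilization": busy_min / (window_min * rooms),          # fraction of available time in use
    }

exams = [Exam(0, 20), Exam(30, 50), Exam(60, 90)]  # hypothetical detections
metrics = workflow_metrics(exams, window_min=120)
print(metrics)
```

Keeping the metric definitions separate from the detection step means the same arithmetic applies whichever vision model produces the start and end events.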

 

Researcher: Andrew Brown, Unity Health Toronto

 

Skills required:

  • Passionate about technology, with an interest in learning new areas outside their comfort zone
  • Self-motivated and capable of working independently
  • Strong work ethic, ability to be proactive and responsive in high-stakes situations
  • Experience with the Python programming language and Linux would be an asset

Primary research location:

  • St. Michael's Hospital and/or remote

Research description:

Our team has validated a method of environmental sampling for SARS-CoV-2 in which swab samples are taken from floors and processed using quantitative polymerase chain reaction (qPCR). Our work in hospitals and long-term care (LTC) homes has shown that COVID-19 burden is reflected in built environment samples, and that SARS-CoV-2 can be detected in the built environment days before an outbreak is recognized. Based on this promising data, we believe that built environment sampling is feasible as a new paradigm for SARS-CoV-2 surveillance. We have data from approximately 5,000 swabs taken at LTC homes, along with covariates for each home.

We are seeking a SUDS Scholar to help identify the sampling covariates that provide the greatest power for predicting pending outbreaks in LTC homes, and to help the team design a machine learning model that would forecast the current and upcoming risk of an outbreak. We will use multiple supervised machine learning techniques, ranging from logistic regression to extreme gradient boosting.

 

Researcher: Michael Fralick, Lunenfeld-Tanenbaum Research Institute, Sinai Health System

 

Skills required: 

  • Programming in R (intermediate to expert)
  • Machine Learning experience an asset
  • Strong critical thinking skills
  • Medical knowledge (level of second year medical student or higher) an asset
  • High level interpersonal, verbal, and written communication skills

Primary research location:

  • Lunenfeld-Tanenbaum Research Institute and/or Remote

Research description:

Recording from the peripheral nervous system can be used to decode control signals exchanged throughout the body, with applications in creating assistive technologies and treating chronic diseases. Our laboratory has collected unique datasets from multi-channel nerve cuff electrodes, which record data from the surface of nerves. We have developed neural networks to decode these recordings by classifying the source of each detected neural event. A key next step is to ensure that the performance of these networks remains stable over time. Using existing data, this project will involve three steps:
1) Characterize the changes in the signals over several repeated recordings.
2) Create data augmentation methods to replicate these changes (in other words, generate synthetic data that mimics the changes that occur over time in the real data).
3) Demonstrate that these new data augmentation strategies can lead to more stable neural network performance over time after training only on augmented baseline data.
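As an illustration of step 2, the sketch below generates synthetic "drifted" copies of a multi-channel baseline recording. The specific perturbations (per-channel gain change, additive noise, small time shift), their magnitudes, and the name `augment_drift` are hypothetical assumptions; step 1 of the project is what would determine which changes actually occur in the real recordings.

```python
import numpy as np

def augment_drift(recording, rng, gain_sd=0.1, noise_sd=0.05, max_shift=2):
    """Return a synthetically 'drifted' copy of a (channels, samples) recording.

    Hypothetical perturbations: a random gain per channel (e.g. slowly
    changing electrode-tissue contact), additive Gaussian noise, and a small
    circular time shift of up to `max_shift` samples.
    """
    n_channels = recording.shape[0]
    gains = rng.normal(1.0, gain_sd, size=(n_channels, 1))   # broadcast over time
    drifted = gains * recording + rng.normal(0.0, noise_sd, size=recording.shape)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(drifted, shift, axis=1)

rng = np.random.default_rng(0)
baseline = rng.normal(size=(8, 300))          # stand-in for one baseline recording
augmented = [augment_drift(baseline, rng) for _ in range(5)]
print([a.shape for a in augmented])
```

Training only on the baseline plus such copies, and then testing on later real sessions, is one way to carry out the comparison in step 3.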

The SUDS Scholar will have the opportunity to gain a better understanding of real-world challenges in data science applications, and of strategies to manage these obstacles when developing deep learning systems.

 

Researcher: Jose Zariffa, Toronto Rehabilitation Institute (KITE), UHN

 

Skills required:

  • Experience with deep learning, including data augmentation, transfer learning, and rigorous performance evaluations.
  • Experience in signal processing is strongly preferred.

Primary research location:

  • KITE Research Institute - Toronto Rehab - UHN and/or Remote

Research description:

The main focus of this project is designing computer vision models that support fast, flexible, and composable representations of 3D physical spaces. A lot of research has recently emerged along these lines (e.g., Neural Radiance Fields, Scene Representation Networks), yet many desirable properties are still missing from these representations, such as composable representations of objects, non-rigid deformations, and fast rendering. We will draw on recent advances in computer graphics and in unsupervised, self-supervised, and adversarial learning to build representations that yield better 3D models from images.

The two key questions we wish to answer are:

a.) What are the right inductive biases to build into neural architectures in order to learn 3D models from 2D images?

b.) How can data on the internet (e.g., other datasets) be exploited to speed up the learning of such models?

Through this project, the SUDS Scholar will gain valuable experience in training neural networks and implementing novel computer vision pipelines, and will eventually write up and submit their work to a top-tier AI/robotics conference or workshop (CVPR, NeurIPS, ICRA, etc.).

 

Researcher: Igor Gilitschenski, Department of Mathematical and Computational Sciences, University of Toronto Mississauga, UofT

 

Skills required:

  • Solid background in implementing neural networks (in either PyTorch or TensorFlow) and a good understanding of linear algebra and popular neural architectures (CNNs, LSTMs, Transformers, etc.).
  • Familiarity with recent literature in 3D vision (3DV), graphics, and self-supervised and unsupervised learning would be a big plus.

Primary research location:

  • University of Toronto Mississauga and/or Remote

Research description:

Horizontal Gene Transfer (HGT) is a process in which organisms acquire foreign genes from different species. HGT contributes to organismal evolution and has been an important source of genetic diversity. HGT has commonly been identified in prokaryotes but rarely reported in eukaryotes. However, our understanding of HGT in eukaryotes is quickly expanding with the production of genomic resources and the development of detection tools. The kingdom Fungi represents a striking example, especially the fungi known as obligate symbionts, which interact intimately with various host organisms. Our research group is dedicated to detecting fungus-related HGT elements and has discovered several such cases in early-diverging groups, including mosquito gut-dwelling fungi (doi:10.1093/molbev/msw126), herbivorous mammal rumen fungi (doi:10.1128/mSystems.00247-19), amphibian gastrointestinal fungi (doi:10.1534/g3.120.401516), and photobiont-associated fungi (doi:10.1016/j.cub.2021.01.058). This project aims to identify novel HGT events using fungal genomes newly assembled in the lab that represent underexplored lineages on the Tree of Life.
The SUDS Scholar working on this project will help refine the lab's existing pipelines and analyze the fungal genomes, as well as related host data, to reconstruct the evolutionary history of the identified genes through comparative genomics. A high-impact research report will be prepared at the end of the project, with the aim of publication.

Researcher: Yan Wang, Department of Biological Sciences, University of Toronto Scarborough, UofT

Skills required:

  • Minimum requirements: basic programming skills in Linux, Python, and R; effective communication skills.
  • Preferred qualifications: strong interest in comparative genomics and data visualization; competence in writing and public speaking.

Primary research location:

  • University of Toronto Scarborough and/or Remote

Research description:

Automatic speech processing (speech recognition, speech synthesis) has a serious problem with “low-resource languages”: languages that, unlike the few dozen languages for which such tools do exist (English, Spanish, Mandarin, etc.), lack the necessary “resources” (typically large, linguistically or textually annotated speech recording data sets) required to train traditional statistical/machine-learned speech tools. Low-resource languages represent the vast majority of the world’s thousands of languages, language varieties, and dialects/accents, and are typically, but not always, minority languages, languages with less sociopolitical clout, or overtly stigmatized languages.
Recent developments in “self-supervised learning” of representations for speech (e.g., wav2vec, HuBERT) have demonstrated the power to substantially reduce the linguistic resources needed to develop speech tools. However, the promise for low-resource languages and varieties is still mostly limited to research papers, and there is no careful benchmark to measure exactly how well these new representations actually do in real-life low-resource settings.
The SUDS Scholar will work with a team of linguists and engineers in the Perceptimat research team at the University of Toronto, and remotely with colleagues in the Cognitive Machine Learning team in Paris, on extending the Zero Resource Speech Benchmark to include linguistically-informed benchmarks for a low-resource language.

 

Researcher: Ewan Dunbar, Department of French, Faculty of Arts & Science, UofT

 

Skills required:

  • Students should be motivated to do research in computational linguistics and willing and able to make a full-time commitment to the project.
  • Solid experience with Python programming and Unix/Linux, and with developing and testing code in a team, is necessary.
  • Strong experience with neural network training preferred.
  • Experience with speech and language would be a bonus.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Government agencies, including the Ontario Ministry of Health and Long-Term Care, collect and use a wealth of administrative data for planning, policy and program development, and evaluation. In most cases, trained analysts access these data to develop robust analytic products that supply evidence for decision-making; however, this kind of robust analytic support may not be required for simple questions from the public, the media, and the policy and program staff of the ministries. Our team has successfully experimented with the GPT-3 platform to create a product that queries the COVaxON database, which contains data on all vaccinated Ontarians. We would now like to extend coverage to the remaining administrative databases, including acute hospitals, ambulatory care, physician services, and long-term care reporting, to broaden the applications of such data and queries across the health system. The SUDS Scholar will work with a team to develop and validate a model that queries health databases using the GPT-3 platform, as well as a user-friendly interface. The SUDS Scholar will contribute to a manuscript summarizing the process and outcomes and will present the findings to various academic and government stakeholders.

 

Researcher: Laura Rosella, Dalla Lana School of Public Health, UofT

 

Skills required:

  • Knowledge of Natural Language Processing and Python programming.
  • Familiarity with the GPT-3 platform may also be useful.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Large-scale sequencing of animal genomes and transcriptomes has revolutionized biological research. These animals are often infected with pathogens and parasites, and these microbes are incidentally sequenced at the same time. Methods have recently been developed to search this large amount of sequencing data and efficiently identify these parasites.
Microsporidia are a large phylum of obligate eukaryotic intracellular parasites that cause death and disease in humans. Microsporidia species have evolved to infect most types of animals. Several species of microsporidia infect honeybees, shrimp, crab, and fish species, reducing their growth and yields, and causing economic losses to these agricultural industries. Concerningly, microsporidia are emerging pathogens, with infections caused by many species only being identified within the last two decades.
Using a variety of computational approaches, the SUDS Scholar will identify samples deposited in the Sequence Read Archive that are infected with microsporidia. The SUDS Scholar will then assemble the 18S SSU rRNA sequences within these datasets. Microsporidia sequences will be identified and compared phylogenetically. This approach will allow us to identify both novel microsporidia species and new hosts for previously identified microsporidia species. By comparing these newly identified sequences, we will gain insight into the specificity of microsporidia for different hosts.

 

Researcher: Aaron Reinke, Department of Molecular Genetics, Temerty Faculty of Medicine, UofT

 

Skills required:

  • Experience coding in Python, R, or other languages is preferred, as is previous experience with cluster computing.
  • Previous experience with genomic assembly and analysis is preferred but not necessary.

Primary research location:

  • On site, MaRS West Tower, 16th floor

Research description:

Chemical property prediction using quantum machine learning is a highly relevant, interdisciplinary topic at the intersection of machine learning, quantum algorithms, and chemistry. This project will focus on how chemical properties can be predicted quickly and accurately with the help of quantum computers.
The SUDS Scholar will be involved in developing new quantum analogs of conventional machine learning models, particularly for regression tasks. They will mainly work with quantum-mechanically calculated chemical data and investigate the nature of the generalization error attained by the various quantum models developed. In this context, they will be required to examine whether a quantum computing approach offers any benefit or advantage in making accurate predictions of chemical properties. Throughout each step of the project, there is room for creativity and opportunities to try techniques and technologies that have not yet been explored. Much of the quantum computation in this project will require the student to code in Python using software libraries such as Qiskit and PennyLane. The student will also be required to extend the implementations to enable experiments on actual quantum processing devices.

 

Researcher: Hans-Arno Jacobsen, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Background in machine learning and ideally in quantum computing. Knowledge of chemistry is preferred but not required.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The SUDS Scholar will use cutting-edge mediation analysis based on the potential outcomes framework to explore whether the relationships of lifestyle and biological factors with disease and death are mediated by ectopic fat (i.e., fat in ‘bad’ locations most strongly linked with disease). Binary regression and advanced survival methods (Fine & Gray regression) will be used.
Data are sourced from the UK Biobank, the largest biomedical imaging dataset available, with ~50,000 MRI scans of ectopic fat depots (the mediator) and ~20 years of follow-up. The student will produce models for each combination of outcome (cardiovascular disease, cancer, diabetes, Alzheimer’s disease, COVID, and all-cause and cause-specific mortality) and exposure (physical activity, dietary factors, and blood measures such as lipids, HbA1c, and hormones).
The SUDS Scholar will be co-supervised by Prof. Kirkham and postdoctoral fellow Rebecca Christensen, a PhD epidemiologist. This will allow for comprehensive statistical training, as well as the opportunity to participate in human data collection and custom MATLAB analysis of ectopic fat using state-of-the-art 3T MRI. This unique opportunity provides training that bridges the gap between clinical and epidemiological research and novel data science methods.

 

Researcher: Amy Kirkham, Faculty of Kinesiology & Physical Education, UofT

 

Skills required:

  • Intermediate R or SAS knowledge, and at least one statistics course.
  • Able to work well in a team and independently.
  • Experience engaging with diverse audiences (e.g., lay, academics, and physicians), and preparing statistical findings for publication are an asset.

Primary research location:

  • Goldring Centre for High Performance Sport at the University of Toronto, St George Campus and/or Remote

Research description:

Many machine learning models can make accurate, user-tailored predictions for a range of tasks, such as personalized medicine, mental health, and platforms that match people with services. However, these models require sensitive personal data for both training and inference. This project aims to tackle various challenges in providing strong data privacy guarantees while retaining the benefits of the models, including:

i) the development of efficient and fast platforms and systems;

ii) evaluating the tradeoffs between model accuracy and various privacy-preserving mechanisms; and

iii) building real world applications and testing their effectiveness.

 

Researcher: Nandita Vijaykumar, Department of Computer and Mathematical Sciences, University of Toronto Scarborough, UofT

 

Skills required:

  • Strong programming skills required.
  • Any knowledge of computer systems, ML, or privacy is desirable.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Text-to-speech synthesis systems (the systems central to tools like Siri and Alexa, which transform the systems’ ultimately textual responses to queries into realistic, human-like speech) are typically created for a single language and a single voice at a time using supervised machine learning: starting from a collection of labelled (i.e., textually transcribed) speech recordings, machine learning is applied to build systems that map the textual transcriptions onto acoustic features that can be played back as audio. More recently, research has begun to explore the possibility of multilingual, or even universal, speech synthesis, which would allow speech sounds from multiple human languages to be generated and combined. However, such research is still in its infancy, and it is unclear whether these research systems satisfy the basic consistency criteria they would need to meet to be of practical use.

The SUDS Scholar will work with a team of linguists and engineers in the Perceptimat research team at the University of Toronto, and remotely with colleagues at Université Grenoble Alpes, on developing simple, phonetically informed consistency measures for typical current approaches to multilingual speech synthesis. Time permitting, the student will explore approaches to building better systems than the ones that currently exist.

 

Researcher: Ewan Dunbar, Department of French, Faculty of Arts & Science, UofT

 

Skills required:

  • Motivated to do research in computational linguistics and willing and able to make a full-time commitment to the project.
  • Solid experience with neural network training and with developing and testing code in a team.
  • Experience with speech and language would be a bonus.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

When human pluripotent stem cells (PSCs) are differentiated toward lung epithelial cell phenotypes, a heterogeneous population of progenitor cells emerges in culture that appears to give rise to different cell types. To understand this better, however, these progenitor cells must be isolated so that their developmental potential can be interrogated functionally and phenotypically. The SUDS Scholar will analyze single-cell RNA sequencing datasets generated in the lab to computationally determine the expression of unique cell surface molecules that can be used to isolate each progenitor population for validation. The SUDS Scholar will use their computational skills to analyze large datasets and help identify new cell types, genotypes, and phenotypes, as well as gene regulatory networks (GRNs) regulating human-specific developmental or disease mechanisms.

 

Researcher: Amy Wong, Developmental & Stem Cell Biology Labs, SKH

 

Skills required:

  •  Computational skills (Python, R, Linux)

Primary research location:

  • Hospital for Sick Children Toronto, On site

Research description:

Wound healing begins with a calcium signal at the wound site that propagates as a wave throughout the tissue. Our data indicate that calcium wave propagation is important for rapid wound closure. However, studies of calcium wave propagation have been impeded by poor automated tools for segmenting cells in time-lapse microscopy images and by a lack of mechanistic models that predict the spread of calcium. We have developed a recurrent deep neural network that segments objects in microscopy movies with increased accuracy.
The SUDS Scholar will train, test, and improve the current network using a large dataset (~2000 images) of cell outlines. They will apply the network to an existing collection of wound healing movies that visualize cell outlines and calcium, and use the segmentation results to establish the range, time scale, limits, and potential anisotropy of calcium spreading. The student will fit an existing mathematical model of calcium propagation to the experimental results to determine potential molecular mechanisms underlying calcium spread. Overall, the SUDS Scholar will contribute machine learning tools and models to investigate embryonic development and tissue repair, and will generate hypotheses about the mechanisms that mediate collective cell movements during embryonic wound repair.

 

Researcher: Rodrigo Fernandez-Gonzalez, Institute of Biomedical Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Strong interest in data-based approaches to biological research.
  • Previous coding experience is required, with a background in Python and NumPy/scikit-image preferred.
  • Experience with deep learning (Keras/TensorFlow) or mathematical modelling is a plus.

Primary research location:

  •  MaRS2 West Tower, On site

Research description:

The SUDS Scholar will use multi-modal molecular and imaging data to build a machine learning framework for cancer risk stratification.

 

Researcher: Sushant Kumar, Princess Margaret Cancer Centre, UHN

 

Skills required:

  • Machine Learning, Computer Science, Programming, Statistics, Applied Math

Primary research location:

  • Princess Margaret Cancer Research Tower and/or Remote

Research description:

The SUDS Scholar will study historical factors and current policies associated with land and urban development and their effects on growth and social inequality. They will employ a broad range of research techniques, often combining theoretical and empirical methodologies, to answer policy-relevant economic questions using detailed micro-data. In particular, they will work with historical data, tax assessor data, geo-coded spatial data, and administrative firm data.

 

Researcher: Aradhya Sood, Department of Management, University of Toronto Scarborough, UofT

 

Skills required:

  • Ideally, candidates should be interested in urban and spatial economics, economic history, or public policy and have taken econometrics or statistics courses. The SUDS Scholar will work on procuring data, combining various datasets, and estimating regression models.

  • Required qualifications:
    Stata or R
    Econometrics
    Demonstrated ability to work independently
    Ability to work with multiple datasets in an organized fashion

  • Preferred qualifications:
    Experience with georeferencing, geocoding, and GIS
    Python/Matlab/Julia

Primary research location:

  • Virtual/Remote

Research description:

Learning analytics involves the collection and analysis of student and course data, including interactions with educational technology such as a learning management system (LMS), to better understand and optimize student learning and learning environments. Most investigations of student engagement have focused on course-level activity analysis. This project, however, will investigate whether analyzing engagement across courses and academic terms can inform program-level practices in learning design that benefit all students.
The project will involve identifying, visualizing and analyzing LMS and administrative data from the University of Toronto to investigate how the data might effectively be used to identify student patterns of activity across courses and academic terms. Possible questions that could be considered include: Are there patterns of student activity across courses that appear to be productive and patterns that do not? How do these patterns vary with student program of study or prerequisite courses? Can LMS activity data be used as a proxy for student engagement across courses taken concurrently? What program-level aspects are related to changes in a student’s odds of dropping out of a course? The particular question will be determined based on the interests and background of the SUDS Scholar.

 

Researcher: Alison Gibbs, Department of Statistical Sciences, Faculty of Arts & Science, UofT

 

Skills required:

  • Experience in data preparation and analysis using R, particularly with methods for visualization, inference, logistic and linear regression, mixed models, and survival analysis.
  • Experience with the R Shiny package may also be an asset.
  • Strong communication skills and interest in understanding how data can be used to support student learning.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The SUDS Scholar will use data from the UK Biobank, a large prospective study of more than 500,000 adults, to explore whether lifestyle factors can predict the occurrence of different diseases (cardiovascular disease, cancer, diabetes, Alzheimer’s disease, COVID-19) and death (all-cause and cause-specific). Approximately 20 different lifestyle factors will be selected on the basis that they can be assessed clinically via self-report or low-cost measures (e.g., body mass index). Using these types of predictors will allow our findings to be incorporated into primary and secondary clinical care settings to improve patient outcomes.
The SUDS Scholar will use unsupervised machine learning (k-means and fuzzy clustering) to identify naturally occurring clusters of participants based on their lifestyle factors. The student will use various metrics (e.g., elbow plots and silhouette widths) to identify the optimal number of clusters. Once the clustering is complete, the student will use binary and survival regression analysis to explore the relationship between these clusters and each outcome.
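The cluster-then-choose-k step can be illustrated with a minimal, standard-library-only Python sketch. It assumes each participant is already represented as a numeric vector of standardized lifestyle factors, implements plain k-means (Lloyd's algorithm) with a deterministic farthest-point initialization (a simple stand-in for k-means++), and selects k by the mean silhouette width. The function names are hypothetical, fuzzy clustering is not shown, and in practice an established package (e.g., R's `cluster` or scikit-learn) would likely be used instead.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_point_init(points: List[Point], k: int) -> List[List[float]]:
    """Start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    """Average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        def avg_dist(c: int) -> float:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            return sum(d) / len(d) if d else 0.0
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # convention: points in singleton clusters score 0
            continue
        a = avg_dist(labels[i])  # mean distance to own cluster
        b = min(avg_dist(c) for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: List[Point], candidates: Sequence[int]) -> Tuple[int, float]:
    """Return the candidate k with the highest mean silhouette width, and that width."""
    best_score, best_k = max((mean_silhouette(points, kmeans(points, k)), k)
                             for k in candidates)
    return best_k, best_score
```

For example, `choose_k(features, range(2, 9))` (with `features` a hypothetical list of participant vectors) returns the best-scoring k; an elbow plot would instead track the within-cluster sum of squares across k. The resulting cluster labels would then serve as the exposure in the binary and survival regressions.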
The SUDS Scholar will be co-supervised by Prof. Kirkham and postdoctoral fellow Rebecca Christensen, a PhD epidemiologist, to ensure the project has the statistical and subject-matter expertise it needs to succeed.

 

Researcher: Amy Kirkham, Faculty of Kinesiology & Physical Education, UofT

 

Skills required:

  • Intermediate R or Python knowledge, and at least one statistics course.
  • Able to work well in a team and independently.
  • Basic knowledge of nutrition or physical activity, and experience preparing statistical findings for publication are an asset.

Primary research location:

  • Goldring Centre for High Performance Sport at the University of Toronto, St George Campus and/or Remote

Research description:

The SUDS Scholar will develop and apply machine learning models and algorithms to solve predictive problems in healthcare. There are a variety of ongoing projects in the lab that they may be assigned to based on their research interest. Below is a description of two ongoing projects in the lab.

a) Identifying patients at highest risk of dying on the liver transplant waitlist, both across the United States and in Toronto. This project will leverage large datasets (~100K patients in the US and ~2K patients in Canada) to identify patterns in longitudinal (time-varying) clinical biomarkers that are predictive of patient mortality, in order to help clinicians, patients, and hospital systems make more informed choices about allocating livers. The models developed will be assessed to ensure their predictive performance is equitable across various patient subgroups.

b) GEMINI (https://www.geminimedicine.ca/data) is a large-scale dataset of clinical data from patients in hospitals across Ontario. Our group is building tools that provide statistical guardrails for understanding and assessing the fairness, trustworthiness, and failure modes of predictive risk scores used in different hospitals.

The SUDS Scholar will be paired with a graduate student mentor who will work with them to provide additional guidance, support and mentorship during the summer internship.

 

Researcher: Rahul Krishnan, Department of Computer Science, Faculty of Arts & Science, UofT

 

Skills required:

  • Completed an introductory course on machine learning (at UofT this would be CSC311).
  • A candidate looking to lead their own research project would need to have completed an advanced course on deep learning and/or probabilistic modeling.
  • The SUDS Scholar is expected to be proficient in Python and at least one automatic differentiation framework, such as PyTorch, JAX, or TensorFlow, and to be eager to work with an interdisciplinary team of scientists and clinicians to advance the frontier of ML in healthcare.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The SUDS Scholar will be working on this research and will be a co-author on the publication.
Introduction: Between 2000 and 2019, the number of systematic reviews (SRs) increased more than 20-fold. With this explosion, duplication of SRs on the same clinical, public health, or policy questions has become problematic for decision makers, who must choose between the different SRs.
Objectives: We aim to develop an algorithm (called WISEST) to support decision makers in comparing and choosing between multiple SRs on the same question.
Features: The items mapped from the AMSTAR 1, AMSTAR 2 and ROBIS tools to assess the quality of SRs will be used as the “quality” features. These manual tools have undergone face validity and usability testing.
Work: (a) with support, clean a dataset of 100 SRs; (b) develop code to train, test, and validate supervised machine learning classification models (random forests and neural network models such as Facebook’s StarSpace and fastText), in which a model learns by making predictions on example data and is corrected so that it better predicts the expected target outputs in the training dataset; (c) develop a Shiny app to house the pilot (beta) WISEST tool (https://engineering-shiny.org/successful-shiny-app.html; and https://shiny.rstudio.com/tutorial/).

 

Researcher: Andrea Tricco, UHN

 

Skills required:

  • Skills in app development in any programming language, especially Python, are desirable.
  • Familiarity with natural language processing, applied machine learning, TensorFlow, scikit-learn, PyTorch, and Hugging Face projects is desirable but not necessary.
  • Students with a special interest in AI, machine learning and programming are encouraged to apply.

Primary research location:

  • Remote with in-person feasible

Research description:

Our Milky Way is surrounded by many small galaxies and globular clusters. These systems can be disrupted to form stellar streams orbiting the Milky Way, which can be used to study the formation of galaxies as well as the nature of dark matter. (A good example can be found in our latest story in The Globe and Mail: https://www.theglobeandmail.com/canada/article-star-streams-reveal-milky-ways-ravenous-history/ )
The SUDS Scholar will develop a Bayesian model to assess the membership probability of each stream candidate star, as well as the properties of these streams, using positional, kinematic, and chemical information from large astronomical datasets from various modern surveys. This project will explore developing and applying new statistical and computational techniques that are likely to be widely used in next-generation astronomical surveys.
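The core idea of a membership probability can be illustrated with a toy two-component mixture model. This Python sketch assumes a single observable (line-of-sight velocity), Gaussian stream and background distributions, and fixed, made-up parameter values; the project's model would combine positional, kinematic, and chemical information and infer the parameters rather than fix them. The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of the normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probability(v: float, v_err: float,
                           stream_mean: float, stream_disp: float,
                           bg_mean: float, bg_disp: float,
                           stream_frac: float) -> float:
    """Posterior probability (Bayes' rule) that a star with velocity v +/- v_err
    belongs to the stream rather than the Milky Way background.

    Each component's intrinsic dispersion is combined in quadrature with the
    star's measurement error; stream_frac acts as the prior membership fraction.
    """
    stream_like = stream_frac * normal_pdf(v, stream_mean, math.hypot(stream_disp, v_err))
    bg_like = (1.0 - stream_frac) * normal_pdf(v, bg_mean, math.hypot(bg_disp, v_err))
    return stream_like / (stream_like + bg_like)


# Illustrative, made-up parameters (velocities in km/s).
params = dict(stream_mean=-100.0, stream_disp=5.0, bg_mean=0.0, bg_disp=100.0,
              stream_frac=0.1)
p_near = membership_probability(-98.0, 2.0, **params)  # close to the stream velocity
p_far = membership_probability(50.0, 2.0, **params)    # far from the stream velocity
```

In a full Bayesian treatment, the stream mean, dispersion, and fraction would themselves be inferred (for example with the nested sampling methods mentioned in the skills list), and each star's membership probability would be averaged over the posterior.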

Researcher: Ting Li, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, UofT

 

Skills required:

  • Basic computer programming skills in Python are required, along with an interest in working on a research project involving Bayesian statistics, nested sampling algorithms, and model comparison.
  • Programming skill in C++ a plus but not a requirement.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Large astronomical surveys obtain high-resolution spectra of millions of stars that can be used to understand the formation and evolution of our own Milky Way Galaxy. But measuring the abundances of different chemical elements from these spectra in an automated manner is challenging. In this project, the SUDS Scholar will adapt our successful astroNN (https://astronn.readthedocs.io/en/latest/) methodology, which determines elemental abundances using a deep-learning technique, to spectra from the new SDSS-V survey, and will use the results to make a chemical map of our Milky Way. Specifically, the SUDS Scholar will adapt the existing astroNN implementation (in TensorFlow and Keras) so that it can be applied to spectra from SDSS-V, test the adapted implementation and check the accuracy of its results, work with other members of the group to incorporate the adapted technique into the SDSS-V pipeline, and explore the chemistry of the Milky Way with the resulting abundances.

 

Researcher: Jo Bovy, David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, UofT

 

Skills required:

  • Deep learning, neural networks, TensorFlow, Keras
  • Interest in astrophysics

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Machine learning is widely used to support decision making in a range of application domains. In safety critical applications, such as in healthcare, it is vital to be able to explain the reasoning of the machine learning system and to guarantee that certain dangerous or unwanted behaviors will not appear. In this research project, the goal is to investigate how mathematical optimization can be used to train interpretable machine learning models that are guaranteed to satisfy a set of required constraints.

 

The SUDS Scholar working on this project will:

  • Read, implement, and empirically evaluate state-of-the-art mathematical optimization models for machine learning from the literature.
  • Develop new mathematical models and extend existing models from the literature, with an emphasis on interpretability and support for domain-specific constraints, and implement these models efficiently using state-of-the-art optimization software.
  • Run experiments to evaluate the efficiency and effectiveness of the mathematical models using standard benchmarks and different real-world datasets.

 

Researcher: Eldan Cohen, Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Knowledge of mathematical optimization models: discrete and/or continuous, constrained and/or unconstrained.
  • Knowledge of machine learning and hands-on experience with data analysis and machine learning tools.
  • Experience coding in Python, Julia, or another relevant language.
  • Ideally: hands-on experience in mathematical optimization using popular solvers or modelling frameworks.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Processing air quality-based time series data: The SUDS Scholar will help identify associations within the time series of air pollution data collected at sites across Canada. This will include data from SOCAAR's sites, government sites, and inexpensive air quality monitors. One goal will be to resolve changes in emissions due to initiatives to promote decarbonization as part of Canada's climate change plans.

 

Researcher: Greg Evans, Department of Chemical Engineering and Applied Chemistry, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Familiarity with analysis of time series data, querying SQL databases, correlation analysis, and application of other statistical techniques.
  • Familiarity with aspects of air pollution and climate change is desirable.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

The Global Longitudinal University Enrolment Dataset (GLUED) is a novel dataset on higher education around the world and over time. The dataset represents a significant new effort to capture institution-level data in order to better understand the rise of private higher education as a global phenomenon. GLUED systematically compiles and/or estimates student enrolment data for roughly 17,000 universities in 194 countries and territories between 1950 and 2020. A major concern with the dataset is that student enrolment data are missing for many institutions. This includes partially missing data (i.e., institutions with enrolment data for some years but not others), as well as a large number of institutions that we know exist but for which we have no student enrolment data at all.

Through this project, you will be asked to train models on the data to estimate missing values. This will include modeling growth rates of university enrolments, analyzing which factors are most strongly associated with enrolment and growth, running regression models and assessing model fit, and estimating missing values.
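As a minimal illustration of the growth-rate and interpolation steps, the Python sketch below assumes that an institution's enrolment grows at a constant annual rate between two observed years, which is equivalent to linear interpolation on the log scale. The function names and example numbers are hypothetical, and this is not GLUED's documented estimation method; estimating enrolment for institutions with no data at all would need the regression modeling described above.

```python
import math


def annual_growth_rate(v0, v1, years):
    """Compound annual growth rate between two positive observations
    taken `years` apart (hypothetical helper)."""
    return (v1 / v0) ** (1.0 / years) - 1.0


def fill_gaps_log_linear(series):
    """Fill missing years *between* observed years of an enrolment series.

    `series` maps year -> enrolment for observed years only. Assuming a
    constant growth rate inside each gap, values are interpolated linearly
    on the log scale. Years before the first or after the last observation
    are NOT extrapolated.
    """
    years = sorted(series)
    filled = dict(series)
    for y0, y1 in zip(years, years[1:]):
        log0, log1 = math.log(series[y0]), math.log(series[y1])
        for y in range(y0 + 1, y1):
            frac = (y - y0) / (y1 - y0)  # position of year y inside the gap
            filled[y] = math.exp(log0 + frac * (log1 - log0))
    return dict(sorted(filled.items()))
```

For example, an institution with 1,000 students in 2000 and 1,210 in 2002 has an annual growth rate of about 10%, and the sketch fills 2001 with roughly 1,100 students.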

More information on the dataset: Buckner, E. (2022). The Global Longitudinal University Enrolment Dataset (GLUED). International Higher Education, (112), 9-11. https://ejournals.bc.edu/index.php/ihe/article/view/15729/11549

 

Researcher: Elizabeth Buckner, Department of Leadership, Higher, and Adult Education, Ontario Institute for Studies in Education, UofT

 

Skills required:

  • Experience training models to estimate missing values;
  • Experience with panel datasets (repeated measures over time);
  • Experience calculating growth rates;
  • Experience interpolating missing values;
  • Experience extrapolating values for missing data;
  • Experience with running various regression models and conducting analyses to assess model fit, with the goal of estimating missing values.

Primary research location:

  • OISE at the University of Toronto, St George Campus and/or Remote

Research description:

Bladder cancer is the most expensive cancer to treat per patient due to high recurrence rates and the need for lifelong cystoscopic surveillance. One-fifth of patients will also progress to higher stages of disease, leading to worse cancer-specific survival. Current monitoring strategies for recurrence and progression are costly, patient unfriendly, and lack supporting evidence. Accurate and timely prediction of these patient outcomes remains a clinically important but unmet need.
Our group has previously developed NIMBLE, an AI tool to predict the risk of tumour progression in non-muscle invasive bladder cancer patients (https://github.com/JCCKwong/NIMBLE). NIMBLE was developed using a large cohort of European patients.
As part of this project, the SUDS Scholar will:
1. Collect data from the Princess Margaret Cancer Centre for local validation of NIMBLE
2. Assist in updating NIMBLE to predict the risk of tumour recurrence in non-muscle invasive bladder cancer patients
3. Assess for model bias with respect to age group, sex, ethnicity, and socioeconomic status

 

Researcher: Girish Kulkarni, Princess Margaret Cancer Centre, UHN

 

Skills required:

  • Proficiency in Excel, data collection, and basic statistical analysis.
  • Proficiency in navigating the University Health Network electronic health record (EPR and EPIC) is an asset, but not required.
  • Coding experience (Python) is not required. Students will be taught how to run NIMBLE to generate predictions with their data.

Primary research location:

  • Princess Margaret Cancer Centre at University Health Network and/or Remote

Research description:

We are taking a novel, gender-inclusive approach to understanding the barriers faced by women, Indigenous people, youth, and other underrepresented groups, with the aim of increasing recruitment, improving retention, and expanding and stabilizing the construction and industrial workforces across Canada. This proposal builds on our prior research on workplace factors associated with stressors, injuries and retention in the health professions, and on my former collaborative professional practice with injured miners, employers, and unions on workers’ return to work. Our research is developing and implementing strategies to increase worker participation and retention in the construction and industrial workforce, based on a gender-, age-, and ethnicity-informed systematic analysis of barriers to recruitment and retention:
1) Emerging Trends and Practices
2) i) Identifying recruitment and retention factors in underrepresented groups in the construction and industrial sectors
ii) Evaluating the lived experiences of apprentices in the construction and industrial sectors
3) Workplace Organization Perspective
4) Solutions for Future Training and Education
5) Whole-genome analysis of genetic markers of the participants (apprentices) at study entry
6) i) Genome-wide methylation analysis (epigenetics mechanisms) and stress level of employers and apprentices at study entry (cross-sectional)
ii) Analysis of genome-wide methylation changes and stress levels in the employers and employees (longitudinal).

The SUDS Scholar will be involved in all steps of this project. They will learn about preparing a research ethics application for an occupational therapy study, data collection, handling large genetic and epigenetic data files, data cleaning, and genetic and epigenetic data analysis. For the last step, the SUDS Scholar will learn about whole-genome data analysis (files with 2.5 million single nucleotide polymorphisms [SNPs]) as well as whole-genome DNA methylation analysis (files with 850,000 epigenetic biomarkers [CpGs]). They will learn to use PLINK, an open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. By the end of this project, the SUDS Scholar will have a good knowledge of working with big data and of how to perform association analyses between genetic and epigenetic biomarkers and different phenotypes such as burnout, occupational stressors, and diseases. They will also gain a greater understanding of how genetics is related to health, well-being and functioning.

 

Researcher: Behdin Nowrouzi-Kia, Department of Occupational Science and Occupational Therapy, Temerty Faculty of Medicine, UofT

 

Skills required:

  • Excellent interpersonal skills
  • Strong computer experience including statistical analyses
  • Outstanding organizational skills
  • Demonstrated ability to maintain confidentiality
  • Ability to be a team-player
  • Experience working in a mental health context
  • Detail-oriented and dependable
  • Flexible individual with initiative and the capacity to handle multiple complex tasks simultaneously
  • Interest in health professions

Primary research location:

  • ReSTORE Lab (http://restore.rehab) at the Department of Occupational Science and Occupational Therapy, Temerty Faculty of Medicine, University of Toronto, St George Campus and/or Remote

Research description:

The SUDS Scholar will be involved in developing and evaluating learning activities for introductory university courses in statistics and data science by using and extending mverse, an R package for teaching multiverse analysis. We will use multiverse analysis as a framework for teaching data analysis as an interactive, iterative process of problem-solving, in which learners gain experience developing feasible choices for converting “raw” data into “analysis” data, which in turn gives rise to a multiverse of statistical results (Steegen et al. 2016) that can be examined for robustness.
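To make the idea of a multiverse concrete, here is a minimal Python sketch. mverse itself is an R package, and nothing here uses or mirrors its API. The sketch assumes a hypothetical small dataset and three hypothetical analysis decisions: whether to drop values above an assumed outlier cutoff, whether to log-transform, and whether to summarize with the mean or the median. Each combination of choices is one "universe", and comparing results across universes shows how robust a conclusion is to those choices.

```python
import itertools
import math
import statistics

# Hypothetical raw measurements; 9.5 is a possible outlier.
RAW_DATA = [2.1, 2.4, 2.2, 2.8, 3.0, 9.5]

# Each decision maps option names to a processing function (all hypothetical).
OUTLIER_OPTIONS = {
    "keep": lambda xs: list(xs),
    "drop": lambda xs: [x for x in xs if x <= 5.0],  # assumed cutoff of 5.0
}
TRANSFORM_OPTIONS = {
    "raw": lambda xs: list(xs),
    "log": lambda xs: [math.log(x) for x in xs],
}
SUMMARY_OPTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
}


def run_multiverse(raw, outlier_opts, transform_opts, summary_opts):
    """Apply every combination of choices to the raw data and return a dict
    mapping (outlier, transform, summary) option names to the result."""
    results = {}
    for (o_name, o_fn), (t_name, t_fn), (s_name, s_fn) in itertools.product(
        outlier_opts.items(), transform_opts.items(), summary_opts.items()
    ):
        analysis_data = t_fn(o_fn(raw))  # "raw" data -> "analysis" data
        results[(o_name, t_name, s_name)] = s_fn(analysis_data)
    return results


if __name__ == "__main__":
    universes = run_multiverse(
        RAW_DATA, OUTLIER_OPTIONS, TRANSFORM_OPTIONS, SUMMARY_OPTIONS
    )
    for universe, value in universes.items():
        print(universe, round(value, 3))
```

With two options per decision this yields 2 × 2 × 2 = 8 universes. Results on the log scale are not directly comparable to raw-scale ones, so in practice one would compare universes that share a transform, or back-transform before comparing.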

 

Researcher: Nathan Taback, Department of Statistical Sciences, Faculty of Arts & Science, UofT

 

Skills required:

  • Intermediate/advanced programming in R/Python (experience with package development preferred).
  • Knowledge of intermediate/advanced undergraduate statistical concepts and models such as mathematical/simulation-based probability, mathematical/simulation-based statistics (e.g., significance testing, confidence intervals, applied Bayesian analysis), general linear models.
  • Knowledge of basic statistical/machine learning models.
  • Excellent oral and written communication skills.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

What are the characteristics of a satisfying, purposeful, and engaging life? The Population Well-being Lab in the Department of Psychology focuses on studying the determinants, consequences, and policy relevance of subjective well-being. We are interested in using data sciences to uncover individual- and country-level factors that promote a 'good' life.
The SUDS Scholar will have the opportunity to work with the Gallup World Poll, a global survey with over 2 million participants from over 170 countries, to systematically identify the population-level factors that can best predict improvements in population-level well-being. The project will involve looking at a detailed set of population characteristics, including but not limited to technological (e.g., the use of robotics in the workplace), labor (e.g., policy-mandated amount of work hours), environmental (e.g., pollution), social (e.g., the level of residential segregation by race and ethnicity), and political factors (e.g., freedom of press). A machine learning approach will then be applied to identify a subset of national indicators that can best predict population well-being.

 

Researcher: Felix Cheung, Department of Psychology, Faculty of Arts & Science, UofT

 

Skills required:

  • Strong multidisciplinary background and experience in quantitative methods.
  • Familiarity with a statistical programming language is preferred.

Primary research location:

  • Sidney Smith Hall SS600J at the University of Toronto, St George Campus and/or Remote

Research description:

Organisms comprise thousands of traits, but if the same genes affect many of those traits, the traits cannot evolve independently of one another. This can lead to slow and constrained evolutionary responses and the potential extinction of populations, which is a growing concern with climate change. Quantifying the extent and magnitude of genetic correlation, not just between pairs of traits but among many traits simultaneously (multivariate studies), is necessary to answer critical questions in medicine, agriculture, and evolutionary genetics. For example, how many side effects will be caused by editing out an unfavorable gene in a GMO, how pervasive are disease comorbidities, and how likely is a given trait to evolve if the climate changes? In this project, the SUDS Scholar will analyze quantitative genetic data to answer questions about the evolution of genetic variation in, and selection acting on, high-dimensional traits. They will use multivariate linear mixed-effects models to analyze these data. Students may also have the opportunity to participate in the empirical collection of these data, if they choose to.

 

Researcher: Jacqueline Sztepanacz, Department of Ecology and Evolutionary Biology, Faculty of Arts and Science, UofT

 

Skills required:

  • Proficiency in R, background in statistics, experience using MCMCglmm, lmer, or other R packages to fit mixed-effects models.
  • Background in genetics and/or computer science would be an asset.
  • High attention to detail, ability to work as part of a team.

Primary research location:

  • EEB Department at the University of Toronto St. George Campus and/or Remote

Research description:

Patient-specific treatment is a strategy that optimizes clinical efficacy by tailoring diagnosis and treatment methods to the specific conditions of each patient. For machine learning (ML)-based diagnosis and treatment, patient-specific treatment typically requires collecting data to train the ML models. However, long-term data collection in closely monitored environments, such as epilepsy monitoring units (EMUs), is often costly and inconvenient for patients. There is therefore a compelling need for data augmentation techniques. A generative adversarial network (GAN) is an emerging technique that generates new data with the same statistics as the training set. A GAN employs two ML models, a generator and a discriminator, that compete with each other until the generator can “fool” the discriminator. Combined with transfer learning, a GAN can generate data segments from limited recordings, such as epileptic patients’ ictal segments, which occur at very low rates. The augmented dataset can then be used to train ML models to detect epileptic onsets. In this project, the goal is to test this hypothesis using a subset of a prerecorded, expert-labeled large database. A high-performance discriminator trained on the complete database will be used to evaluate the performance of the transferring GAN.
The SUDS Scholar will work on developing GANs using PyTorch or TensorFlow. The developed GANs will augment medical data pre-recorded from patients with insomnia or epilepsy, which have severely imbalanced class distributions and limited data. The synthesized data will be benchmarked against authentic data and used to improve the performance of classifiers for neurological disorder detection. The SUDS Scholar will collaborate with other team members but complete assigned tasks independently. The SUDS Scholar will attend group meetings and will have periodic one-on-one meetings with the supervisor. Submission of a paper to a peer-reviewed conference or journal is expected upon completion of the project. No background knowledge is required to conduct the project.
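For readers new to GANs, the generator–discriminator "contest" described above is usually written as the standard minimax objective below. The notation is the textbook one, not taken from this project: D(x) is the discriminator's estimated probability that a sample x is real, G(z) is a synthetic sample produced from random noise z, p_data is the distribution of real recordings, and p_z is the noise distribution. The discriminator tries to maximize V, while the generator tries to minimize it by making D(G(z)) close to 1.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```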

 

Researcher: Xilin Liu, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Machine learning, Python, generative adversarial network, transfer learning.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Understanding the formation and evolution of our own Milky Way Galaxy requires being able to "wind back the clock" for many of the stars we see all across the night sky to find out when they were born and where they originally came from. Unfortunately, most stars don't come with labels attached that tell us these things! While astronomers have developed approaches to try and estimate the origins and properties of stars under various conditions, many of these methods are either too slow or too imprecise to keep up with the data we have today (over a billion stars and counting!).
The SUDS scholar will help build a state-of-the-art probabilistic machine learning model to try and estimate stellar properties such as ages. They will train this model using a combination of (1) simulated data of stellar brightness and colours from theoretical stellar evolutionary models and (2) real data collected from telescopes on the ground and in space (e.g., SDSS, Gaia) in collaboration with the SUDS supervisor. Altogether, this project provides a unique opportunity to learn about astrophysics, statistics, and machine learning while tackling a novel problem with large-scale potential applications.

 

Researcher: Joshua Speagle, Department of Statistical Sciences, Faculty of Arts & Science, UofT

 

Skills required:

  • No prior background in astronomy is required.
  • Some background in statistics is preferred but not required.
  • Past experience with machine learning, particularly some exposure to neural networks, is also preferred but not required.

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

We are leveraging human stem cell models combined with single-cell "omics" to understand lung development, disease and regeneration. The SUDS Scholar will have the opportunity to determine bioinformatically the characteristics of elite cells that out-compete others in a population to drive cell fate decisions and population characteristics. To do this, stem cells are genetically tagged using new DNA barcoding methods for cellular lineage tracing as the stem cells differentiate into multiple cell types in culture. The goal is to unmask the genes and gene regulatory networks (GRNs) regulating cell fitness to better understand how cells emerge during development.

 

Researcher: Amy Wong, Developmental & Stem Cell Biology Labs, SKH

 

Skills required:

  • Computational skills (Python, R, Linux)
  • The SUDS Scholar will get to use their computational skills to analyze large datasets to help identify new cells/genotype/phenotypes and GRN regulating human-specific developmental or disease mechanisms.

Primary research location: 

  • SKH, On Site

Research description:

My lab uses a variety of imaging techniques to examine the activity of neurons across the brains of mice while they are learning and remembering. We then analyse the data using a variety of techniques to identify key patterns that correlate with memory acquisition and retrieval. We believe this is a first step in understanding how the brain acquires, stores, and uses information. The SUDS Scholar will help acquire and analyze all the data for this project.

 

Researcher: Sheena Josselyn, Neurosciences & Mental Health Labs, SKH

 

Skills required:

  • Broad skills in math and data analysis approaches.
  • A broad interest in the brain.
  • We will teach students everything else.

Primary research location:

  • Hospital for Sick Children (PGCRL tower), On Site

Research description:

We are exploring applications of student data analytics to understand how undergraduate student experiences are connected to learning trajectories while at university and career pathways after graduation. Part of this research involves identification of student personas based on clustering of curricular, co-curricular and internship data.
The SUDS Scholar will apply different approaches to mining student data in order to resolve clusters. Natural language processing techniques will be used to support processing of qualitative data collected through student surveys.

 

Researcher: Greg Evans, Institute for Studies in Trans-Disciplinary Engineering Education and Practice, Faculty of Applied Science and Engineering, UofT

 

Skills required:

  • Experience with cluster analysis techniques and/or natural language processing

Primary research location:

  • University of Toronto, St George Campus and/or Remote

Research description:

Many adults over 60 years of age live with some form of hearing impairment that makes speech comprehension difficult. However, progress in predicting speech-comprehension difficulties in everyday life has been limited, in part because hearing-science research has mainly focused on the comprehension of short, disconnected sentences that lack a topical thread and are not relevant to the listener. New approaches to understanding naturalistic speech listening are thus critical to gaining insight into impaired speech processing.
The SUDS Scholar will be involved in research that combines novel natural language processing (NLP) approaches (e.g., sentence embeddings) with graph-theoretic approaches (e.g., network centrality) to better understand how individuals listen to naturalistic speech. The student will analyze transcripts of spoken stories and transcripts of individuals recalling these stories after listening to them (using NLP), and will integrate relevant information from these analyses to capture the structure in which individuals comprehend speech (using graph theory). The SUDS Scholar will program the analyses and visualize the results using Python/MATLAB. They will work with the supervisor and a graduate student with a background in biophysics and psychology. The lab provides ample opportunities to learn how sophisticated data-analysis tools can be used to facilitate research in basic science with clinical applicability.

 

Researcher: Bjorn Herrmann, Baycrest

 

Skills required:

  • Required: Advanced computer programming skills (Python or MATLAB); effective oral and written communication skills; inter-cultural competence; ability to work independently and within a team

  • Beneficial: background in artificial intelligence; experience with natural language processing; knowledge of graph theory; interest in auditory research

Primary research location:

  • Rotman Research Institute at Baycrest Health Sciences, On Site

For more information

SUDS.dsi@utoronto.ca

SUDS Info Session Slides are available now.

News

A summer of learning, fun and community for 2022 DSI SUDS Scholars.


Students may also be interested in the Urban Data Science Corps Summer Internships offered by the School of Cities.
