Data Sciences Institute (DSI)

AI used to ‘democratize’ how we predict the weather

Photo: James Requeima received post-doctoral funding for his work with the Aardvark Weather project from the Data Sciences Institute

Original story & photo courtesy of Diane Peters for U of T News

Weather prediction systems provide critical information about dangerous storms, deadly heatwaves and potential droughts, among other climate emergencies.  

But they’re not always accurate. And, ironically, the supercomputers that generate forecasts are also energy-intensive, contributing to greenhouse gas emissions while predicting increasingly erratic weather caused by climate change.  

“The process right now is very computationally expensive,” says James Requeima, a post-doctoral researcher in computer science at the University of Toronto and the Vector Institute.

Enter Aardvark Weather, a weather prediction model developed by Requeima and other researchers using artificial intelligence (AI). Described in a recent Nature article, the system produces results comparable to traditional methods, but is 10 times faster, uses a tiny fraction of the data and consumes 1,000 times less computing power.  

In fact, the model can be run on a regular computer or laptop. It’s also open-source and easily customizable, allowing small organizations, developing countries or people in remote regions to input the data they have and generate local forecasts on a minimal budget. 

The development could be a timely one. As Texas continues to deal with the fallout from catastrophic floods, Manitoba grapples with its most destructive wildfire season in 30 years and Europe reels from deadly heatwaves, there’s a clear need for accessible and accurate weather forecasting around the world.

“You hear a lot about the promise of AI to help people and hopefully make humanity better,” Requeima says. “We’re hoping to enact some of that promise with these weather prediction models.” 

Aardvark Weather is being developed at Cambridge University (where Requeima completed his PhD in engineering and machine learning) and the Alan Turing Institute. Requeima joined the project in 2023. He received post-doctoral funding for the project last year from U of T’s Data Sciences Institute, an institutional strategic initiative.  

U of T News recently spoke to Requeima about the project and his role. 

How is weather currently predicted? 

The big weather forecasters, such as the U.S. National Weather Service and the European Centre for Medium-Range Weather Forecasts, take initial conditions representing the current state of the atmosphere and put that information into a supercomputer. They then run a numerical simulation and propagate that forward into the future to get forecasts of the future states of the atmosphere.  

Then they take observations from real-world sensing instruments and incorporate them into their current belief about the atmosphere and re-run the forecast. There’s a constant iterative loop. From these atmospheric predictions, you can build a tornado forecaster or a precipitation forecaster. 
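To make the forecast-then-correct loop concrete, here is a deliberately tiny sketch. It assumes a single number stands in for the whole atmosphere, a made-up relaxation rule (the hypothetical `forecast` function) stands in for the numerical simulation, and a scalar Kalman-style update (the hypothetical `assimilate` function) stands in for incorporating observations. Operational centres run far more elaborate schemes over millions of variables; this only shows the shape of the cycle.

```python
def forecast(state: float, dt: float = 1.0, decay: float = 0.1) -> float:
    """Toy stand-in for the numerical model: relax the state toward zero (hypothetical dynamics)."""
    return state - decay * state * dt


def assimilate(prior: float, prior_var: float,
               obs: float, obs_var: float) -> tuple[float, float]:
    """Blend forecast and observation, weighting each by its certainty (scalar Kalman update)."""
    gain = prior_var / (prior_var + obs_var)
    posterior = prior + gain * (obs - prior)
    posterior_var = (1.0 - gain) * prior_var
    return posterior, posterior_var


def run_cycle(initial: float, observations: list[float],
              model_var: float = 0.5, obs_var: float = 1.0) -> list[float]:
    """Alternate forecast and assimilation steps, as in the iterative loop described above."""
    state, var = initial, 1.0
    history = []
    for obs in observations:
        state = forecast(state)
        var = var + model_var  # uncertainty grows while the model runs forward
        state, var = assimilate(state, var, obs, obs_var)
        history.append(state)
    return history


print(run_cycle(0.0, [1.0, 1.0, 1.0]))
```

Each pass pulls the model state toward the latest observations, and the gain decides how much to trust the observation versus the forecast.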

How can AI do better and with less computing power? 

End-to-end deep learning fundamentally changes how we approach weather prediction. Rather than the traditional, iterative process that relies on expensive numerical simulations, we train our model to map directly from sensor inputs to the weather variables we care about. We feed in raw observational data — from satellites, ships and weather stations — and the model learns to predict precipitation, atmospheric pressure, and other conditions directly. While training the initial model requires computational resources, once trained, it’s remarkably efficient. The resulting system is lightweight enough to run on a laptop, making predictions orders of magnitude faster and more accessible than traditional supercomputer-based methods.
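As a rough illustration of the "train once, predict cheaply" idea, the sketch below fits a direct map from a vector of raw observations to one target variable. It is an assumption-laden stand-in: Aardvark itself uses deep neural networks, whereas this uses a linear least-squares model on synthetic, randomly generated data, and the names (`X`, `true_w`, `predict`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: each row is a flattened vector of raw observations
# (e.g. station, ship and satellite readings); y is the target variable.
n_samples, n_obs = 500, 20
X = rng.normal(size=(n_samples, n_obs))
true_w = rng.normal(size=n_obs)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Training: done once (and, in a real deep learning system, the expensive part).
X1 = np.hstack([X, np.ones((n_samples, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)


def predict(observations: np.ndarray) -> np.ndarray:
    """Inference: map observations straight to the target with one matrix product."""
    obs = np.atleast_2d(observations)
    return np.hstack([obs, np.ones((obs.shape[0], 1))]) @ w


print(predict(X[:3]))
```

Fitting is the one-off step; every forecast afterwards costs a single matrix-vector product, with no intermediate simulation, which is the property that lets a trained model run on modest hardware.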

This means communities can deploy these models locally to generate their own forecasts for the specific weather patterns that matter to them.

Have others used AI for weather prediction? 

Machine learning has been applied to climate modelling before, but previous approaches still depended on numerical simulations as their input. Our key breakthrough is demonstrating that you can move out of this paradigm and map directly from observation to targets. This proof of concept opens up a fundamentally new approach to forecasting — we’ve demonstrated that accurate weather prediction doesn’t require supercomputer simulations as an intermediate step.

How can this technology be used in practice? 

We are open-sourcing this model, making it available to the community so others can improve upon it, adapt it and train it for local modelling. We’re hoping this will help democratize weather prediction.  

Forecasting quality is correlated with wealth, so developing nations don’t have access to forecasting as good as wealthier nations do. If we can help bring high-quality forecasting to areas that haven’t had it before, that’s a really big positive of this work.  

David [Duvenaud, an associate professor of computer science in U of T’s Faculty of Arts & Science] — my adviser — and I want to use AI in positive ways. Climate prediction is an important tool for assessing and developing ways of dealing with climate change — and the better climate models we have, the better our science can be around tackling that problem. That’s a driving motivation for me. 

What was your contribution to this work? 

During my PhD, I worked on neural processes — a type of neural network model that is effective for numerical forecasting. We discovered it was well-suited for scientific applications, especially climate modelling. For Aardvark, I helped design the model architecture and the multi-stage training scheme. 

Where did the name Aardvark Weather come from?  

The first author on this research, Anna Allen from Cambridge, did a lot of the heavy lifting on this, including going out and finding the data sources, among them a lot of Canadian data from weather stations, weather balloons and ship observations. She’s from Australia and is a lover of interesting animals like sloths and aardvarks.  

DSI Welcomes Undergraduate Scholars to the Summer Undergraduate Data Science Program

by Sara Elhawash

The Data Sciences Institute (DSI) is thrilled to welcome 38 undergraduate students from across Canada for an immersive data sciences research experience through the Summer Undergraduate Data Science (SUDS) research opportunity. 

SUDS Opportunities offer students the experience to apply data science methodologies and tools across diverse disciplines. Projects range from research on the robustness of machine learning models to commercial determinants of health in online gaming, mass extinctions, nocturnal behaviour, sleep modeling, neural systems, and searching for stellar streams in the Milky Way. SUDS Scholars are supervised by DSI member researchers across U of T and external funding partners. Alongside their research projects, SUDS Scholars have access to comprehensive data science skills training, networking events, and professional development opportunities. 

The programming for SUDS Scholars commences the week of May 6 with the DSI Data Science Bootcamp, where Scholars will gain proficiency in data science skills including Unix Shell, R, Python, and machine learning. 

Advika Gudi, a SUDS Scholar from U of T, expressed her excitement for the program, stating, “I’m looking forward to using data to tell a story – by identifying trends and measuring the impact of policies, to ultimately improve policy outcomes on society is incredibly motivating for me.”   

Rachel Way, another SUDS Scholar, will be working with Professor Spike Lee of the Rotman School of Management on the project Automated Text Analysis of Fake News and Biased News. This project aims to analyze approximately 7 million news articles from around 500 media outlets to discern differences between fake news and real news in terms of moral themes, cognitive styles, antiscience attitudes, emotional valence, and other psychological characteristics. 

“The SUDS Scholar will apply automated text analysis and machine learning techniques to these articles in order to identify linguistic patterns and biases depending on how fake or real and how left-leaning or right-leaning the media outlet is,” says Professor Lee. 

“I have a passion for social data science. I am fascinated by the ability to use data to answer pressing questions in the social sciences,” says Way. 

The DSI’s cohort programming includes the Data Science@Work Series, where representatives from the private sector and government organizations share data science applications in the workplace. The program culminates in August with the DSI Showcase, during which SUDS Scholars present their research findings. 

“Our SUDS Scholars benefit from acquiring data science expertise and professional growth opportunities. We are enthusiastic about the prospect of inspiring these students and, hopefully, launching their careers in data science,” says Professor Laura Rosella, DSI Associate Director of Education and Training.   

SUDS offers students a valuable pathway to engage in high-quality and enriching data science learning, serving as a stepping stone for students aspiring to build careers in data science. 

See the full list of 2024 Scholars, Supervisors and research opportunities here 

DSI and TISS partner to seed research to advance wearable health technology

by Sara Elhawash

Wearable devices have long been praised as the future of healthcare, fitness tracking, and athletics. However, the reliability of these devices has been hampered by the presence of motion artifacts, which greatly diminish the quality of data collected. 

The Data Sciences Institute (DSI) and the Tanenbaum Institute for Science in Sport (TISS) at the University of Toronto are pleased to award a Catalyst Grant for the project Development of Convolutional Neural Network for Motion Artifact Mitigation in Wearable PPG Devices. Co-led by Professors Daniel Franklin (Institute of Biomedical Engineering, Temerty Faculty of Medicine, University of Toronto) and Chris McIntosh (Department of Medical Biophysics, Temerty Faculty of Medicine and University Health Network), the project aims to revolutionize athletics and sports medicine by integrating novel sensors with advanced machine learning algorithms. 

The researchers propose a novel approach to overcome motion artifacts in wearable devices by enabling real-time motion artifact cancellation in optical wearables. This includes the development of a multimodal sensor coupled with deep learning models. The sensor will combine force and multiwavelength optical measurements to capture relative motion at the sensor interface, addressing a critical limitation of current wearable devices. 
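One long-established way to use a motion-sensing channel for artifact cancellation is the least-mean-squares (LMS) adaptive noise canceller. The sketch below is that classic technique, swapped in purely for illustration; it is not the team's convolutional neural network. It assumes a single hypothetical force channel serves as the motion reference, and it uses synthetic signals (a 1.2 Hz sine as the pulse, random noise as the motion) rather than real sensor data.

```python
import numpy as np


def lms_cancel(primary: np.ndarray, reference: np.ndarray,
               order: int = 4, mu: float = 0.01) -> tuple[np.ndarray, np.ndarray]:
    """Least-mean-squares adaptive noise canceller.

    primary:   contaminated optical (PPG) signal = pulse + motion artifact
    reference: a motion-correlated signal (here, a hypothetical force channel)
    Returns e[n] = primary[n] - estimated_artifact[n], which approximates the
    artifact-free pulse once the filter has converged, plus the final weights.
    """
    w = np.zeros(order)
    cleaned = np.zeros(len(primary))
    for n in range(len(primary)):
        # Most recent `order` reference samples, newest first (zero before the start).
        x = np.array([reference[n - k] if n - k >= 0 else 0.0 for k in range(order)])
        estimate = w @ x                # current guess of the artifact
        e = primary[n] - estimate       # residual = cleaned sample
        w += 2 * mu * e * x             # steepest-descent step on the squared error
        cleaned[n] = e
    return cleaned, w


# Synthetic demo (all values hypothetical).
fs = 50.0
t = np.arange(4000) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)                          # roughly 72 beats per minute
rng = np.random.default_rng(1)
motion = rng.normal(size=t.size)                             # force-channel reference
artifact = 0.8 * motion + 0.3 * np.concatenate([[0.0], motion[:-1]])
primary = pulse + artifact
cleaned, w = lms_cancel(primary, motion)
```

The filter only ever subtracts what it can predict from the motion reference, so the pulse, which is uncorrelated with the motion, passes through while the artifact is learned and removed.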

“Conventional wearable devices capture global motion, but our approach focuses on capturing relative motion at the sensor-skin interface, which is crucial for accurate data interpretation. Wearable technologies offer a unique glimpse into patient function and biology outside of episodes of care. If AI is the present, wearables with AI are the next frontier,” says Professor Daniel Franklin. 
 
The research project will progress through several phases, including controlled lab experiments and real-world examples of motion. By collecting a novel multi-modal motion artifact dataset, the team aims to develop a robust algorithm for real-time optical motion artifact cancellation. 

“We are thrilled to partner with the DSI to award this seed grant. This project has the potential to significantly advance wearable health monitoring technologies for applications to the healthcare, fitness, and sports sectors,” says Dr. Ira Jacobs, Director of the Tanenbaum Institute for Science in Sport. 

The implications of this research extend beyond healthcare into consumer health, sports, and athletics. By enhancing the usability and interpretability of wearable device datasets, the project promises to advance remote health management and athletic performance tracking. 

“We envision a future where wearable devices provide more accurate and actionable insights, leading to improved patient care and athletic performance,” added Professor McIntosh. 

The DSI and TISS partner to co-sponsor Catalyst Grants focused on innovative and novel data science in sport and sport analytics.  

DSI and T-CAIREM co-fund two Catalyst Grants that are breaking bias in medical research and supporting children with complex communication needs

by Sara Elhawash

The Data Sciences Institute (DSI) and the Temerty Centre for AI Research and Education in Medicine (T-CAIREM) at the University of Toronto join efforts for the second consecutive year to co-fund two 2024 Catalyst Grant Awards focused on innovative and novel data science methodologies in medicine and health. 

Each catalyst grant provides up to $100,000 in seed funding for multidisciplinary researchers forming Collaborative Research Teams (CRTs) that are developing novel statistical or computational tools that address important societal needs. 

“These jointly funded catalyst grants are directed at highly innovative initiatives that have the potential to transform healthcare with data science, and this year’s winners are no exception,” says Muhammad Mamdani, executive director of T-CAIREM. “It’s rewarding to see initiatives that not only focus on specific segments of our population to improve their quality of life but also those that have far reaching implications for society at large.” 

Examining biases due to confounders and colliders in observational health data using individual-based simulation models

Sharmistha Mishra (St. Michael’s Hospital, Unity Health Toronto) and Rafal Kustra (Dalla Lana School of Public Health, University of Toronto)  

In the realm of medical research, observational studies leveraging large health-administrative datasets are crucial. However, bias in the data, including residual confounding and collider bias, can produce misleading results, potentially skewing policy decisions, resource allocation, and clinical management.  

This research aims to enhance public health outcomes during infectious disease outbreaks by employing simulation modeling combined with causal inference and statistical learning methods to identify and address different types of biases that could undermine inferences drawn from observational health studies. 

Specifically, the researchers plan to generate synthetic datasets using simulation models that replicate the complex dynamics of the 2022 Mpox outbreak in Toronto, in collaboration with clinicians, public health teams, and community-based organizations. They intend to use statistical learning methods to estimate how large residual confounding and collider biases could become when inferring risk factors and the effectiveness of interventions during an evolving outbreak. They will then pilot-test analytic approaches to reduce these biases. 
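A toy simulation can show both kinds of bias the team describes. The sketch below is not the team's individual-based Mpox model; it assumes a hypothetical linear-Gaussian setup with synthetic data in which, by construction, the exposure X has no effect on the outcome Y, so any non-zero estimated slope is pure bias. The helper `ols_slope` is illustrative.

```python
import numpy as np


def ols_slope(y: np.ndarray, *covariates: np.ndarray) -> float:
    """Coefficient on the first covariate from an ordinary least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(42)
n = 100_000

# Confounding: Z drives both exposure X and outcome Y; X has NO effect on Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
naive = ols_slope(y, x)          # biased: about 0.5 instead of 0
adjusted = ols_slope(y, x, z)    # about 0 once Z is in the model

# Collider: X2 and Y2 are independent, but both raise C (e.g. the chance of being tested).
x2 = rng.normal(size=n)
y2 = rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
selected = c > 1.0               # analysing only the selected subgroup
full_sample = ols_slope(y2, x2)                           # about 0
collider_biased = ols_slope(y2[selected], x2[selected])   # spuriously negative

print(naive, adjusted, full_sample, collider_biased)
```

Including the confounder Z in the regression removes the first bias, while restricting the analysis to a subgroup selected on the collider C creates the second, even though X2 and Y2 were generated independently.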

“This work has the potential for applicability across health conditions by helping to improve validity in estimating risks and intervention impact,” says Professor Mishra. 

Decoding unintelligible speech: a conversational context-aware assistive technology for children with complex communication needs (CCN) 

Tom Chau (Holland Bloorview Kids Rehabilitation Hospital) and Monika Molnar (Temerty Faculty of Medicine, Department of Speech-Language Pathology, University of Toronto)  

Children with CCN often prefer to vocalize, but their sounds are typically unintelligible to those unfamiliar with them. They are often excluded from fully participating in education, society, and eventual employment. 

This Catalyst Grant proposal is dedicated to helping children with complex communication needs (CCN), potentially leading to the development of assistive devices. 

The research team plans to use machine learning to decode the unintelligible speech of these children using an existing audio-video dataset of speech samples. This project could pave the way for the development of artificial intelligence-driven electronic devices tailored for children with CCN. 

“There are currently no assistive technologies that can accurately decode their speech sounds,” explains Professor Chau. “As a result, children with CCN remain excluded from full participation in education, society, and eventual employment.” 

The researchers hope this project will accelerate the impact of data sciences in the fields of rehabilitation and biomedical engineering, driving positive social change for children with CCN. 

The DSI’s Catalyst Grants, co-funded by T-CAIREM, play a crucial role in supporting these research projects by providing the essential seed funding and fostering the collaboration among research teams needed to realize this impactful work and apply for external funding in the future. 

Data Sciences Institute announces the 2024 Catalyst Grants recipients 

by Sara Elhawash

The Data Sciences Institute (DSI) is pleased to announce the 2024 recipients of the annual DSI Catalyst Grant competition. Fourteen interdisciplinary teams across all three campuses and external funding partners received grants for research that focuses on harnessing the transformative nature of data sciences.  

Catalyst Grants are awarded to teams working on the development of novel statistical or computational tools, as well as the use of existing methodology in innovative ways to address questions of major societal importance and effect positive social change. The intent is for the grants to serve as seed funds that bring cross-institutional multidisciplinary teams together including data science leadership, to innovate in traditional disciplines and position the teams for external competitive research funds.  

“The 2024 DSI Catalyst Grant recipients exemplify our commitment to multidisciplinary collaboration, uniting researchers to tackle pressing societal issues. This year’s projects promise innovative solutions and showcase the collective expertise driving positive change,” says Gary Bader, Associate Director, Data Sciences Institute. 

The DSI-funded research spans disciplinary areas and includes collaborations tackling critical issues in urban road safety and in forest management amid escalating wildfire risks (see full list below). This year, several Catalyst Grants are co-funded by the Temerty Centre for AI Research and Education in Medicine (T-CAIREM), with a focus on innovative and novel data science methodologies in medicine and health, and by the Tanenbaum Institute for Science in Sport (TISS), with a focus on innovative and novel data science in sport and sport analytics. 
 
Eye on the street: Using computer vision to capture the determinants of road safety 

In urban road safety research, comprehensive datasets detailing road network modifications are essential for evaluating intervention effectiveness and informing evidence-based policy decisions. 

With their project, Professors Brice Batomen (Dalla Lana School of Public Health) and Marianne Hatzopoulou (Department of Civil and Mineral Engineering, Faculty of Applied Science and Engineering) aim to address the critical public health issue posed by traffic collisions, which are a leading cause of premature death. 

They begin by compiling detailed information on road modifications in Canadian cities, starting with Toronto and Montreal, with the ultimate goal of promoting safer urban environments. 

“This research aims to impact public and environmental health by analyzing the effectiveness of road safety interventions,” says Batomen. By employing advanced causal inference methods and creating comprehensive datasets, the project aims to inform policy-making, reduce traffic-related fatalities and injuries, and foster safer, more equitable urban environments. 

“Through interdisciplinary collaboration facilitated by the DSI, the project brings together epidemiologists, computer scientists, and transportation engineers, laying the groundwork for impactful research with broader implications,” says Batomen. 

Effect of forest management on insurable wildfire risk in Northern Ontario 

Amidst the escalating frequency and severity of wildfires, particularly impacting regions like Northern Ontario, the intersection of forest management and wildfire risk assessment emerges as a critical focal point for research and policy intervention. 

Professors Rasoul Yousefpour (John H. Daniels Faculty of Architecture, Landscape, and Design) and Silvana Pesenti (Department of Statistical Sciences, Faculty of Arts and Science) flag that “Forest fires are occurring at an alarming rate, posing a significant challenge to the insurability of affected landscapes in Ontario, including Indigenous communities.”  

Their research endeavors to unravel the intricate connections between forest management practices and wildfire risk assessment, essential for informing policy decisions and fostering equitable wildfire insurance mechanisms. “The DSI Catalyst grant represents precisely the resource required to pioneer innovative big data-driven technologies and models aimed at unraveling the impact of forest management on the insurability of forest fires in Ontario,” say Yousefpour and Pesenti. 

Over the two-year funding period, the grant will empower the recruitment of graduate students who will collaboratively establish connections between forest and fire data using advanced data science methodologies.  

“The findings of this research will not only inform forest management best practices but also raise awareness and contribute to the establishment of equitable wildfire insurance mechanisms for all citizens, including First Nations communities,” say Yousefpour and Pesenti.  

They envision integrating cutting-edge technology across both fields of study, providing guidance for future research and policy analysis in fire-prone forest landscapes.  

Congratulations to all the 2024 DSI Catalyst Grant collaborative research teams!  

Coronavirus in the Urban Built Environment (CUBE) 

  • Michael Fralick (Department of Medicine, Temerty Faculty of Medicine, University of Toronto) and David Guttman (Department of Cell and Systems Biology, Faculty of Arts and Science, University of Toronto) 

Decoding unintelligible speech: a conversational context-aware assistive technology for children with complex communication needs 

  • Project co-funded by T-CAIREM  
  • Tom Chau (Holland Bloorview Kids Rehabilitation Hospital) and Monika Molnar (Department of Speech-Language Pathology, Temerty Faculty of Medicine, University of Toronto) 

Developing Algorithms & Statistical Analysis Techniques for Adaptive Experimentation 

  • Joseph Williams (Department of Computer Science, Faculty of Arts and Science, University of Toronto), Felix Cheung (Department of Psychology, Faculty of Arts and Science, University of Toronto), Anna Heath (The Hospital for Sick Children) and Michael Liut (Department of Mathematical and Computational Sciences, University of Toronto Mississauga) 

Development of Convolutional Neural Network for Motion Artifact Mitigation in Wearable PPG Devices 

  • Project co-funded by Tanenbaum Institute for Science in Sport (TISS)  
  • Daniel Franklin (Institute of Biomedical Engineering, Faculty of Applied Science and Engineering, University of Toronto) and Chris McIntosh (University Health Network, Toronto General Hospital Research Institute) 

Effect of forest management on insurable wildfire risk in Northern Ontario 

  • Silvana Pesenti (Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto) and Rasoul Yousefpour (John H. Daniels Faculty of Architecture, Landscape, and Design, University of Toronto) 

Enhancing the Reliability of Large Language Models for Structured Data Extraction in Chemical Sciences 

  • Seyed Mohamad Moosavi (Department of Chemical Engineering and Applied Chemistry, Faculty of Applied Science and Engineering, University of Toronto) and David Sinton (Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, University of Toronto)

Examining biases due to confounders and colliders in observational health data using individual-based simulation models 

  • Project co-funded by T-CAIREM  
  • Sharmistha Mishra (St. Michael’s Hospital, Unity Health Toronto) and Rafal Kustra (Dalla Lana School of Public Health, University of Toronto) 

Eye on the street: Using computer vision to capture the determinants of road safety 

  • Brice Batomen Kuimi (Dalla Lana School of Public Health, University of Toronto) and Marianne Hatzopoulou (Department of Civil and Mineral Engineering, Faculty of Applied Science and Engineering, University of Toronto)  

Interpretable and fair machine learning for equitable assessment of patient safety in hospitals 

  • Eldan Cohen (Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, University of Toronto), Sheila McIlraith (Department of Computer Science, Faculty of Arts and Science, University of Toronto), Amol Verma (St. Michael’s Hospital, Unity Health Toronto) and Fahad Razak (St. Michael’s Hospital, Unity Health Toronto) 

Investigating the biological function of the m6A epitranscriptome using Oxford Nanopore direct RNA sequencing 

  • Ina Anreiter (Department of Biological Sciences, University of Toronto Scarborough) and Jared Simpson (Ontario Institute for Cancer Research) 

Scaling up highly multiplexed imaging with compressed sensing 

  • Kieran Campbell (Lunenfeld-Tanenbaum Research Institute) and Hartland Jackson (Lunenfeld-Tanenbaum Research Institute) 

Toolkit for Improved Climate Hazard and Risk Assessment in Ontario 

  • Robert Soden (Department of Computer Science, Faculty of Arts and Science, University of Toronto) and Paul Kushner (Department of Physics, Faculty of Arts and Science, University of Toronto) 

Using generative models to “fix” missing structures and artifacts in MRI images 

  • Evdokia Anagnostou (Holland Bloorview Kids Rehabilitation Hospital) and David Duvenaud (Department of Computer Science, Faculty of Arts and Science, University of Toronto)