Data Sciences Institute (DSI)

Data Sciences Institute Catalyst Grants support transformative data science research

by Chris Sasaki

The Data Sciences Institute (DSI) is pleased to announce the 2023 recipients of the annual DSI Catalyst Grant competition.

Catalyst Grants are awarded to multidisciplinary collaborative research teams focused on harnessing the transformative nature of the data sciences. The grants are given to teams working on the development of novel statistical or computational tools or the use of existing methodology in innovative ways to address questions of major societal importance and effect positive social change. Two of this year’s Catalyst Grants are co-funded by the Temerty Centre for AI Research and Education in Medicine (T-CAIREM) at the University. T-CAIREM seeks to establish new AI tools and data-driven projects that integrate clinical, translational and basic research into real-world applications.

“The recipients of 2023 grants are an illustration of the Institute’s mission to bring researchers together from across disciplines, divisions, campuses, as well as from the greater community, to address today’s critical challenges,” says Gary Bader, the Institute’s Associate Director, Research & Software. “It’s a truly remarkable list of impactful projects and an inspiring group of people.”

Thirteen interdisciplinary teams spanning all three campuses and external funding partners received grants (full list below), including three collaborations tackling diverse problems regarding harmful social media content, machine learning in healthcare and how marketing affects children’s health:

Reducing harmful content on social media through community-powered AI

Misinformation and hateful social media content have emerged as critical threats, with damaging impacts on our health, environment and society. While most social media platforms have taken measures to identify and moderate harmful content and limit its spread, human moderators and AI algorithms often fail to identify it correctly and take proper action. Historically marginalized groups are most affected by these failings because they are underrepresented among human moderators and their data are less available to the algorithms.

With their project, Syed Ishtiaque Ahmed from the Faculty of Arts & Science’s Department of Computer Science, Shion Guha from U of T’s Faculty of Information, and Shohini Bhattasali from University of Toronto Scarborough’s Department of Language Studies aim to improve the content moderation process by involving the communities affected by harmful or hateful content and by adopting a more pluralistic contestability framework that allows multiple perspectives.

Ahmed, Guha, and Bhattasali will design, develop, deploy and evaluate the proposed system to address potentially Islamophobic and Sinophobic posts on Twitter in support of two Canadian non-profit organizations: the Chinese Canadian National Council for Social Justice (CCNC-SJ) and the Islam Unravelled Anti-Racism Initiative.

“Annotating data becomes challenging when the annotators are divided in their opinions,” says Ahmed. “Democratically resolving this issue requires representing diverse values through the participation of different communities, which is currently absent in data science practices.

“This project addresses the issue by designing, developing and evaluating a pluralistic framework of justification and contestation in data science while working with two historically marginalized communities in Toronto.”

 

Tackling the labelling bottleneck in machine learning in healthcare

In many domains, labelling of objects — a fundamental step in machine learning — can be done by non-experts; for example, labelling can be as simple as drawing a box around a cyclist in an image.

“But in machine learning for healthcare (ML4HC) applications, the labeller needs to be a clinical domain expert,” says Sebastian Goodfellow, from the Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering.

“The time of clinician domain experts is scarce, and labelling becomes the rate-limiting step for ML4HC projects, preventing evaluation of the utility of machine learning in many medical applications. How to address this bottleneck is a knowledge gap in healthcare that leads to a translation gap, which our team will address.”

The goal of this project is to develop and evaluate two possible solutions to better understand this bottleneck. The first is a methodology for crowdsourcing labels from non-experts. The second is a novel framework for labelling medical waveform data by using a “human-in-the-loop,” semi-supervised learning pipeline and an interactive visualization approach.

The project is co-funded by T-CAIREM. The team comprises Goodfellow and, from The Hospital for Sick Children, Mjaye Mazwi (Translational Medicine Labs), Anica Bulic (Translational Medicine Labs) and Melissa McCradden (Genetics & Genome Biology Labs).

Says Goodfellow, “Our team is grateful to the DSI for this funding, which will enable us to address an important challenge that affects many industries beyond healthcare. We’re excited to get started!”

 

Using deep learning and image recognition to measure child-directed food marketing

Childhood obesity and nutrition-related chronic disease are urgent global public health concerns. One of the factors contributing to the problem is that highly processed, energy-dense and nutrient-poor food is being marketed to children through tactics such as cartoon characters, toys and other fun enticements — all of which affect children’s attitudes, preferences and consumption behaviours.

But measuring child-directed marketing is time- and labour-intensive, requires in-depth training and validation, and is often subjective. As a result, there is a paucity of data with which to guide national and global legislation and policies aimed at protecting children from harmful industry practices.

The goal of this project is to develop new systems to capture food labels; develop methodologies using image recognition and deep learning technology to measure indicators of child-directed marketing on food and beverage packaging; and evaluate the relationship between child-directed marketing on food packaging, nutritional quality and price. These methodologies will enable evaluation of how child-directed food marketing may be perpetuating existing dietary and health inequities and whether policies are in fact reducing these disparities and protecting children’s right to health.

The team comprises experts from across four U of T academic divisions: Mary R. L’Abbé, Department of Nutritional Sciences, Temerty Faculty of Medicine; David Soberman, Joseph L. Rotman School of Management; Laura Rosella, Dalla Lana School of Public Health; and Steve Mann, Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering.

“This project will help guide the development, implementation and evaluation of national and global food policies aimed at protecting children from harmful food industry marketing practices,” says L’Abbé. “Parliament is in the process of finalizing Bill C-252 to restrict the marketing of unhealthy foods to children and this grant can have a huge policy impact as part of our program on Food and Nutrition Policy for Population Health.” 

 

Launched in 2021, the DSI is the University of Toronto’s hub and incubator for data science research, training and partnerships, unifying research across the University, its affiliated institutes and external partners. 

Congratulations to all the 2023 DSI Catalyst Grant collaborative research teams!

A computational sociolinguistic approach for studying gender inequities in social media interactions

  • Suzanne Stevenson (Department of Computer Science, Faculty of Arts & Science, U of T); Barend Beekhuizen (Department of Language Studies, University of Toronto Mississauga)

A high-throughput data and AI-driven mRNA transfection (HART) platform for immune cell engineering

  • Bowen Li (Leslie Dan Faculty of Pharmacy, U of T); Bo Wang (Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, U of T)

Accelerating machine learning in healthcare: Solving the labelling bottleneck

  • Project co-funded by T-CAIREM
  • Sebastian Goodfellow (Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering, U of T); Mjaye Mazwi (Translational Medicine Labs, The Hospital for Sick Children); Anica Bulic (Translational Medicine Labs, The Hospital for Sick Children); Melissa McCradden (Genetics & Genome Biology Labs, The Hospital for Sick Children)

Automating sedation state assessments

  • Project co-funded by T-CAIREM
  • Aaron Conway (Lawrence S. Bloomberg Faculty of Nursing, U of T); Babak Taati (Toronto Rehabilitation Institute, KITE, University Health Network); Sebastian Mafeld (Toronto General Hospital Research Institute, University Health Network)

DynaMELD and DynaCOMP: Using machine learning to revamp pre-and-post transplant care

  • Rahul Krishnan (Department of Computer Science, Faculty of Arts & Science, U of T); Mamatha Bhat (Toronto General Hospital Research Institute, University Health Network)

Inequality in childcare: The case of nannies in Canada

  • Ito Peng (Department of Sociology, Faculty of Arts & Science, U of T); Monica Alexander (Department of Statistical Sciences, Faculty of Arts & Science, U of T)

Machine-learning-assisted screening of metallo-cyanines as light-absorbing and transport layers for organic and perovskite photovoltaics

  • Oleksandr Voznyy (Department of Physical & Environmental Sciences, University of Toronto Scarborough); Timothy Bender (Department of Chemical Engineering and Applied Chemistry, Faculty of Applied Science & Engineering)

Providing data to improve representation by public officials

  • Peter Loewen (Munk School of Global Affairs & Public Policy, Faculty of Arts & Science, U of T); Rohan Alexander (Faculty of Information, U of T); Aya Mitani (Dalla Lana School of Public Health, U of T); Elena Tuzhilina (Department of Statistical Sciences, Faculty of Arts & Science)

Something in the air: Is there an association between exposure to unconventional natural gas development (UNGD) and exacerbations of asthma in northeastern British Columbia?

  • Élyse Caron-Beaudoin (Department of Health & Society, University of Toronto Scarborough); Marianne Hatzopoulou (Department of Civil & Mineral Engineering, Faculty of Applied Science & Engineering, U of T)

Spectroscopy by the millions: A fast, reproducible framework to yield chemical compositions of four million stars

  • Joshua Speagle (Department of Statistical Sciences, Faculty of Arts & Science, U of T); Ting Li (David A. Dunlap Department of Astronomy & Astrophysics, Faculty of Arts & Science, U of T)

The rise of social media and the transformation of influence: Joining foundational sociological theory and data science to rethink influence in social systems

  • Peter Marbach (Department of Computer Science, Faculty of Arts & Science, U of T); Vanina Leschziner (Department of Sociology, Faculty of Arts & Science, U of T); Daniel Silver (Department of Sociology, University of Toronto Scarborough)

Toward reducing harmful contents on social media with pluralistic justifications through community-powered AI

  • Syed Ishtiaque Ahmed (Department of Computer Science, Faculty of Arts & Science, U of T); Shion Guha (Faculty of Information, U of T); Shohini Bhattasali (Department of Language Studies, University of Toronto Scarborough)

Using deep learning and image recognition to develop AI technology to measure child-directed marketing on food and beverage packaging and investigate the relationship between marketing, nutritional quality and price

  • Mary R. L’Abbé (Department of Nutritional Sciences, Temerty Faculty of Medicine, U of T); David Soberman (Joseph L. Rotman School of Management, U of T); Laura Rosella (Dalla Lana School of Public Health, U of T); Steve Mann (Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering, U of T)

 

DSI welcomes Women’s College Hospital as a partner

The Data Sciences Institute (DSI) is excited to announce a new partnership with Women’s College Hospital (WCH). For more than 100 years, WCH has been developing revolutionary advances in healthcare. Today, WCH is a world leader in health equity and Canada’s leading academic ambulatory hospital focused on delivering innovative solutions that address our most pressing issues related to population health, patient experience and health system costs.

“Women’s College Hospital (WCH) is reimagining and redesigning healthcare to enhance access, address inequities, and innovate more readily. To do that, we are leveraging data insights to identify areas for improvement, test new models of care and ultimately improve care for everyone. As a research leader in the field of data science, this collaboration with DSI will enable our teams to further their work, pursue new opportunities, and expand our partnership network,” said Dr. Rulan Parekh, vice president of Academics at Women’s College Hospital.

DSI collaborates with organizations eager to support world-class researchers, educators, and trainees advancing data sciences. We facilitate inclusive research connections, supporting foundational research in data science, as well as supporting the training of a diverse group of highly qualified personnel for their success in interdisciplinary environments. Because WCH is one of our external funding partners, its researchers can apply for research grants and for training and networking opportunities at the DSI.

“We are delighted to announce this partnership with WCH. Our goal is to create a hub to elevate data science research, training, and partnerships. By connecting data science researchers, data and computational platforms, and external partners, the DSI advances research and nurtures the next generation of data- and computationally focused researchers. We are very excited to have WCH researchers join the DSI community,” says Lisa Strug, Director, Data Sciences Institute.

DSI launches call for new ideas that push the boundaries of data science

Data science is an ever-evolving field. It continues to change as innovations come to light and data scientists revolutionize the way we collect, store, analyze and use data.

Learn more about how you can apply.

Deadline for letter of intent (LOI): March 3, 2023

Deadline full proposal: May 26, 2023

So, what is the next big thing for data science? 

The Data Sciences Institute (DSI) is launching a new Emergent Data Sciences Program competition designed to fund researchers and advance cross-disciplinary data science in areas where the University of Toronto is already a leader or has the capacity to become one. The DSI seeks to promote and expand the awareness and role of data science in all research activities across U of T. This call is intended to attract and develop new communities that want to enhance their activities using data science.

Emergent Data Sciences Programs are a DSI core activity that helps fulfil its mission of bringing people together for collaborative generation and application of new ideas that support emergent areas in the data sciences. 

“At the DSI, we are proud to be a part of the University of Toronto, one of the world’s leading universities. I have no doubt that we will see many proposals that attempt to push the boundary of what has been done before from across U of T, as well as explore new multi-disciplinary applications of data science. We hope these will establish new fields and techniques that are so often the preludes to future scientific, technological and societal breakthroughs,” says David Lie, DSI Associate Director, Thematic Programming and Data Access.

When asked what he thinks the next big thing in data science is, Dr. Lie says, “The collection, application, and curation of large datasets focused on the public has been largely spearheaded by private entities trying to improve their enterprises and businesses. However, the COVID-19 pandemic demonstrated that there are a host of socially beneficial uses of that data. This is just the tip of the iceberg. I believe the next stage of data science will be to devise new techniques, and governance policies, that will enable data collected by private and public organizations to be shared and applied in other, important socially beneficial uses. To do this, we must overcome significant challenges, such as how we can share large data sets in privacy-preserving ways, and how we can identify and mitigate security risks that might arise. But this is just one example of many.”

Emergent Data Sciences Program proposals should include a broad span of activities that lead to the development of innovative data science methodologies, deep connections with computation and applied disciplines, new training programs, collaboration, knowledge mobilization, and impact. Ideal program proposals should establish or elevate local cross-disciplinary activity that advances the data sciences by pursuing the next big-but-yet-unknown data-driven field or computational or analytic breakthrough.  

Building environmental data sets to illustrate climate change in Northern Canada

DSI Catalyst Grants, supporting collaborative research teams for impact

Arctic regions experience climate change at a significantly faster rate than the rest of the planet. Residents in Northern Canada, and other Arctic regions, have long perceived anomalies in weather patterns, changes in long-standing sea ice patterns, and ecosystem stress. But these changes have been difficult to document, making it challenging to understand how they will ultimately impact human health and food security.  

The Data Sciences Institute (DSI) is funding cross-disciplinary research teams focused on using the data sciences to solve complex and pressing problems. Yuhong He (Geography, Geomatics & Environment, UTM) and Kent Moore (Chemical & Physical Sciences, UTM), who form one of the multidisciplinary collaborative research teams to receive a DSI Catalyst Grant, are using environmental data to help gain a more complete understanding of the changes happening in Northern Canada.

“Cross-disciplinary data science research has the potential to solve some of the most pressing challenges we face today. Professors He and Moore’s research is just one example of many. We are beginning to see the impact of DSI Grants and the capacity of bringing collaborative research teams together. We are excited to see how Catalyst Grant recipients continue to catalyze the transformative nature of the data sciences,” says Gary Bader, DSI Associate Director, Research and Software.

The power of environmental data science

Professor Moore focuses on the cryosphere, which is made up of all the frozen places on our planet, such as glaciers, continental ice sheets, permafrost, snow and ice. Moore uses theoretical, computational, and observational techniques to gain insights into the dynamics of the climate system. This helps place observed changes to our climate into a long-term context.

Professor He’s research centers on the biosphere. She integrates multi-source remote sensing big data into ecological research for a better understanding of the drivers and mechanisms shaping these changes in vegetative ecosystems. Her research helps improve conservation efforts. 

Together the team uses Earth observation data and machine learning to reveal patterns and trends in land surface changes and their possible impacts on people. These results provide a crucial basis to develop long-term strategies to help cope with the climate crisis and its resulting environmental, societal, and economic impacts.   

The funding support from DSI increases the team’s capacity across a range of disciplines and helps them conduct an analysis of the environmental changes impacting northern Canada by developing open-access geospatial datasets. The funding also supports reproducibility and the establishment of an Earth observation data management system for sharing and using these datasets. Reproducibility is a DSI Thematic Program that strives for the development of widely adoptable methodology, processes, and infrastructure to share data and code locally and in privacy-compliant ways. 

Helping northern communities access reliable environmental data

“Pressing global issues like climate change require integrated, interdisciplinary approaches to successfully address research questions involving complex environmental systems. Both Professor Moore and I have extensive experience using Earth observation data and machine learning approaches, and our research on the cryosphere and biosphere makes us an ideal team to establish a complete Earth observation data management system for northern Canada,” says Professor He.

For many northern communities, reliable data that illustrate the impact of climate change on regional ecosystems are difficult to access. An aggregate data set does not exist in a usable or scalable way. Local and regional approaches to environmental and climate action, like those taken by Nunavut’s Qaujigiartiit Health Research Centre, require access to longitudinal data to make informed decisions about the health of residents. The establishment of this Earth observation data management system will enable a network of researchers to upload, share, and download spatial data spanning a nearly 50-year period.

“This research will not only advance and redefine our understanding of climate and ecosystems in this region but also provide potential users with direct knowledge and insights to develop local and regional adaptation strategies,” says Professor He.

Data science to make our society better

How do we get people to understand how data influences their lives?

Data science has infiltrated our everyday lives. Although it is a powerful tool, it brings with it cases of bias, injustice, and discrimination. Consider the emerging discourse around the metaverse, within which people only exist as data. These data provide opportunities for research and innovation, but also for commodification and surveillance.

So how do we conduct data science responsibly?

That’s exactly what the new DSI@UTM initiative is tackling. The DSI at the University of Toronto Mississauga is leading a tri-campus initiative to encourage research activity in Responsible Data Science that includes community-building, workshops and seed funding for research.

Data science will continue to restructure aspects of our world, and it is important to maintain a commitment to questions of power, inequity, responsibility, surveillance, justice, and harm. It is especially important to ensure that collecting, manipulating, storing, visualizing, learning from, and extracting useful information from data are done in a reproducible, fair, and ethical way.

Why is UTM the right place for this initiative?

UTM has a cluster of faculty working across questions of responsible data science. One example is the Institute of Communication, Culture, Information and Technology (ICCIT), which looks at technology, media, and society and considers how algorithms affect the world. The campus is also home to researchers working on sustainability, management, and geography, along with initiatives focused on giving back to the Mississauga community, including working with Indigenous community leaders.

During an interview about this initiative, Associate Director of the DSI@UTM, Professor Bree McEwan, highlighted the revised UTM Strategic Framework. The Framework expresses core priorities and commitments that will strengthen consensus, inspire action, and guide investment. It includes priorities such as embracing place and encouraging collaboration.

“Responsible data science is about how we do data science, not just for the purpose of doing data science, but doing data science in a way that is making our society, our environment, etc. better for everyone. Therefore, the idea of responsible data science fits hand in glove with the other pieces of the Framework at UTM. How do we get lots of people to understand how data influences their lives, the idea of responsible data science? At UTM, we already have some strengths in how what we do here at the University influences the community around us,” says McEwan.

“The University of Toronto Mississauga is brimming with world-class researchers, focused on changing the world. UTM is a great place for this initiative, and we are thrilled to be building this within the DSI, as responsible data science needs to be a key part of both our research at UTM and our daily lives,” says Elspeth Brown, Associate Vice-Principal Research (AVPR) in the Office of the Vice-Principal, Research (OVPR). 

Events to look out for

A big focus of this initiative is bringing together researchers working with data science at the UTM campus and beyond. On December 7, DSI@UTM will be hosting its first Data Digest, Data & Sustainability. These networking events feature UTM data science researchers and provide attendees with the opportunity to engage in Responsible Data Science. Each month will feature a selection of short interdisciplinary research-based talks on a topic and explore challenges and opportunities related to data science.

In February 2023, DSI@UTM will be hosting a Data in the Metaverse workshop. This event seeks to imagine future possibilities, challenges, and implications of data creation, collection, analysis, and deployment in the metaverse. Current discussions of the metaverse and the increase in VR adoption make this an opportune time to consider how data are, and could be, employed in virtual reality and immersive environments.

Critical Investigation of Data Science Grant

The DSI@UTM Critical Investigation of Data Science (CIDS) grant is designed to provide seed funding for scholars. Projects can vary in scope from the analysis of specific data science projects and approaches to the articulation of potential harms in data science from a broader perspective.

“It’s about putting our money where our mouth is, in that we should be inviting critique of the data sciences in order to improve the data sciences. These grants will allow people to have some support for exactly those kinds of projects. Building in this critical angle, this self-reflection, into the data sciences is also important to make sure that we are doing data science responsibly,” says McEwan.