Data Sciences Institute (DSI)

Reproducibility: The heart of the research method

Data Sciences Institute Reproducibility Thematic Program

The growing use of large-scale complex data across disciplines has brought the challenge of reproducibility to the forefront. But how can we foster trust in data-informed research? The Data Sciences Institute (DSI) Reproducibility Thematic Program aims to address such questions by focusing on the development of widely adoptable methodology and processes to share data and code, as well as the development of infrastructure, methods, and models that support reproducible and reliable research. Ambitions for the program include educating the next generation of researchers on the importance of transparency, removing the intimidation factor from reproducibility, and supporting researchers in identifying their path to reproducibility.

“There is a lot of discussion about how the lack of reproducible results may make people question their confidence in science. We saw this play out during the pandemic. Robust and reproducible processes are paramount to maintaining confidence in the research enterprise and ensuring the generation of reliable results upon which science builds. We want to push to make the DSI a centre for reproducible science, and help researchers understand how to adopt best practices in reproducibility,” says Timothy Chan, DSI associate director of research and thematic programming.

Last fall, the DSI held an open call for researchers to co-lead this thematic program. Reproducibility co-leads Professors Rohan Alexander, Benjamin Haibe-Kains and Jason Hattrick-Simpers represent different research fields – from social sciences to life and physical sciences – but they are united in their passion for supporting and advocating for transparency and reproducibility in research. They are in the process of developing programs and community-building activities.

Stay tuned for Reproducibility activities and opportunities.

2022 Toronto Workshop on Reproducibility

The Reproducibility Thematic Program kicked off with a multi-day workshop that brought together over 460 academic, industry, and non-profit participants on the critical issue of reproducibility in applied statistics and related areas. The workshop was hosted by the DSI and CANSSI Ontario.

Topics at the workshop ranged from reproducibility in language modelling and machine learning, to biomedical research, to integrating reproducibility in undergraduate social science programs and reproducibility in crowd science.

The workshop featured over forty speakers from the University of Toronto, the University Health Network, and Canadian and international research universities, and focused on evaluating and teaching reproducibility, as well as reproducibility practices. “We were tremendously pleased with the caliber of the speakers,” says Rohan Alexander, the lead organizer. “The deep engagement speaks to the importance of reproducibility. To create understanding it is important that others can trust results.”

Reproducibility champion and world-renowned computer scientist Professor Joëlle Pineau spoke about improving reproducibility in machine learning research. “Reproducibility is a minimum necessary condition for a finding to be believable and informative,” says Pineau, an associate professor at the School of Computer Science at McGill University. She co-directs the Reasoning and Learning Lab at McGill and also leads the Facebook AI Research lab.

Presentation slide from Reproducibility Workshop.

Professor Colm-Cille Patrick Caulfield from the University of Cambridge discussed why an honest discussion of uncertainty in models is critical for climate science. “We have such a complex climate system, only by ensuring transparency can we be 100% confident in our predictions,” Caulfield says. The DSI and the Cambridge Centre for Data-Driven Discovery (C2D3) are planning to host joint workshops around the DSI’s Thematic Programs of Inequity and Reproducibility.

Meet the Reproducibility Co-Leads

The DSI is excited to announce co-leads for its Thematic Program in Reproducibility. The co-leads are responsible for the thematic program events, activities, and community-building.

Rohan Alexander is an assistant professor at the Faculty of Information and Department of Statistical Sciences at the University of Toronto. He is the assistant director of CANSSI Ontario, a senior fellow at Massey College, and a faculty affiliate at the Schwartz Reisman Institute for Technology and Society. He is interested in using statistics to understand the world.

“I am particularly interested in how we turn something as complicated as society into a dataset that can be analyzed, and what we lose in exchange for the benefits that such statistical modeling brings. I applied to be a Reproducibility co-lead to contribute to the improvements that are happening in the social sciences and to share and learn from other disciplines.”

Rohan Alexander

Benjamin Haibe-Kains is a senior scientist with the University Health Network and an associate professor of Medical Biophysics at the Temerty Faculty of Medicine. His research program focuses on developing multimodal models, using radiological images and large-scale genomic data, to predict the survival and therapy response of cancer patients.

“I have always been passionate about research transparency and reproducibility, key components of Open Science. When I saw that the newly created Data Sciences Institute had an open call for leading their Reproducibility Theme, I could not miss this unique opportunity to educate on how to make research more transparent and reproducible.”

Benjamin Haibe-Kains

Jason Hattrick-Simpers is a professor in the Department of Materials Science and Engineering at the University of Toronto and a research scientist at CanmetMATERIALS. His research focuses on the creation of tools to enable the discovery of new corrosion-resistant materials or new materials for converting waste heat into usable energy.

“Reproducibility is at the heart of the scientific method and is what allows us to live in a world filled with technological wonders.”

Jason Hattrick-Simpers

Applications open for Doctoral Student Fellowships

DSI Doctoral Student Fellowships support multi-disciplinary training and collaborative research in data sciences that include faculty from the University of Toronto and external funding partner institutes.

Fellows receive stipend support of $25,000/year for up to 3 years and have access to travel awards to present their work. Fellows will also participate in DSI cohort programming for professional development.

Q&A information session: March 3, 2022, 9-10 am ET

Deadline for applications: April 11, 2022

Generating evidence and tools to support social change

Data Sciences Institute Inequity Thematic Program

The Data Sciences Institute (DSI) is excited to announce co-leads for its Thematic Program in Inequity. The availability of new and diverse data across research domains has transformed traditional disciplines, encouraging researchers to address pressing questions in innovative, data-driven ways. The opportunity to link these new resources with socioeconomic and demographic data, and to co-develop research projects with under-resourced communities, helps researchers use more data-driven approaches to understand social inequities and empower communities.

“With the Inequity Thematic Program, we hope to encourage the generation of evidence and tools to enhance our understanding of inequity and support equitable social change,” says Timothy Chan, DSI associate director of research and thematic programming. “This is relevant across disciplines, whether it’s the social sciences, the humanities, health sciences, or physical sciences.”

Earlier this year, the DSI issued a call for research co-leads for its Inequity Thematic Program. Professors Angelina Grigoryeva, Arjumand Siddiqi and Azadeh Yadollahi, the three co-leads, bring a broad disciplinary perspective and will have substantial flexibility in developing the Program and its activities. These can include supporting scholarly exchange (workshops, conferences, seminars), organizing new research collaborations, and supporting applications for substantial funding opportunities. Several of the DSI’s recently announced Catalyst Grant awards focus on using the transformative nature of data sciences to address inequities and drive positive social change.

“The DSI encourages innovative data science methodology development and application in these areas by offering support and funding for research efforts and programming. We are very excited to serve as co-leads and work with DSI member researchers to create opportunities for exchange and learning,” says Angelina Grigoryeva, one of the co-leads.

The co-leads plan to bring together DSI member researchers to better understand their interests and needs, conduct a grant writing session, develop a speaker series, and, importantly, facilitate informal presentations and networking with the community.

Meet the Inequity Co-Leads

Angelina Grigoryeva is an assistant professor in the Department of Sociology at the University of Toronto Scarborough. She is also a member of the DSI Research & Academics Committee. She researches social stratification and inequality. More specifically, she focuses on patterns of wealth, race and gender inequality and examines household economic lives in the context of large-scale socio-economic transformations, looking at both between- and within-household inequalities.

Angelina’s current work focuses on the social implications of the large-scale shift towards finance-based capitalism in North America. More specifically, she examines how families became increasingly involved in financial markets in recent decades, how access to financial markets remains socially stratified, and the consequences for growing wealth inequality.

Angelina dreams of leveling the playing field and addressing growing inequality. “What we do know is that inequality in North America, in the United States and Canada alike, has been on the rise. It has been increasing steadily in the past several decades, and it doesn’t look like this trend will reverse anytime soon based on what we see today.”

Arjumand Siddiqi is the Canada Research Chair in Population Health Equity and a professor and the Division Head of Epidemiology at the Dalla Lana School of Public Health, University of Toronto. She is a social epidemiologist; social epidemiology is the study of how social factors contribute to health and disease over time.

“We want to know what’s happening in society and specifically, we’re trying to figure out how that’s influencing health. So, what are the large societal dynamics that are at play?” she says.

Her work examines what inequality looks like and who holds the power. She is not looking at individuals, however. She looks at groups, which is why the data become important: working at the population level, she tries to figure out what is happening differently across groups.

“I love that when you look at the data and you try to answer questions, it actually leads to other questions. Sometimes it’s frustrating because you can’t solve everything you want to solve, but in some ways, it’s exciting that you can kind of start to at least get a sense of what we don’t know, which is important.”

Azadeh Yadollahi is a Canada Research Chair, a senior scientist at KITE, Toronto Rehabilitation Institute, at the University Health Network, as well as an associate professor at the Institute of Biomedical Engineering at U of T. Her area of research is health equity and sleep. More specifically, she develops technologies, often wearable technologies, to monitor physiological signals and assess sleep-related breathing problems, such as sleep apnea or snoring.

She works with underserved communities, for example, those experiencing homelessness. She hopes her work will change current policies for diagnosing and treating sleep apnea for these populations because a lot of the policies require individuals to come and sleep in a lab to get treatment, which is not always possible. She also hopes to raise awareness about sleep problems.

“I am very passionate to make a meaningful impact in the lives of individuals who are socially and systematically disadvantaged and do not have equitable access to care.”

2022 Toronto Workshop on Reproducibility

CANSSI Ontario and the Data Sciences Institute at the University of Toronto are excited to host the Toronto Workshop on Reproducibility, February 23-25, 2022. This three-day workshop brings together academic and industry participants on the critical issue of reproducibility in applied statistics and related areas.

This virtual workshop is free and open to all.

The Workshop has three broad focus areas:

Evaluating reproducibility

Systematically looking at the extent of reproducibility of a paper or even in a whole field is important to understand where weaknesses exist. Does, say, economics fall flat while demography shines? How should we approach these reproductions? What aspects contribute to the extent of reproducibility?

Teaching reproducibility

While it is probably too late for most of us, how can we ensure that today’s students don’t repeat our mistakes? What are some case studies that show promise? How can we ensure this doesn’t happen again?

Practices of reproducibility

We need new tools and approaches that encourage us to think more deeply about reproducibility and integrate it into everyday practice.

Data Sciences Institute Catalyst Grants support transformative data science research

The Data Sciences Institute (DSI) at the University of Toronto is funding seventeen cross-disciplinary research teams focused on using the transformative nature of data sciences to solve complex and pressing problems.

“The global and research challenges we face today are increasingly complex. The DSI Catalyst Grant projects bring together collaborative research teams focused on the development of new data science methodology or the application of existing tools in innovative ways to address these challenges,” says Lisa Strug, director of the DSI. “We were floored by the cutting-edge advances proposed in the applications we received for our inaugural competition.”

Here we highlight a few of the inspiring funded proposals and research teams. The full list of recipients can be found below.

Using data science to rethink water practices and equity in India

The United Nations states that water scarcity affects more than 40 per cent of the global population. India is working hard to expand access to water pipe networks. However, because of rapid urbanization and inadequate infrastructure, most Indian pipe networks provide water for less than four hours per day, impacting 390 million people. To cope, residents invest in water storage infrastructure and seek alternative water sources, imposing significant financial, environmental, educational, health, and time costs, especially on women and girls.

Professors David Meyer (Civil & Industrial Engineering, Faculty of Applied Science & Engineering), Nidhi Subramanyam (Geography & Planning, Faculty of Arts & Science), and Carmen Logie (Factor-Inwentash Faculty of Social Work) are developing tools and metrics that harness water data and empower water planners, communities, and activists to help achieve water equity in India. With their project Harnessing Data to Visualize and Mitigate Urban Water Inequities within the Cauvery River Basin, India, their team brings together diverse disciplinary perspectives on water and data – including a deep understanding of water engineering, water governance, and equity. Once they have collected the data, they will combine it to create novel insight-generating metrics and visualizations for planners regarding the equity and equality of water availability.

“We are delighted to receive this award,” says Nidhi Subramanyam. “The DSI provides an exciting platform for those of us interested in using data for justice to come together, discuss and reflect on how new technologies and streams of data can help us rethink pedagogy and practices within our respective fields.”

Improving data derived from single-cell sequencing

Single-cell sequencing, the ability to look at cells at the individual level, has been revolutionary. However, the technology also poses challenges. For example, different types of sequencing approaches produce distinct data sets that don’t integrate with each other.

Professors Zhaolei Zhang (Donnelly Centre/Molecular Genetics, Temerty Faculty of Medicine), Dehan Kong (Statistical Sciences, Faculty of Arts & Science, UofT), and Dennis Kim (Princess Margaret Cancer Centre, UHN) received a Catalyst Grant for their project, Developing rigorous statistical methods for multimodal single-cell sequencing data analysis, which aims to tackle this problem. Their research is co-funded by Medicine by Design.

“We need robust and effective statistical tools to handle the data being measured from the thousands of cells and tens of thousands of genes in one experiment. Our framework will be able to integrate that data generated from different approaches in a very efficient and accurate way,” says Zhaolei Zhang.

Using datasets collected primarily from brain or blood cells, this multi-disciplinary research team aims to refine a method that can give researchers a more complete picture of single-cell data.

Data science to support policies for gender equity

Early evidence indicates that the pandemic has profoundly and disproportionately impacted women. Many predict that the social and economic burden of the pandemic will be shouldered by women and girls worldwide. To help policymakers mitigate the damage, it is important to have up-to-date accurate information and data.

“The COVID pandemic has had differential impact on many communities and these impacts are not yet fully documented, and it is not clear yet what this means for these communities going forward. Our research will attempt to quantify the pandemic’s impacts on researchers and inventors across gender, location, and discipline by creating yearly measures of their productivity and research team diversity pre- and post-COVID,” say Professors Michelle Alexopoulos (Department of Economics, Faculty of Arts & Science) and Kelly Lyons (Faculty of Information, UofT). They are recipients of a Catalyst Grant for their project Using Data Science Methods to Understand the Differential Impact of COVID on Researchers and Inventors by Gender.

The DSI Catalyst Grant will support their collaborative research team to apply data mining and natural language processing techniques to data on publications, grant applications and patents. The resulting metrics, and extracted location and gender identifiers, will be combined with socio-economic information on outbreaks, government interventions across jurisdictions (such as lockdowns and school closures), locations of researchers’ team members, and measures of gender diversity within research teams to explore how the magnitudes of the pandemic’s impacts are influenced by these factors.

Building a community of data scientists   

“The Data Sciences Institute is committed to fostering new opportunities to cultivate multi-disciplinary collaborations between data science methodologists and researchers in various application domains. This is just the beginning,” says Timothy Chan, DSI associate director of research and thematic programming. “With this inaugural round, we received 70 highly competitive proposals which were carefully assessed by a multidisciplinary Review Panel.”

The DSI Catalyst Grants are supported by the University of Toronto Institutional Strategic Initiatives and external funding partners, the University Health Network, the Hospital for Sick Children and the Lunenfeld-Tanenbaum Research Institute. The grants are designed to fund multidisciplinary research teams focused on the development of new data science methodology or the innovative use of data science to address questions of major societal importance. Each grant is valued at up to $100,000 for one to two years.

Two of this year’s Catalyst Grants are co-funded by Medicine by Design; one of these projects is described above. Medicine by Design awards funding to multi-disciplinary, multi-institutional research teams that are finding solutions to key challenges in regenerative medicine. Medicine by Design receives funding from the Canada First Research Excellence Fund.

Congratulations to the 2022 DSI Catalyst Grant collaborative research teams! 

Using Data Science Methods to Understand the Differential Impact of COVID on Researchers and Inventors by Gender

  • Michelle Alexopoulos (Department of Economics, Faculty of Arts & Science, UofT); Kelly Lyons (Faculty of Information, UofT)

Stellar Flares in Hiding: Discovering Flares in Stellar Time-Series Data with Hidden Markov Models 

  • Gwendolyn Eadie (Astronomy & Astrophysics, Faculty of Arts & Science, UofT); Radu Craiu (Statistical Sciences, Arts & Science, UofT)
  • Read the announcement from the Department of Statistical Sciences.

Attention-based coupling, or learning how to swim, thousands of neurons at a time 

  • Guillaume Filion (Biological Sciences, UTSC); Minoru Koyama (Biological Sciences, UTSC)

50 years of spatial-explicit environmental data to examine changes in northern Canada

  • Yuhong He (Geography, Geomatics & Environment, UTM); Kent Moore (Chemical & Physical Science, UTM)
  • Read the story highlighting UTM Scholars.

MIRA Clinical Learning Environment (MIRA-CLE) for Lung 

  • Andrew Hope, Tony Tadic and Chris McIntosh (Radiation Medicine Program, UHN)

Preventing a Reproducibility Crisis in Quantum Computing: Benchmarking Quantum Computing Against Classical Algorithms for Molecular Property Predictions

  • Hans-Arno Jacobsen (Electrical & Computer Engineering, Faculty of Applied Science & Engineering, UofT); Ulrich Fekl (Chemical and Physical Sciences, UTM)
  • Read the story from the Faculty of Applied Science and Engineering.
  • Read the story highlighting UTM Scholars.

Using Geometric Data to Construct More Equitable Living Spaces 

  • Alec Jacobson (Computer Science, Arts & Science, UofT); Maria Yablonina (Daniels Faculty of Architecture, Landscape, and Design); Brady Peters (Daniels Faculty of Architecture, Landscape, and Design)
  • Read the full story from the John H. Daniels Faculty of Architecture, Landscape and Design.

Robust Risk-Aware Reinforcement Learning for Financial Modeling 

  • Sebastian Jaimungal (Statistical Sciences, Faculty of Arts & Science, UofT); John Hull (Joseph L. Rotman School of Management)
  • Read the announcement from the Department of Statistical Sciences.

Harnessing Data to Visualize and Mitigate Urban Water Inequities within the Cauvery River Basin, India

  • David Meyer (Faculty of Applied Science and Engineering); Nidhi Subramanyam (Geography & Planning, Faculty of Arts & Science); Carmen Logie (Factor-Inwentash Faculty of Social Work)
  • Read the story from the Faculty of Applied Science and Engineering.

Bioimage Informatics for Exploring Heterogeneous Cell Communities and Accelerating the Development of Effective Cancer Treatments

  • Project co-funded by Medicine by Design. Read the full story by Medicine by Design.
  • Joshua Milstein (Chemical and Physical Sciences, UTM); Alison McGuigan (Chemical Engineering & Applied Chemistry & Biomedical Engineering, UofT); Rodrigo Fernandez-Gonzalez (Chemical Engineering & Applied Chemistry & Biomedical Engineering, UofT)
  • Read the story from the Faculty of Applied Science and Engineering.
  • Read the story highlighting UTM Scholars.

Removing unwanted variations from heterogeneous neuroimaging and genomic data

  • Jun Young Park (Statistical Sciences, Faculty of Arts & Science, UofT); Laurent Briollais (Lunenfeld-Tanenbaum Research Institute); Michael Wilson (The Hospital for Sick Children)
  • Read the announcement from the Department of Statistical Sciences.

Predicting And Preventing Chronic Disease burden in populations (PREPARED): Deploying decision-support tools for the prevention of chronic diseases 

  • Laura Rosella (Dalla Lana School of Public Health); Birsen Donmez (Faculty of Applied Science & Engineering, UofT); Myrtede Alfred (Mechanical & Industrial Engineering); Hailey Banack (Dalla Lana School of Public Health); Greg A. Jamieson (Mechanical & Industrial Engineering)
  • Read the story from the Faculty of Applied Science and Engineering.

Informatics platform for a pan-Canadian drug discovery chemical library 

  • Matthieu Schapira (Pharmacology & Toxicology, Temerty Faculty of Medicine, UofT); Robert Batey (Chemistry, Arts & Science, UofT)
  • Read the story from the Department of Pharmacology & Toxicology.

Methods for genome-wide studies of variants with sex differences in genetic effect 

  • Lei Sun (Statistical Sciences, Arts & Science, UofT); Andrew Paterson (The Hospital for Sick Children)
  • Read the announcement from the Department of Statistical Sciences.

Machine Learning for Dynamic and Short-term Fall Risk Assessment in People with Dementia 

  • Babak Taati (KITE Research Institute, Toronto Rehab, UHN); Andrea Iaboni (Department of Psychiatry, Temerty Faculty of Medicine)

Developing rigorous statistical methods for multimodal single-cell sequencing data analysis

  • Project co-funded by Medicine by Design. Read the full story by Medicine by Design.
  • Zhaolei Zhang (Donnelly Centre/Molecular Genetics, Temerty Faculty of Medicine); Dehan Kong (Statistical Sciences, Faculty of Arts & Science, UofT); Dennis Kim (Princess Margaret Cancer Centre, UHN)
  • Read the announcement from the Department of Statistical Sciences.

Reduce Early Revisions of Joint Replacements through Data Science Strategies

  • Yu Zou (Faculty of Applied Science & Engineering, UofT); Qiang Sun (Computer & Mathematical Sciences, UTSC); Adele Changoor (Lunenfeld-Tanenbaum Research Institute)
  • Read the story from the Faculty of Applied Science and Engineering.