Archives for May 26, 2022

Bringing together the hammer and the nails – encouraging collaborations between methodologists and applied researchers

The Data Sciences Institute recently held a competition for Seed Funding for Methodologists. This funding is designed to catalyze new Collaborative Research Teams and encourage new partnerships between data science methodologists or theorists and applied researchers. Data science is inherently interdisciplinary and building capacity in data science has the potential to advance research frontiers across a broad spectrum of fields.

“This competition was about uniting cutting-edge methodologists with applied researchers to form new collaborations. By presenting and bringing to the fore innovative methodological and theoretical work, our goal is to ensure that new Collaborative Research Teams are forged with new and unexpected connections,” says Michael Brudno, professor at the Department of Computer Science, Faculty of Arts & Science, and chief data scientist at the University Health Network.

“Imagine that you have this amazing new hammer that you spent ages perfecting. But you are missing the nails on which to use your hammer. This seed funding is about finding those nails,” says Eyal de Lara, professor at the Department of Computer Science, Faculty of Arts & Science.

Presenting the three inaugural methodologists 

Aya Mitani, from the Dalla Lana School of Public Health, is developing a methodology that applies multilevel matrix-variate analysis to longitudinally collected dental data while accounting for correlation. The unique correlation structure of teeth provides an excellent application area, and Mitani aims to connect with researchers and oral health practitioners to prevent and manage oral diseases with greater precision, improving oral and general health outcomes across populations by applying these new methods and tools.

Linbo Wang, from the University of Toronto Scarborough, Department of Computer and Mathematical Sciences, is developing innovative tools to find causal relationships in observational and/or experimental datasets. These new tools will allow researchers to better understand the underlying causal mechanisms and help decision-makers make more informed decisions. There is broad and impactful potential for the application of these methods.

Murat Erdogdu, from the Faculty of Arts and Science, Department of Computer Science and Statistical Sciences, is developing theoretical tools to compute the asymptotic generalization error of certain overparameterized estimators and characterize the convergence rate of overparameterized neural networks beyond the kernel regime. These new theoretical tools will enable researchers to more carefully develop machine learning models that take their model’s limitations into account, across many application areas.

Showcasing innovative data science methodologies   

One key deliverable for this award is that recipients present their methodology or theory, with a focus on building new applied collaborations.

Join us on June 16 for a discussion on potential application areas as Mitani, Wang and Erdogdu present their innovative methodological techniques. We welcome applied researchers from any discipline interested in learning more about how these methodologies might be applicable to their research.  

Register today to learn more. 

Student teams investigate just how difficult it is to reproduce research

Reproducibility is essential for research. How can we know if a study is reliable if it is not reproducible or replicable? A researcher should be able to pick up any piece of published research and replicate it, provided they have the right materials.

But just how hard is it to reproduce research?  

Six student teams from across U of T, including the Faculty of Applied Science and Engineering, the Temerty Faculty of Medicine, the Dalla Lana School of Public Health, and the Faculty of Arts and Science, set out to answer this question by attempting to reproduce published analytic research. Each team presented its discoveries during the recent Student-Led Reproducibility Challenge, which aimed to raise awareness of reproducibility amongst students and emphasize that robust and reproducible processes are critical to maintaining confidence in research.

Reproducibility is a key Thematic Program for the Data Sciences Institute, and this challenge was pioneered by the Reproducibility co-leads Rohan Alexander, Faculty of Information and Statistical Sciences, Faculty of Arts & Science; Benjamin Haibe-Kains, University Health Network and Medical Biophysics at the Temerty Faculty of Medicine; and Jason Hattrick-Simpers, Faculty of Applied Science and Engineering.  

“We were very excited to see the high level of engagement, and everyone was impressed with the level of work and commitment to the challenge. We hope this is the first of many student-led reproducibility challenges and hackathons. It is so important that future researchers value reproducibility and carry these lessons forward in their work,” says Hattrick-Simpers, team captain for one of the student teams.

“The student teams had such great presentations. The level of commitment to thoroughness and dedication to reproducing the computational methods was impressive. Transparency, reproducibility, and replication in research are more important than ever. We need to be better at communicating the solutions and challenges we face as researchers,” says Prof. Aya Mitani, team captain for one of the student teams and assistant professor at the Dalla Lana School of Public Health.

The challenge of reproducibility 

The teams reported multiple cross-cutting challenges. For example, many teams had trouble finding publicly accessible versions of the data they needed. This was the case for Alyssa Schleifer and Hudson Yuen, who worked on a paper about safety, health, and isolation in prisons. They reported that the original datasets were unavailable in many cases, with only pre-processed subsets accessible. This affected their ability to check for robustness or derive new findings.

Kimlin Chin, who replicated a paper concerning falling US birth rates, said that finding the right paper was difficult. “Finding a paper that had sufficient content to make an interesting reproduction paper and also came with a complete reproduction package (i.e., was not missing any code, data, or output files) was a challenge.”

Sanjot Grewal, Steve Jeoung, Walid Maraqa, and Mu Yang attempted to reproduce the results of a paper studying the effect of social distancing on COVID-19 cases in the US during the early days of the pandemic. The team discovered inconsistencies in the reproduced results and realized that the authors of the original paper had incorrectly documented a critical formatting and pre-processing step. Christie Lau, Laurie Lu, Ruiyan Ni and Emily So worked on a paper concerning predictive models of drug response, and found that the lack of proper data resources and identification got the reproducibility process off on the wrong foot.

Having the right technology was also a challenge. Daniel Persaud, who worked on a paper about a general-purpose machine learning framework for predicting properties of inorganic materials, ran into computational constraints because he had to rely on his personal computer.

“It was remarkably interesting to hear from students working in different research fields. Most of the obstacles they faced were quite universal, but some were specific to their field where programming languages and ways to access data differ. We have a lot to learn from each other and more importantly, we need to work together to improve transparency and reproducibility,” says Haibe-Kains, team captain for one of the student teams.

Full group from reproducibility challenge.

The importance of reproducibility in the workplace 

Holly Xie, Senior Applied Scientist – Machine Learning Products at Xero Accounting, and Chris Henry, Senior Economist at the Bank of Canada, spoke about the importance of reproducibility for organizations. Henry discussed the importance of reproducibility at the Bank of Canada, highlighting the necessity of creating reproducible content and maintaining records for when new employees are onboarded.

“Reproducibility is important because things change over time. You need to know how things were done. People also come and go. Reproducibility is essential in helping new team members come up to speed,” he said during his presentation.

The student teams consisted of:

  • Alyssa Schleifer and Hudson Yuen, Challenge: Western, B. (2021). Inside the Box: Safety, Health, and Isolation in Prison
  • Kimlin Chin, Challenge: Kearney, M. S., Levine, P. B., & Pardue, L. (2022). The Puzzle of Falling US Birth Rates since the Great Recession 
  • Swarnadeep Chattopadhyay, Arsh Lakhanpal, and Olaedo Okpareke, Challenge: Kearney, M. S., Levine, P. B., & Pardue, L. (2022). The Puzzle of Falling US Birth Rates since the Great Recession 
  • Christie Lau, Laurie Lu, Ruiyan Ni and Emily So, Challenge: Ma, J., Fong, S. H., Luo, Y., Bakkenist, C. J., Shen, J. P., Mourragui, S., & Ideker, T. (2021). Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients 
  • Sanjot Grewal, Steve Jeoung, Walid Maraqa, and Mu Yang, Challenge: Siedner, M. J., Harling, G., Reynolds, Z., Gilbert, R. F., Haneuse, S., Venkataramani, A. S., & Tsai, A. C. (2020). Social distancing to slow the US COVID-19 epidemic: Longitudinal pretest-posttest comparison group study 
  • Daniel Persaud, Challenge: Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). A general-purpose machine learning framework for predicting properties of inorganic materials