Data Sciences Institute (DSI)

Fostering Collaborative Problem-Solving Through Data Science Over Coffee

by Sara Elhawash

The Data Sciences Institute and the Department of Statistical Sciences jointly run the Data Sciences Café, which helps researchers tackle research challenges through statistical advice. The weekly Café brings together researchers from diverse backgrounds and offers an exceptional platform for researchers outside statistics who want to use statistical techniques to strengthen their work.

Since its launch in 2022, the Data Sciences Café has attracted more than 40 students and faculty members from the University of Toronto. Each week, participants meet over coffee for discussions and presentations and receive statistical advice tailored to their specific datasets. Attendees can draw on the expertise of a team of faculty members, graduate students, and senior student statistical consultants, who work together to help refine research methodologies and analyze data effectively.

Each session begins with researchers presenting their projects and the challenges they face, and describing the statistical guidance they need. The presentations are followed by a 30-minute consultation led by Dr. Samantha-Jo, Assistant Professor, Teaching Stream, in the Department of Statistical Sciences, and the driving force behind the Café’s success.

Diego Proano Falconi, a second-year PhD student in the Faculty of Dentistry, describes how the Café benefited his research. His project evaluates the financial hardship caused by out-of-pocket dental expenses in Canada, and it required him to analyze complex datasets. He says, “The Data Sciences Café was an invaluable experience that allowed me to articulate my thesis project and gain insights and methodological clarity from my fellow statistician colleagues.”

Diego adds, “I’m very thankful for the friendly and academically enriching environment. Following my presentation, engaged students offered constructive feedback, shedding light on areas that could strengthen my analysis. This collaborative atmosphere not only improved my work but also opened doors to future collaborations. Whenever I encountered statistical challenges, I knew I could rely on this network for guidance.” 

“This café has emerged as a remarkable platform, supporting both faculty and students in leveraging data science to overcome research obstacles. The utilization of statistical techniques to enhance data analysis has been instrumental in achieving more robust results. The Café’s ability to unite researchers from various disciplines has further fostered collaboration, sparking innovative solutions through the lens of statistics,” says Dr. Samantha-Jo.

Amanda Ng, a Statistics and Mathematics student in the Department of Statistical Sciences, reflects on her role as a student statistical consultant: “Participating in the weekly meetings, where graduate students presented their research, enabled me to cultivate skills in understanding multidisciplinary datasets. I gained the expertise to recommend appropriate statistical methodologies and identify potential biases within current research study methods. The Data Sciences Café creates an informal environment for students to communicate with researchers from different academic backgrounds, facilitating connections that extend beyond the sessions. In fact, I secured my first research assistantship through a connection made at the Café.”

This sentiment is echoed by Sofia Panasiuk, a PhD student who served as a participant during the Café sessions. Sofia explains, “My primary aim was to find a platform where I could share and receive feedback on my initial research ideas, especially from data science students who were more experienced with the technique than me. The Café surpassed my expectations, as the students offered insightful comments, highlighted potential limitations, and proposed solutions to the challenges I was encountering. It was here that Amanda and I connected, leading to a valuable collaboration.” 

The Café has also proven to be a catalyst for meaningful collaborations. Amanda Ng’s partnership with Sofia led to a notable accomplishment: presenting the project’s outcomes at the 11th Annual Canadian Statistics Student Conference and winning first place for an undergraduate presentation.

“The insights gained from the Café led me towards methodological papers that would have taken me ages to find. Someone had the reference I needed immediately and that saved me a significant amount of time streamlining my research process,” says Sofia. 

The Data Sciences Institute, a multi-divisional, tri-campus, and multidisciplinary hub for data science activities at U of T, plays a pivotal role in driving these collaborative efforts. Through initiatives like the Data Sciences Café, DSI fosters research connections and innovation and enhances the teaching and learning experience in data sciences.

As the academic year resumes, the Café will restart its weekly gatherings on September 28. Sign-ups for consultations are now open for anyone interested in collaborative problem-solving and statistical exploration. Join the Data Sciences Café to be part of a dynamic community that uses statistics to tackle research challenges, all while enjoying a cup of coffee.

Unveiling the Connection: DSI Research Explores the Link Between Social Media and Sociological Theory

by Sara Elhawash

How does social influence operate on social media? DSI members Professors Peter Marbach (Computer Science, Faculty of Arts & Science), Vanina Leschziner (Sociology, Faculty of Arts & Science) and Daniel Silver (Sociology, University of Toronto Scarborough) are exploring the relationship between social media and social influence, with the aim of validating a theoretical model proposed decades ago.

Recognizing the profound impact of social media on society, the researchers aim to unravel the complex nature of influence in the digital age. Their work seeks to explain how individuals acquire, accumulate, and exercise influence on social media.

Data science is inherently interdisciplinary, and building capacity in data science has the potential to advance research frontiers across a broad spectrum of fields. With the support of the DSI Catalyst Grant, this collaborative research team of sociologists and a computer scientist embodies the DSI mission of supporting interdisciplinary research for emerging societal issues.  

Professor Leschziner explains, “Though influence is a core phenomenon in sociology and research on social media, it is poorly understood. We lack a thorough understanding of how individuals acquire and maintain (or lose) influence within a social system.” Professor Marbach adds, “Answering these questions is crucial in order to understand social media and possible regulation of social media.” 

The sociologist Talcott Parsons put forward the hypothesis that influence in a social system operates similarly to money in a market economy. While money coordinates action within the economic domain (e.g., production, trade, consumption), influence coordinates action within the social domain (e.g., attention, association, support, recognition). This hypothesis, which has sparked critical discussion among sociological theorists, has never been verified. 

Building on a mathematical model of influence developed in Professor Marbach’s group, the researchers combine data science and computational methods to investigate, and potentially validate, Parsons’s hypothesis. They also aim to examine how influence is distributed within online communities and how it changes over time.

Using these methods, the researchers are investigating whether influence on social media can be understood as a general system of interchange, like money and power. “Our research utilizes a data science approach to study how influence is acquired, accumulated, and circulated within social media. We develop new methodologies to characterize the demand and supply of social media content, study the flow of attention and content within a community, and analyze the causes of changes in individual community members’ influence,” says Marbach.

The research holds significant implications for the broader discourse on social influence, social media and inequality. “Our findings have the potential to inform the development of safeguards against the manipulation and abuse of influence on social media platforms. By gaining a deeper understanding of how influence operates within social systems, it becomes possible to promote responsible design of social media platforms and social media algorithms such as content recommendation,” says Marbach. 

Professor Marbach shares that this project is a new interdisciplinary collaboration between sociology and computer science. “We use data science in order to contribute to foundational theory in sociology.” He adds, “The DSI Catalyst Grant was the key to initiating this collaboration and providing the resources to carry out the research. Our goal is to use this project as a catalyst to build a long-term research collaboration between researchers in foundational sociological theory and computer science at U of T.” 

Their groundbreaking work serves as a timely reminder of the profound impact of social media on society, urging further exploration of the dynamics of influence in our increasingly interconnected digital world. 

 

Unlocking the Power of Data: DSI and UNICEF Collaborate to Advance Data Science Research and Training

by Sara Elhawash

In a collaboration aimed at advancing data science research and training, the Data Sciences Institute (DSI) at the University of Toronto is partnering with the Frontier Data and Tech team of the United Nations Children’s Fund (UNICEF) to use data to address complex challenges facing children. The collaboration aligns with DSI’s strategic goal of fostering knowledge mobilization for the public good, and the partnership marks a significant milestone in harnessing the potential of data for positive social change.

Through joint research and training, DSI will help strengthen UNICEF’s knowledge and capacity to use data science methodologies to innovate learning. DSI collaborates with organizations committed to supporting world-class researchers, educators, and trainees who are at the forefront of advancing data sciences.

“This partnership is a significant milestone for our Frontier Data Network, a global community of practice that leverages data science to positively impact the lives of children worldwide. Together, we are poised to unlock new insights, drive evidence-based decision-making, and pave the way to a brighter future for children everywhere,” says Yves Jaques, Chief of the Frontier Data and Technology Unit, UNICEF. 

In a first collaboration, Dr. Manuel Garcia-Herranz, Data Principal Researcher, and Karen Avanesyan, Statistics and Monitoring Education Specialist at UNICEF’s Division of Data, Analytics, Planning and Monitoring (DAPM), are working with Professor Zahra Shakeri of the Dalla Lana School of Public Health on a project in the 2023 Summer Undergraduate Data Science (SUDS) Opportunities Program. The SUDS project, Understanding Predictive Models that can be Used to Prevent School Dropouts, aims to transform early warning systems in education by applying cutting-edge AI technology. Through this opportunity, SUDS Scholar Ziqi Shu is gaining practical experience by working on fictional yet reality-based case studies of social problems affecting children. By identifying at-risk students and schools with high dropout rates, UNICEF aims to support countries that have a strong Education Management Information System (EMIS) and household survey data.

The SUDS Scholar project aims to use and generate new sources of real-time information to better inform decision makers in the development and humanitarian ecosystem. UNICEF’s Frontier Data Tech Network is a global initiative to explore and use frontier data technologies to address the most complex challenges for children in an ethical way.  

“Our aim is to develop a pilot tool that provides a comprehensive representation of the machine learning-based school dropout prediction landscape, bridging the knowledge gap in this area. This tool will utilize innovative data analysis and visualization techniques, benefiting researchers, practitioners, and other stakeholders in exploring the factors influencing school dropout among children. The long-term goal of this project is to harness the power of data science and create an adaptable, publicly accessible system that could support countries in addressing the critical issue of school dropouts. By leveraging AI technology and early warning systems, our aim is to identify and support at-risk students and schools, ultimately safeguarding every child’s right to education,” says Zahra Shakeri, Director of HIVE Lab, Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health. 

The UNICEF-DSI partnership paves the way for further research and training collaborations. There will be opportunities to connect with the DSI community at DSI Research Day on September 27, 2023, where Dr. Manuel Garcia-Herranz will deliver the keynote address and Yves Jaques will take part in a panel discussion on Developing an Effective Data Science Workforce. The panel will focus on equipping graduates with the data science skills required in today’s diverse fields and industries. DSI Research Day showcases the work of the DSI community and fosters connections and engagement among academia, industry, and government stakeholders.

“By combining our community’s expertise in data science with UNICEF’s commitment to driving results for children globally, we have the opportunity to make a profound impact. Through our joint efforts, we aim to strengthen UNICEF’s knowledge and capacities in utilizing data science methodologies, fostering innovation in learning and ultimately creating a brighter future for children worldwide,” says Lisa Strug, Director, Data Sciences Institute.  

DSI-Supported Study Demonstrates Reproducibility and Success in Predicting Cancer Treatment Response

by Sara Elhawash

Can reproducibility pave the way for advances in precision oncology and transform cancer treatment decisions? A compelling answer emerges from a reproducibility project born out of the DSI Student-Led Reproducibility Challenge. The project, led by DSI members Professor Benjamin Haibe-Kains (University Health Network and Medical Biophysics, Temerty Faculty of Medicine, University of Toronto) and Bo Wang (Department of Laboratory Medicine & Pathobiology, University of Toronto), together with a team of U of T student researchers including Emily So and Grace Fengqing Yu, is making significant strides in the field.

Reproducibility and Reusability in Action 

In a recent Reusability Report, “Evaluating reproducibility and reusability of a fine-tuned model to predict drug response in cancer patient samples,” published in Nature Machine Intelligence, the team reproduced a new artificial intelligence (AI) method called Transfer of Cellular Response Prediction (TCRP) and applied it to clinical trial data. TCRP was originally published by the Ideker group at the University of California San Diego in Nature Cancer in 2021.

The project originated from the DSI Thematic Program in Reproducibility, which aims to raise awareness of reproducibility and included a Student-Led Reproducibility Challenge in 2022. As large-scale, complex datasets and computational methods are increasingly used across disciplines, reproducibility has become a pressing challenge, and establishing reproducibility standards has emerged as a foundational aspect of data science. It is therefore essential to clearly articulate, and widely adopt, standards for open, reproducible research with big data, not only at the University of Toronto but also internationally.

Emily So, a master’s student and co-researcher, reflects on the importance of reproducibility and open science principles for groundbreaking methods such as AI and machine learning: “In agreement with FAIR (Findability, Accessibility, Interoperability and Reusability) principles well established in research, new articles usually come with data and computer code available to the scientific community. To fully understand the impact of new innovations and uncover their applications to new scientific problems, it is imperative that available resources are fully reproducible and can produce expected results easily.”

The DSI Student-Led Reproducibility Challenge attracted researchers and trainees dedicated to exploring reproducibility. DSI support was instrumental in organizing the Challenge, in which students showcased their efforts to reproduce key papers in engineering and the social and health sciences. Emily So and Grace Yu were part of one of these teams. “Their results were so exciting that we decided to push the analysis further and publish it as a Reusability Report in Nature Machine Intelligence,” says Benjamin Haibe-Kains.

“We were able to demonstrate the gaps that exist in open science for computational biology. This outreach made available by the DSI has allowed our group to share our experience with the scientific community as well as provide further rationale for our subsequent documentation about our project,” says Emily So.

The team’s work aims to address two key objectives: confirming the performance of the TCRP model in its published context and expanding its application to a larger compendium of preclinical pharmacogenomic and clinical trial data.  

Through extensive evaluation, the researchers found that the TCRP method surpassed established statistical and machine learning approaches in predicting drug response in novel clinical contexts. This remarkable finding highlights the superiority of TCRP in both preclinical and clinical settings. 

“Our results highlight the immense potential of the TCRP method and its ability to outperform existing approaches. This opens new avenues for optimizing clinical trial design and improving patient outcomes,” says Haibe-Kains.

In precision oncology, ensuring that new techniques are reliable and generalize to clinical settings is crucial. Reproducibility studies play a vital role in verifying the claims made for predictive models, while reusability studies assess whether those models apply in new contexts. The publication of the Reusability Report in Nature Machine Intelligence marks an important step forward in promoting reproducibility and reusability in the field.

“Our work emphasizes the importance of reproducibility and reusability, which are essential for advancing precision oncology. By documenting new data contexts and exploring the model’s reusability, we can drive further progress in tailored cancer treatments,” says Haibe-Kains.

“Reproducing the results of this method was no easy task, but it provided a glimpse into the power and impact it could have. It was an exciting endeavor to explore the possibilities of this machine learning approach,” shares Emily So.

Collaboration, Transparency, and Future Applications 

The impact of this work extends beyond the research community. The study’s reliance on open science principles, where authors share their code and data, highlights the importance of collaboration and transparency. By making their materials publicly available, the researchers contribute to education, enabling the training of future health data scientists, bioinformaticians and computational biologists. 

Emily So emphasizes the potential future applications of such models: “This evaluation is timely because there is a potential future application of these models in assisting clinicians in the treatment decision process. Setting a reproducibility standard is crucial to properly evaluate machine learning approaches suitable for preclinical and clinical settings, ultimately optimizing the course of action for patients.”

With the successful reproduction of the TCRP model and its outperformance of existing approaches, the potential for optimized clinical trial design and improved patient outcomes becomes a tangible reality. 

Advancing the Integration of Data Sciences in the Design and Development of Public Policies: Launching the Policy Lab

by Sara Elhawash

How can we advance data science integration in policy settings and build programming and training to enable new capacity in advancing data science in the public service?   

To address this challenge, the Data Sciences Institute (DSI) and the Dalla Lana School of Public Health (DLSPH) are launching the Policy Lab to advance the integration of data sciences in the design and development of public policies, with the goal of creating a healthier and more just society.

The Policy Lab will form strategic partnerships with ministries, agencies, and other policy-oriented groups to determine the most effective ways to build capacity for, and demand for, data science insights across the public sector. By collaborating with these groups, the Policy Lab intends to cultivate a vibrant community of data scientists and data science users, leading to greater use of data sciences across diverse policy domains.

A key feature of the Policy Lab is that it will host visiting Researchers-in-Residence from the public sector, who will focus on building and advancing data science within the health system. The goal is to develop programming and training that build data science capacity in the public service and that meet the needs and realities of working with data in that setting.

“By collaborating with the Data Sciences Institute and the Dalla Lana School of Public Health, we have a unique opportunity to leverage data-driven insights in designing and implementing evidence-based policies that positively impact the health and well-being of Ontarians,” says Dr. Michael Hillmer, Assistant Deputy Minister of Digital and Analytics Strategy, Ontario Ministry of Health/Ministry of Long-Term Care.

The initial focus of the Policy Lab will be on public health and health systems, with insights generated from this work serving as a foundation for future projects in data sciences and public policy across various sectors. To foster collaboration and knowledge exchange, the Policy Lab will define compelling data science use cases motivated by real examples from the public sector and engage policymakers and stakeholders from diverse backgrounds to advance critical dialogues on data science for policy. 

Laura Rosella, Associate Director of Education & Training, DSI and Associate Professor, DLSPH, expressed her enthusiasm for the launch of the Policy Lab, stating: “Through the Policy Lab, we have an unprecedented opportunity to shape the future of public policy and transform the way we approach complex societal challenges. We are excited to work with our partners to advance data science integration and empower the public service with the necessary tools and training to use data to support decision-making that improves population health.”

The launch of the Policy Lab marks an important milestone in the convergence of data sciences and public policy. As data-driven decision-making becomes increasingly crucial, the Policy Lab paves the way for transformative policy interventions that prioritize health and equity.