Data Sciences Institute (DSI)

Data Sciences Institute Explores the Impact of Generative AI on Diverse Communities

by Sara Elhawash

As generative AI like ChatGPT and Large Language Models become increasingly integrated into our daily lives, how can we strike a balance between harnessing their potential for innovation and ensuring responsible and ethical usage? 

Funded through the Emergent Data Sciences Program competition, University of Toronto Professors Syed Ishtiaque Ahmed (Department of Computer Science, Faculty of Arts & Science), Shurui Zhou (Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering), Lisa Austin (Faculty of Law), and Shion Guha and Anastasia Kuzminykh (Faculty of Information) are co-leading Toward a Fair and Inclusive Future of Work with ChatGPT.

The program focuses on the responsible development and ethical implementation of generative AI. It aims to shed light on the societal implications of using ChatGPT, with a particular emphasis on its impact on diverse communities. By gaining a deeper understanding of the social and ethical aspects of generative AI, the program seeks to empower researchers and users to make informed decisions and employ responsible practices when utilizing these technologies. 

It will feature a series of talks, discussions, and participatory design sessions involving individuals from various backgrounds, including students, instructors, practitioners, academics, and artists.  

“We recognize the profound influence of generative AI technologies on diverse communities. Our program seeks to bridge the gap in evaluation frameworks and provide a platform for diverse voices to express their experiences and insights. By fostering inclusivity and promoting ethical considerations, we aim to empower users and researchers to navigate the responsible use of generative AI with confidence,” says Professor Shurui Zhou.

Program activities include events that provide a platform for diverse perspectives and experiences with ChatGPT, as well as workshops and public-facing meetups that foster inclusivity and encourage open dialogue, with a focus on amplifying the voices of minority communities. Academic workshops will be co-located with major conferences, such as the Conference on Human Factors in Computing Systems (CHI), Computer-Supported Cooperative Work & Social Computing (CSCW), and Neural Information Processing Systems (NeurIPS), to disseminate research findings and engage with a wider audience. To ensure ongoing interdisciplinary discussions and knowledge sharing, the researchers will create a repository of videos, talks, and posts related to the societal implications of generative AI, hosted on the Data Sciences Institute’s website.

A course syllabus module to educate students about the ethical considerations surrounding generative AI will also be developed. One of the unique aspects of the program involves students at the University of Toronto engaging in year-long projects that incorporate the use of ChatGPT within their workflows. This practical experience will enable students to share their findings and lessons learned through a poster presentation, contributing to collaboration and knowledge exchange. 

As the program kicks off its activities, Professor Zhou is conducting an interview study to understand how large language models (LLMs), such as ChatGPT, might affect the way scientists and research software engineers collaborate and develop software. To participate, visit our website here.

Recipients of the DSI Emergent Data Sciences Program competition are funded for their programs which foster the development of innovative data science methodologies, deep connections with computation and applied disciplines, new training programs, collaboration, knowledge mobilization, and impact beyond the academy. 

“This Emerging Data Science program is driven by a shared mission to assess the societal implications of generative AI. Together, we aim to create a robust framework that promotes trust, accountability, and transparency in the AI ecosystem, ensuring that these technologies benefit all members of society,” says David Lie, Associate Director of Thematic Programming & Data Access at the Data Sciences Institute. 

Data Sciences Institute Brings Together Data Science and Causal Inference for Better Policy Recommendations

by Sara Elhawash

In an era where data-driven insights fuel innovation and inform decisions, policymakers and stakeholders increasingly seek guidance in research from various areas such as criminal justice, health, and labour law. However, the wealth of data gathered to understand human behaviour can lead to misguided recommendations if not approached appropriately during the analysis phase. This challenge has inspired the question: How can we elevate the quality of data analysis to better inform decision-making? 

Funded through the Emergent Data Sciences Program competition, University of Toronto Professors Linbo Wang (Department of Statistical Sciences, Faculty of Arts & Science, University of Toronto Scarborough), Gustavo J. Bobonis (Department of Economics, Faculty of Arts & Science), Ismael Mourifié (Department of Economics, Faculty of Arts & Science), and Raji Jayaraman (Department of Economics, Faculty of Arts & Science) are co-leading Bringing Together Data Science and Causal Inference for Better Policy Recommendations.

The program promotes cross-disciplinary exchange and collaboration among experts in data science, causal inference, and applied research. Its overarching mission is to influence the landscape of data sciences by advancing the state of the art in causal inference and its applications to real-world policy problems. The program aims to tackle key challenges in data science, including algorithmic fairness, bias from confounding variables, and the need for more robust statistical inference methods. 

The program aims to achieve this by creating an inclusive forum for discussions across diverse disciplines. Here, researchers will get to share their research questions, data limitations, and challenges related to causal methods. Experts in data sciences and causality will introduce new and existing methods, encouraging the pursuit of research goals. Applied researchers will present key limitations informed by practice, jointly addressing the barriers to using current methods in solving policy problems of our time. 

Featured activities include three workshops and a lecture series on causality. In these workshops, data scientists, causal inference experts, and empirical researchers collaborating with policymakers convene to present their work. The lecture series focuses on sharing the state of the literature with a non-specialized audience. The first workshop, Forging a Path: Causal Inference and Data Science for Improved Policy, is scheduled for November 10-11.

“Our collaborative effort will enable us to address pressing policy questions with a newfound depth, ensuring that data-driven decisions are rooted in robust causal understanding,” say Professors Ismael Mourifié and Linbo Wang. “We look forward to working alongside fellow experts to drive meaningful impact in both academia and policymaking.” 

Recipients of the DSI Emergent Data Sciences Program competition are funded for their programs, which foster the development of innovative data science methodologies, deep connections with computation and applied disciplines, new training programs, collaboration, knowledge mobilization, and impact beyond academia. 

“This Emerging Data Science program exemplifies DSI’s commitment to fostering collaboration and innovation in data science research. It reflects our dedication to addressing complex challenges at the intersection of data analysis and real-world policymaking. We are confident that this initiative will have an impact,” says David Lie, Associate Director of Thematic Programming & Data Access, Data Sciences Institute. 

DSI Receives Grant from Sloan Foundation to Shape the Future of Virtual Reality

Testing out an Augmented Reality headset at the 2023 DSI Data and the Metaverse workshop.

by Sara Elhawash

As the world of virtual reality (VR) continues to expand, untapped possibilities await exploration, bringing with them numerous unanswered questions. The Data Sciences Institute (DSI) has been awarded a grant from the Alfred P. Sloan Foundation to delve into the realm of VR technology and its profound implications for human interaction and communication. This award builds upon the foundation laid by the DSI Data and the Metaverse workshop held earlier this year at the University of Toronto Mississauga (UTM), spearheaded by Prof. Bree McEwan, DSI Associate Director UTM.

The grant recognizes DSI’s commitment to pushing the boundaries of knowledge and innovation in the data sciences. It will support research and cross-disciplinary collaboration on social interaction within mediated environments, spanning VR, augmented reality (AR), extended reality (XR), and mixed reality (MR).

Co-led by Profs. McEwan and Sun Joo (Grace) Ahn, Grady College of Journalism and Mass Communication at the University of Georgia, the grant will provide critical support for the Questioning Reality: Explorations of Virtual Reality and Our Social Future conference in 2024 that will bring together leading scholars, industry professionals, and enthusiasts to collaboratively shape the future research landscape of VR.   

Moreover, the grant will serve as a catalyst for generating a Debates in Digital Media edition focused on Virtual Reality and its associated data and will fund five mini-grant research projects. These projects are designed to propel the field forward through innovative research endeavors. 

“We’re entering a critical phase of discovery that will not only enhance our understanding of technology’s impact but also drive the responsible development of immersive experiences,” says Professor Ahn.  

There is a lot more flexibility in the metaverse than in the real world, Ahn continues. “In the metaverse, people can change their identity or their perspectives. They can really play with that dimension of both time and space by traveling to the past or to the future. We are interested in generating a body of knowledge that is very new examining how people use emerging technology.”

Professor Bree McEwan emphasizes, “We’re laying the groundwork for the future of virtual reality research, where the synergy between scholars and industry experts shapes the very essence of the future of immersive communication. With DSI’s leadership and the Sloan Foundation’s support, we’re creating a platform for researchers, designers, and industry experts to come together and drive the discourse on social interaction within virtual reality.”  

This grant intends to ignite fresh perspectives and innovative ideas by initiating conversations right at the beginning of the research cycle. In doing so, DSI aims to pave the way for a dynamic exchange of insights that will decisively shape the trajectory of VR and augmented reality (AR). 

The timing of this research is particularly significant, as the growing presence of VR and AR in society raises practical and ethical challenges. Experts in social interaction research and human communication will collaborate to explore the seamless integration of these technologies into our lives, while considering privacy, identity formation, social presence, and safety. 

The Sloan Foundation, a not-for-profit, mission-driven grantmaking institution, is instrumental in making this possible. “VR and AR technologies have the potential to reshape how we interact as individuals, groups, and as a society,” says Joshua M. Greenberg, program director at Sloan. “What’s needed now is to further develop a research agenda and to ensure that academic researchers have the technical access to study these changes. We’re excited to support DSI’s efforts to help bring the research community together on that shared journey.”

DSI seeks to fuel pioneering research into the intricate dynamics of human interaction within mediated immersive environments, fostering collaboration across disciplines and among academia, industry, and other organizations.

SUDS Scholars Showcase their Newly Acquired Data Science Skills

by Sara Elhawash

After months of dedicated data science learning and exploration, the Data Sciences Institute’s Summer Undergraduate Data Science (SUDS) Opportunities Program reached its highly anticipated finale: a full-day showcase in which 37 exceptional undergraduate students unveiled their research projects. The atmosphere buzzed with a blend of excitement, eager anticipation, and unwavering enthusiasm as the culmination of their efforts took center stage.

The SUDS program, known for its cross-disciplinary approach, provided an enriching summer experience for students to apply data science techniques in diverse fields, ranging from humanities and life sciences to engineering and public health. Supervised by Data Sciences Institute (DSI) researchers, the scholars had the chance to explore real-world applications of data science methods and tools. Their projects covered a diverse array of topics, from genetic covariance analysis in fruit flies and the linguistic dynamics of political discourse to school dropout prediction using machine learning, showcasing the program’s broad spectrum of data science applications.

Beyond their research, SUDS Scholars are provided with a full set of data sciences networking, academic, and professional development opportunities. They took part in the Data Science Bootcamp, gaining valuable skills in Unix Shell, R, Python, and Machine Learning. This technical foundation supported their research endeavors. The SUDS cohort program also emphasizes career growth: the scholars participated in professional development sessions ranging from scientific abstract writing to effective networking. Notably, they had the opportunity to learn from industry experts like Zia Babar, PhD, Director of Cloud Engineering at PwC Canada, who shared insights on Accelerating Machine Learning Development with GitHub Copilot.

DSI’s mission is to nurture collaboration, research, and excellence in data science. Professor Laura Rosella, DSI Associate Director of Education and Training, described the showcase as the pinnacle of the scholars’ journey. “With their findings presented to their peers and supervisors, the room buzzed with excitement and celebration, encapsulating the remarkable culmination of their hard work. Without a doubt, the DSI SUDS program has launched these aspiring scholars into promising careers in Data Science.” 

“I enjoyed seeing the growth my student has achieved in just over 3 months. She is motivated to continuously work and expand her research skills and I am delighted to learn that she is planning to apply for graduate school,” says one supervisor, anonymously. 

“As is often the case when working with students outside of astrophysics, I appreciated the fresh perspective on our research area from a student arriving without preconceptions about the field. They were eager to learn and highly professional in their conduct, making the research project a breeze,” says another supervisor, anonymously. 

The SUDS showcase offered a platform for the scholars to share their final findings and also allowed students to recognize their peers’ exceptional posters and presentations. Awards were given out to Shuyu Van Kerkwijk, Anton Sugolov and Nicholas Taylor for their exceptional posters, while Sanchaai Mathiyarasan and Akil Huang were celebrated for their outstanding presentations. 

SUDS Scholars praise the research opportunities and SUDS cohort program 

Nicholas Taylor, whose project, Quantum Machine Learning for Regression Tasks in Computational Chemistry, was voted a top poster, tackled the challenge of balancing accuracy and computational cost in chemical property predictions. He explains, “Such information holds significant implications for drug discovery and material science.” Reflecting on his experience, Nicholas says, “From not knowing the first thing about academic research to building quantum machine learning methods, I’ve learned a lot that I can take with me to future applications.”

Lu Huang’s research project, Bayesian Analysis of the Genetic Covariance Between Mating Success and Fitness in Drosophila serrata, aims to uncover the genetic interrelation between mating success and fitness in fruit flies. The goal is to ascertain whether mating success can serve as a viable indicator of overall fitness in the context of evolutionary studies. Existing research suggests that female Drosophila serrata possess the capacity to assess and choose mates based on olfactory and chemical cues.

Reflecting on the program, Lu Huang says, “The weekly seminars provided a valuable opportunity for people from different backgrounds to engage in discussions about the daily tasks of data scientists, enhance networking skills, and strategize for resume improvement. It was a great learning opportunity.” 

One research project emerging from the SUDS program focused on School Dropout Prediction Using Machine Learning: An Interactive Presentation of the Evolving Landscape. Ziqi Shu collaborated with Professor Zahra Shakeri from the Dalla Lana School of Public Health at U of T, alongside Dr. Manuel Garcia-Herranz, a Data Principal Researcher at UNICEF, and Karen Avanesyan, a Statistics and Monitoring Education Specialist from the Division of Data, Analytics, Planning and Monitoring at UNICEF. Their collective effort aimed to combat school dropout rates through the utilization of data science. Notably, this venture marked one of the program’s initial steps toward external partnerships. Encouraged by this growing network, the SUDS program envisions an expansion of such projects and looks forward to fostering collaborations with diverse organizations. 

“Through this program, I understood how AI can address vital educational challenges by working on real-world cases. It gave me a sense of purpose, knowing my work could contribute to improving education access worldwide,” says Ziqi Shu. 

Fostering Collaborative Problem-Solving Through Data Science Over Coffee

by Sara Elhawash

The Data Sciences Institute and the Department of Statistical Sciences jointly spearhead the Data Sciences Café, aimed at addressing research challenges through statistical advice. The weekly Data Sciences Café, which brings together researchers from diverse backgrounds, is an exceptional platform for non-statistical researchers seeking to harness statistical techniques to enhance their work. 

Since its start in 2022, the Data Sciences Café has garnered attention, attracting over 40 students and faculty members from the University of Toronto. Every week, participants convene for coffee, engaging in productive discussions and presentations, receiving valuable statistical advice tailored to their specific datasets. The Café’s structure allows attendees to tap into the expertise of a team comprising faculty members, graduate students, and senior statistical consultant students, all working collaboratively to assist in refining research methodologies and analyzing data effectively. 

The sessions begin with researchers presenting their projects and the challenges they face, articulating their needs for statistical guidance. Following these presentations, a 30-minute consultation session is conducted by Dr. Samantha-Jo, Assistant Professor, Teaching Stream, Department of Statistical Sciences and the driving force behind the Café’s success. 

Diego Proano Falconi, a second-year PhD student, Faculty of Dentistry, shares his experience of how the Café benefited his research. Working on a project centered on evaluating the financial hardship of dental out-of-pocket expenses in Canada, Diego needed to analyze complex datasets. He shares, “The Data Sciences Café was an invaluable experience that allowed me to articulate my thesis project and gain insights and methodological clarity from my fellow statistician colleagues.”  

Diego adds, “I’m very thankful for the friendly and academically enriching environment. Following my presentation, engaged students offered constructive feedback, shedding light on areas that could strengthen my analysis. This collaborative atmosphere not only improved my work but also opened doors to future collaborations. Whenever I encountered statistical challenges, I knew I could rely on this network for guidance.” 

“This café has emerged as a remarkable platform, supporting both faculty and students in leveraging data science to overcome research obstacles. The utilization of statistical techniques to enhance data analysis has been instrumental in achieving more robust results. The Café’s ability to unite researchers from various disciplines has further fostered collaboration, sparking innovative solutions through the lens of statistics,” says Dr. Samantha-Jo.

Amanda Ng, Statistics and Mathematics student, Department of Statistical Sciences, reflects on her involvement as a statistical student consultant, “Participating in the weekly meetings, where graduate students presented their research, enabled me to cultivate skills in understanding multidisciplinary datasets. I gained the expertise to recommend appropriate statistical methodologies and identify potential biases within current research study methods. The Data Sciences Café creates an informal environment for students to communicate with researchers from different academic backgrounds, facilitating connections that extend beyond the sessions. In fact, I secured my first research assistantship through a connection made at the Café.” 

This sentiment is echoed by Sofia Panasiuk, a PhD student who served as a participant during the Café sessions. Sofia explains, “My primary aim was to find a platform where I could share and receive feedback on my initial research ideas, especially from data science students who were more experienced with the technique than me. The Café surpassed my expectations, as the students offered insightful comments, highlighted potential limitations, and proposed solutions to the challenges I was encountering. It was here that Amanda and I connected, leading to a valuable collaboration.” 

The Café has also proven to be a catalyst for meaningful collaborations. Amanda Ng’s partnership with Sofia led to a notable accomplishment — presenting the project’s outcomes at the 11th Annual Canadian Statistics Student Conference and securing first place for an undergraduate presentation. 

“The insights gained from the Café led me towards methodological papers that would have taken me ages to find. Someone had the reference I needed immediately and that saved me a significant amount of time streamlining my research process,” says Sofia. 

The Data Sciences Institute, a multi-divisional, tri-campus, and multidisciplinary hub for data science activities at U of T, plays a pivotal role in driving these collaborative efforts. Through initiatives like the Data Sciences Café, DSI fosters research connections and innovation and enhances the teaching and learning experience in data sciences.

As the academic year resumes, the Café is set to recommence its weekly gatherings on September 28. For those intrigued by the prospect of collaborative problem-solving and statistical exploration, sign-ups for consultation are now open here. Join the Data Sciences Café to be part of this dynamic community at the forefront of leveraging statistics to conquer research challenges, all while enjoying a cup of coffee.