Data Sciences Institute (DSI)

How Inclusive is Generative AI? DSI’s Emerging Data Science Program ChatGPT Workshop Sparks Dialogue

by Sara Elhawash

As generative AI advances rapidly, researchers are examining the biases built into these technologies.

This issue was a focus of the Fairness – ChatGPT Workshop, held on January 26 and 27. Professionals, researchers, and students met to explore the responsible development, ethical implementation and use of generative AI, with a particular focus on the impact of ChatGPT on diverse communities.

“The people who really benefit from AI are those who are already privileged,” said Professor Munmun De Choudhury of the Georgia Institute of Technology, whose keynote address laid the foundation for discussions on how inherent biases contribute to some of the challenges and ethical considerations surrounding generative AI. 

The Data Sciences Institute funds the Toward a Fair and Inclusive Future of Work with ChatGPT program as part of its Emerging Data Science Program. The initiative is led by University of Toronto Professors Syed Ishtiaque Ahmed (Department of Computer Science, Faculty of Arts & Science), Shurui Zhou (Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering), Shion Guha and Anastasia Kuzminykh (Faculty of Information) and Lisa Austin (Faculty of Law).

“It is our mission to unravel the complexities of generative AI’s impact on marginalized communities,” says Professor Zhou. “In the realm of responsible technology, our workshop sought to bridge the gap between innovation and inclusivity. Together, we’ve set the stage for a future where AI understands the importance of fairness and ethical considerations in its applications.”  

Day One of the workshop featured presentations from researchers and industry leaders who gave participants insights and tools to understand ChatGPT and its impact on diverse communities. The focus was on the capabilities, limitations and ethical considerations of AI. For example, “ChatGPT provides the most accurate results only when using the English language setting,” said Ping Hu, a PhD student at the Ontario Institute for Studies in Education. “If you use ChatGPT from different regions, you may get different results that are not reliable.”

Professor Matt Ratto, Faculty of Information, questioned what is considered ‘human-like’ and how these concepts impact AI design, while Professor Dakuo Wang of Northeastern University shifted the focus to Human-Centered AI (HCAI), exploring the paradigm of human-AI collaboration. 

Professor Ebrahim Bagheri of Toronto Metropolitan University addressed gender disparities in rankers based on Large Language Models (LLMs) and emphasized the need for automated ways to evaluate datasets. Professor Diyi Yang of Stanford University proposed a human-AI collaboration model to address conflicts and improve communication.

“Can we think about tools that will allow people to personalize the process of building the models that are more accessible?” added Professor Swati Mishra of McMaster University. 

On the second day of the workshop, two panel discussions brought together industry experts and researchers to explore the multifaceted role of LLMs.

The first, led by Professor Guha of the Faculty of Information, explored Responsive LLM Development. The second, moderated by Professor Zhou, focused on Integrating LLM into Education. Both panels included valuable industry insights from Dr. Alex Williams of Amazon.

“If you have a hard time teaching a person something, then you will have a hard time teaching it to a machine,” emphasized Dr. Williams.

A question was posed to the attendees: “In the process of creating systems, should we let conceptual ideas shape their development, or does the actual development of these systems shape and refine the nuances of the concepts?”

A working group report integrated collaborative efforts and key insights generated during the workshop. The event wrapped up with a closing keynote delivered by Professor Edith Law from the University of Waterloo. She explored the challenges of aligning AI technologies with human values, highlighting the nuanced nature of human values in practical contexts. 

The Fairness – ChatGPT Workshop served as a platform for dialogue and laid the groundwork for a community committed to responsible AI development, with the goal of promoting trust, accountability and transparency in the evolving landscape of generative AI. This workshop is one of many activities that will come out of this program, including a speaker series and more.

Mitacs funding to facilitate connection between industry and DSI Summer Undergraduate Data Science students

by Sara Elhawash

In today’s data-driven world, organizations face the challenge of using data effectively to advance their work. Through new research funding awarded by Mitacs to the Data Sciences Institute (DSI) for Data-driven Decisions & Discovery: Innovation for Transformative Impact, DSI will connect industry with the next generation of data science leaders. The umbrella funding for 30 research internships reflects the commitment of both DSI and Mitacs to equip industry and organizations, researchers and students with the opportunities and skills needed to harness the power of data in real-world applications.

Mitacs, a national non-profit research organization that fosters growth and innovation, will enable academic and industry research collaborations through research internships for students in DSI’s Summer Undergraduate Data Science (SUDS) program. Mitacs funding matches industry contributions to provide stipends for students, who will also participate in the SUDS data science bootcamp and professional development programming. This is an opportunity for organizations to host data science students as research interns at a rate subsidized by Mitacs.

“DSI values the collaboration with industry and organizations,” says Professor Laura Rosella, DSI Associate Director of Education and Training. “These partnerships enrich the academic experience for students and provide our partners with access to cutting-edge research and emerging talent.”  

The four-month SUDS program provides students with data science training throughout the summer. Students engaged in industry projects also benefit from SUDS programming focused on career growth and professional development. Scholars take part in sessions covering topics from scientific abstract writing to effective networking, and present their work at the SUDS Showcase in August.

“The Institute’s collaborative environment empowers partners to make informed decisions and implement data science solutions in their operations and presents them with the opportunity to tap into a pool of skilled interns who contribute fresh perspectives, innovative ideas, and immediate value to ongoing projects,” says Sumaiya Hossain, DSI’s Partnership & Business Development Officer. “We support organizations through the Mitacs application process, and our Mitacs umbrella award allows for a quick timeline from application to funding notifications.”

As DSI continues to build bridges between academia and industry, the Institute is helping shape the future of data science and advance its use for societal benefit. “Our goal,” says Hossain, “is to ensure that data science work has a real-world impact. By connecting with external partners, we facilitate a two-way exchange of knowledge and expertise.”

Together, industry and scholars can turn data into decisions and ideas into innovations. DSI envisions a data-driven future where collaboration, innovation and talent development converge.

Learn more about the DSI Mitacs Accelerate funding here. 

DSI’s research software team transforms access to healthcare quality reports with GEMINI for Ontario physicians and hospitals

by Sara Elhawash

In patient care, unlocking valuable insights depends on navigating vast amounts of data. But what if that data is neither easily accessible nor user-friendly? That is where the expertise of the Data Sciences Institute’s (DSI) software development support team becomes essential. The DSI team developed a new user-friendly web portal to seamlessly and securely distribute individualized healthcare quality reports for the General Medicine Quality Improvement Network (GeMQIN), a program of Ontario Health. These reports are developed by the GEMINI team, based at St. Michael’s Hospital (a site of Unity Health Toronto).

Recognized for its deployment of a data and analytics platform, GEMINI harnesses information from hospital computer records, playing a vital role in generating insights to improve healthcare delivery. Holding data from over 30 Ontario hospitals, the project is Canada’s largest hospital data sharing network for research and analytics.

DSI’s software development program provides faculty and scientists access to skilled developers who refine existing software tools to enhance usability, robustness and functionality.

“The DSI software support provided web development capacity and skillset that greatly expedited our timeline in achieving this major project deliverable. We are excited to launch this new GEMINI portal in the coming months for GeMQIN and look forward to a much more streamlined process of delivering healthcare quality reports,” says Denise Mak, Director of Data Science & Innovation, GEMINI.

A screenshot of the GEMINI portal homepage.

The GEMINI portal, currently in its final stage of user testing, will streamline distribution to more than 700 report recipients, avoiding problems such as lost emails and messages caught by spam filters. The portal aims to provide authorized users with easy and secure access to their confidential and personalized quality reports while reducing the report distribution workload for the GEMINI team.

The work completed by DSI sets the groundwork for many future expansion plans, including supporting other quality reporting programs, building custom dashboards for machine learning projects, and adding business intelligence tools to explore GEMINI data for research projects.

“We’ve helped streamline and automate their workflow. The portal allows the GEMINI team to easily manage their users, upload reports, and access administrative controls, creating a more efficient and user-friendly experience,” says Wisam Al Abed, Senior Software Developer, DSI.

The successful collaboration between DSI and GEMINI demonstrates that data is not just a tool but a catalyst that helps researchers make tangible differences, enabling hospitals to respond effectively to the dynamic needs of a growing population.

New professional certificates will help learners upskill for careers in data analytics and applied machine learning

New professional certificates in Data Science and Machine Learning Software Foundations, launched by U of T’s Data Sciences Institute and powered by Upskill Canada, will prepare workers for success in these fast-growing fields. Photo: skynesher via Canva, Getty Images.

by Tyler Irving

A new training initiative launched by the University of Toronto’s Data Sciences Institute (DSI) is helping Canada meet its growing need for talent in data science and machine learning.

Applications for the DSI Data Science and Machine Learning Software Foundations Certificates opened in October to strong demand. DSI is now gearing up for a second session, scheduled to commence on January 15.

By 2026, digital literacy is projected to be essential for 90 per cent of jobs in Canada.

The certificates offer affordable, flexible and rigorous upskilling opportunities, designed for learners who hold a university or college degree or diploma and have at least three years of work experience.

Prospective DSI Certificate participants can be employed or actively seeking employment and do not need experience or education in the field of data science. These certificates are accessible to individuals from all backgrounds, and do not require prior affiliation with the University.

The certificates are powered by Upskill Canada, a national initiative run by Palette Skills and funded by Innovation, Science and Economic Development Canada (ISED). Upskill Canada is designed to meet the talent needs of high-growth sectors while building a more inclusive economy.

With funding from ISED’s Upskilling for Industry Initiative, more than 15,000 Canadian workers will benefit from an innovative approach to skills training. Central to the Upskill Canada initiative is the role of community training providers, who work closely with local and national employers to identify the specific skills industry is seeking. Equipping workers with these skills will create new career pathways for Canadians and better position Canadian companies to compete both domestically and internationally.

“What we’re hearing from our partners in industry is that targeted training in key areas can greatly increase the available talent pool in this fast-moving sector,” says Lisa Strug, Academic Director of the Data Sciences Institute and Professor in the Departments of Statistical Sciences and Computer Science (Faculty of Arts & Science) and the Division of Biostatistics (Dalla Lana School of Public Health) at U of T. Strug is also a Senior Scientist at The Hospital for Sick Children.

“We’re pleased to be able to leverage U of T’s leadership in machine learning and data sciences to provide new opportunities for workers in the digital economy.”

“Through the industry advisory group, prospective employers like Thomson Reuters are actively engaging with the Data Sciences Institute as they develop learning opportunities that address the evolving data science and machine learning demands across small, medium, and large-sized enterprises,” says Carter Cousineau, Vice President, Data and Model (AI/ML) Governance and Ethics, Thomson Reuters.

“This collaborative approach helps ensure learners gain the necessary skill sets to pursue new roles, or identify opportunities for advancement, in this swiftly changing landscape.”

Both certificates cover foundational concepts in data science and machine learning and provide opportunities for practical application through employer case studies. Each certificate also includes sessions dedicated to career advancement, from resume writing support to networking and interview skills development.

The technical and job readiness programming will be delivered as online modules, with in-person and hybrid opportunities for professional networking. Certificate recipients will be well positioned for roles such as data analyst, data manager or applied machine learning analyst.

The courses and job readiness sessions are offered part-time, allowing learners to balance existing commitments while working toward their career goals. Over the next two years, five cohorts of learners are expected to complete the 16-week certificates. Initially, the training will be offered at a substantially reduced rate of $425 (+HST) per certificate, thanks to the support of Upskill Canada. The DSI has also committed accessibility funding for those with financial need.

“We’re so proud to formally launch Upskill Canada with our inaugural class of workers and training service providers,” says Rhonda Barnet, CEO of Palette Skills, which was chosen by ISED to run the Upskill Canada initiative.

“This is a big first step – but it’s only the beginning. We’re looking forward to working with our supporters in government and industry to upskill many more Canadians so they can transition into high-demand roles in the modern workforce – and help fast-growing companies achieve their full potential.”

Data Sciences Institute Supported Research Reveals How Automating Food Analysis Can Improve Health Policy

by Sara Elhawash

When purchasing foods, many consumers give food labels a cursory scan, taking in information such as calorie levels or sodium content. But why does streamlining the analysis of food label information matter from a public health and policy perspective?

Creating and maintaining the databases needed by researchers and others to establish food policies and monitor the food supply is a significant task. This involves classifying and analyzing hundreds of thousands of foods, a process that is typically done manually and infrequently. 

Guanlan Hu, Postdoctoral Fellow in the Department of Nutritional Sciences (Temerty Faculty of Medicine, U of T), is on a mission to simplify this complex process. Her research explores the use of pre-trained language models and supervised machine learning to analyze unstructured food label text, streamlining food categorization and other important classification tasks. Among her primary goals is to transform how ultra-processed foods (UPFs) are understood and categorized, particularly for the benefit of the public and policy makers, with the broader aim of improving public health.

Hu’s work, supervised by Professor Emerita Mary R. L’Abbé (Temerty Faculty of Medicine, U of T) and co-authored by Postdoctoral Fellow Mavra Ahmed and PhD student Nadia Flexner, was presented at the DSI Research Day and signals a shift in the landscape of food classification and health policy.

“Using cutting-edge language models and machine learning, we’ve automated food categorization, nutrition quality scoring and food processing level classification,” says Hu. “This streamlines food analysis and holds promise for swift, scalable monitoring of the global food supply, particularly in identifying ultra-processed foods.” 

Using pre-trained language models and the XGBoost multi-class classification algorithm, Hu’s method achieved an accuracy of 0.98 in predicting both the major categories and subcategories of foods. It outperformed traditional bag-of-words methods, offering a powerful tool for efficiently determining food categories and food processing levels.

“The research holds the potential to expedite the monitoring and regulation of ultra-processed foods in the global food supply, offering a transformative impact on public health and regulatory practices,” says Professor L’Abbé. 

This research is part of a DSI Catalyst Grant project, Using deep learning and image recognition to develop AI technology to measure child-directed marketing on food and beverage packaging and investigate the relationship between marketing, nutritional quality and price, awarded to L’Abbé and Professors David Soberman (Joseph L. Rotman School of Management), Laura Rosella (Dalla Lana School of Public Health), and Steve Mann (Edward S. Rogers Sr. Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering). The Collaborative Research Team includes trainees such as Hu. 

By refining food analysis and offering policymakers a better way to monitor and regulate UPFs, Hu hopes in particular to improve public health and dietary understanding in countries where highly processed foods contribute significantly to daily energy intake, such as Canada, the United States and Argentina, where Hu has applied her work.

Her just-completed research, though, is simply a first step. “Much like the continual evolution of technology,” says Hu, “our work demands continuous development and evolution in this pioneering field.” 

In the meantime, Hu’s work underscores the potential of machine learning and natural language processing in nutrition sciences and the interdisciplinary nature of such breakthroughs, reflecting the importance of Data Sciences Institute grants in fostering collaborative research.

As a collaborative community, the DSI promotes innovation and facilitates the exchange of ideas, connecting diverse groups of researchers and trainees across disciplines. One of the many ways trainees can get involved is through the DSI’s Postdoctoral Fellowship, designed to support multidisciplinary and interdisciplinary training and collaborative research in data sciences.