Data Sciences Institute

Data Sciences Institute Nurturing a Future-Ready Workforce 

by Sara Elhawash

What are the key skills and qualities required for successful professionals in today’s rapidly evolving data science landscape and how do they inform training? 

To address this important question and understand the needs of organizations, the Data Sciences Institute (DSI) invited industry and non-profit leaders to the Data Science for an Effective Workforce Panel at our Research Day earlier this fall. The panel featured experts from diverse sectors, including Marc Fiume (Co-Founder & CEO, DNA Stack), Ann Meyer (Director, BioInnovation Scientist Program, adMare BioInnovations), Dana Ohab (Associate Partner, Digital & Emerging Technology, EY), and Yves Jaques (Chief, Frontier Data & Tech Unit, UNICEF).

Engaging with data science leadership is key to our understanding of the essential skills, both soft and hard, that employers are looking for in a data-driven decision-making world. The DSI’s newly launched Data Science and Machine Learning Software Certificates have been shaped by such input from employers. 

During the panel discussions and Q&A from participants, the demands of the industry came to the forefront, with panelists providing valuable insights and a roadmap for data science professionals. It was clear that an understanding of data science and continuous learning are key for a wide range of professional fields.  

“There hasn’t been a more exciting time to be in data and data science. What we are seeing is the expectations of our clients have fundamentally changed, the world we work in today has been moving faster and is more tailored than ever seen before,” stated Dana Ohab. 

Dana also emphasized that building a community of practice and forming strategic partnerships is a blueprint many use for staying relevant in the industry. Her advice underscored the need for continuous learning and networking to remain at the forefront of data science. 

In addition to technical skills, soft skills or job-ready skills are critical. “Data science is a dynamic field that requires more than just technical skills. It’s about effective communication, adaptability, and the ability to bridge the gap between complex technical expertise and real-world business understanding. The Data Science and Machine Learning Certificates at the Data Sciences Institute aim to equip learners with these essential skills, ensuring they are not only data-savvy but also capable of making a meaningful impact in a constantly evolving landscape,” says Ann Meyer. 

Yves Jaques emphasized the value of data science in driving positive change: “We are building capacity globally to identify local solutions and talent. We take a community first response and look at the ethical implications of how data is used globally. We leverage partnerships to bring real time results.” 

Marc Fiume shared the inspirational story of his best friend, Dan, who battled cystic fibrosis due to mutations in his CFTR gene. This story served as the driving force behind DNA Stack’s mission, which aims to “save and improve the lives of people like Dan, by harnessing the collective power of the world’s genomics and health data.” 

He stressed that the future of genomic medicine would be powered by data scientists, signifying the critical role of data science in addressing these healthcare challenges. 

The panel discussion, and continuing input from data science leaders, enable the DSI to serve as a unique and enriching bridge to connect researchers with organizations in order to offer cutting-edge, in-demand training. The certificates offer an exclusive opportunity to learn from industry experts through case study components, providing invaluable insights into the professional world of data science.

To watch the video recording of the panel, click here.   

The DSI Data Science Certificate and Machine Learning Software Foundations Certificate are tailor-made for professionals with no prior technical background who aspire to excel in data science careers. In addition to technical skills courses, participants engage in job-ready skills sessions and networking opportunities to successfully enter, or further their careers in, the data sciences. Both continuing education certificates offer an exclusive opportunity to learn from industry experts through case studies. The cost for each certificate is $425. For information and to apply, click here.

Combining genetics and data science can help us understand why some people react more severely to COVID-19

Researchers from U of T and partner hospitals collaborated with others from across Canada and around the world to identify genetic variants associated with more severe COVID-19 outcomes.

by Tyler Irving

Why do some people have a more severe course of COVID-19 disease than others? A database created by an international collaboration of researchers — including many from the University of Toronto and partner hospitals — may hold the answers to this question, and many more.

In late 2019 and early 2020, reports of a novel form of coronavirus started emerging, first from China, then from many other locations across the globe. Lisa Strug, Senior Scientist at The Hospital for Sick Children (SickKids) and Academic Director of U of T’s Data Sciences Institute, remembers what happened next.

“In my research, I use data science techniques to map the genes responsible for complex traits,” says Strug, who is a Professor in the Departments of Statistical Sciences and Computer Science in the Faculty of Arts & Science at U of T and in the Biostatistics Division of the Dalla Lana School of Public Health. She is also the Associate Director of SickKids’ Centre for Applied Genomics, which is one of three sites across Canada that form CGEn, Canada’s national platform for genome sequencing infrastructure for research.

“We knew that genes were a factor in the severity of previous SARS infections, so it made sense that COVID-19, which is caused by a closely related virus, would have a genetic component too. Very early on, I started getting messages from several scientists who wanted to set up different studies that would help us find those genes.”

Over the next few months, Strug collaborated with nearly 100 researchers from across U of T and partner hospitals and institutions, as well as other researchers from across Canada, to enrol individuals with COVID-19 and sequence their genomes.

Some of the key team members from the Toronto community included:

  • Stephen Scherer, Chief of Research at SickKids Research Institute and a University Professor in the Temerty Faculty of Medicine at U of T, as well as Director of the U of T McLaughlin Centre;
  • Rayjean Hung, Associate Director of Population Health, Lunenfeld-Tanenbaum Research Institute and a Professor in the Dalla Lana School of Public Health at U of T;
  • Angela Cheung, Clinician Scientist at University Health Network, Senior Scientist at Toronto General Hospital Research Institute, and a Professor at Temerty Medicine;
  • Upton Allen, Head of the Division of Infectious Diseases at SickKids and a Professor at Temerty Medicine.

Partner hospitals and institutions included:

  • The Hospital for Sick Children
  • Lunenfeld-Tanenbaum Research Institute
  • Mount Sinai Hospital
  • St Michael’s Hospital, Unity Health Toronto
  • Princess Margaret Cancer Centre
  • Ontario Institute for Cancer Research
  • University Health Network
  • Women’s College Hospital
  • Toronto General Hospital
  • Baycrest Health Sciences

Together with researchers at other universities, hospitals and research institutions across Canada, the team eventually created what came to be known as CGEn HostSeq — Canadian COVID-19 Human Host Genome Sequencing Databank.

The project was initiated by Dr. Scherer and CGEn’s Naveed Aziz, with Dr. Strug, and secured a $20M grant from Innovation, Science and Economic Development Canada, administered through Genome Canada.

Scherer recalls, “We had to go right to the top to get this project funded fast, and our labs and teams worked 7 days a week on the project right through the pandemic.”

Identifying associations between individual genes and complex traits typically requires thousands of genomes, both from those with the trait and those without. Though there was no shortage of cases to choose from, it was critical to gather and sequence the DNA and to organize the data in a way that would be ethical, efficient and useful to researchers now and in the future.

“One of our key mandates at the Data Sciences Institute is developing techniques and programs that ensure that data remains as open, accessible and as reproducible as it can be,” says Strug.

“That vision was brought to bear as we assembled the data infrastructure for this project: for example, ensuring that consent forms were as broad as possible, so that this data could be linked with other sources, from electronic medical records to other health databases.”

“We wanted to be sure that even after the COVID-19 pandemic was over, this could be a national whole genome sequencing resource to ask all kinds of questions about health and our genes. The development of the database and its open nature also enabled Canada to collaborate effectively with similar projects in other countries.”

In the end, the project gathered more than 11,000 full genome sequences from across Canada, representing patients with a wide range of health outcomes. Those data were then combined with even more sequences from patients in other countries under what came to be called the COVID-19 Host Genetics Initiative.

It didn’t take long for patterns to start to emerge. A paper published in Nature in 2021 identified 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19.

Since then, even more data have been added, and subsequent analysis has confirmed the significance of existing loci while also identifying new ones. The most recent update to the project, published in Nature earlier this year, brings the total number of distinct, genome-wide significant loci to 51.

“Identification of these loci can help one predict who might be more prone to a severe course of COVID-19 disease,” says Strug.

“When you identify a trait-associated locus, you can also unravel the mechanism by which this genetic region contributes to COVID-19 disease. This potentially identifies therapeutic targets and approaches that a future drug could be designed around.” 

While it will take many more years to fully untangle the effects of the different loci that have been identified, Strug says that the database is already showing its worth in other ways.

“It can be difficult to find datasets with whole genome sequence and approved for linkage with other health information that are this large, and we want people to know that it is open and available for all kinds of research, well beyond COVID, through a completely independent data access committee,” she says.

“For example, several investigators from across Canada have been approved to use these data and we’ve even provided funding to trainees to encourage them to develop new data science methodologies or ask novel health questions using the CGEn HostSeq data.”

“This was a humongous effort, where researchers from across Canada came together during the COVID-19 pandemic to recruit, obtain and sequence DNA from more than 11,000 Canadians, in a systematic, cooperative, aligned way to create a made-in-Canada data resource that will hopefully be useful for years to come. I think that was really miraculous.”

Data Sciences Institute’s Research Day Unveils the Power of Data

by Sara Elhawash

The day-long event was filled with engaging lightning talks, poster sessions, discussions, networking activities, interactive panels, and a focus on data-driven solutions for social good. It was a celebration of the remarkable interest, participation, and collective impact achieved by the DSI community. Moreover, it provided a platform for the DSI community to showcase their work and cultivate connections with collaborators from academia, industry, and government. 

The day began with a captivating keynote address delivered by Dr. Manuel Garcia-Herranz, Data Principal Researcher, UNICEF.  He emphasized the transformative power of Frontier Data technologies and their potential to address pressing global challenges. He discussed how the diversity and volume of data are reshaping technology’s capabilities and changing the world in profound ways. 

Dr. Garcia-Herranz also highlighted the importance of addressing data inequalities, noting that data from less privileged regions is often lacking. He underscored the need to bridge the gap between data scientists and those responding to real-world emergencies. One of the key collaborations highlighted during the event was the partnership between DSI and UNICEF on the Summer Undergraduate Data Science (SUDS) program.   

Research Day featured a series of lightning talks under the theme Data for Social Good, moderated by Bree McEwan (Associate Director, University of Toronto Mississauga, Data Sciences Institute). The speakers were Madeleine Bonsma-Fisher (Data Sciences Institute Postdoctoral Fellow), Professor Marie-Josee Fortin FRSC, Canada Research Chair in Spatial Ecology (Department of Ecology & Evolutionary Biology, Faculty of Arts & Science), Assistant Professor Zahra Shakeri (Dalla Lana School of Public Health), and Assistant Professor Nidhi Subramanyam (Department of Geography & Planning, Faculty of Arts & Science). The talks showcased research that demonstrated how data science can improve lives and enhance human experiences. Topics ranged from equitable prioritization of active transportation infrastructure in Canadian cities to the use of anonymized movement data to assess urban park usage and more.

Another set of lightning talks, Methodologies in Novel Applications, moderated by Ethan Fosse (Associate Director, University of Toronto Scarborough, Data Sciences Institute), featured Assistant Professor Jessica Gronsbell (Department of Statistics, Faculty of Arts & Science), Professor Peter Marbach (Department of Computer Science, Faculty of Arts & Science), and Senior Scientist Babak Taati (KITE Toronto Rehabilitation Institute, University Health Network). Their talks delved into innovative methods applied to critical issues in health and social sciences. Topics included auditing fairness in health applications, uniting sociological theory with data science concepts, and using machine learning to assess fall risk.

During the networking lunch, the DSI community had the opportunity to engage with the excellent work of graduate students and trainees who presented their research via posters. 

These poster projects covered a wide array of topics, showcasing the diversity of research within the DSI community. Among the poster presenters was DSI Graduate Doctoral Fellow Tara Henechowicz, who shared her work that explored the intriguing connections between genetics, motor traits, and music engagement, highlighting its impact on health, cognition and aging. The posters were a testament to the depth and breadth of research happening within the DSI community. 

Shayan Hodai, a student studying AI at George Brown College, shared his enthusiasm for the day: “My passion for machine learning and data science brought me here to explore how to best apply computational tools to health and genetic science. It was a really inspiring day.”

The day concluded with a panel discussion on Data Science for an Effective Workforce, moderated by Lisa Strug, Director of the Data Sciences Institute. Industry leaders from various sectors, including Yves Jaques, Chief, Frontier Data & Tech Unit at UNICEF; Ann Meyer, Director, BioInnovation Scientist Program at adMare BioInnovations; Marc Fiume, Co-Founder and CEO, DNA Stack; and Dana Ohab, Associate Partner, Digital & Emerging Technology at EY, came together to discuss how data science is reshaping workforce efficiency and effectiveness. The panel illuminated the profound impact of data science on decision-making, strategic planning, and operational excellence.

After the panel discussion, Shefali Lathwal, new to the Toronto Data Science community, shared, “I wanted to know more about who the main players in the field are. I really liked that we had both academic researchers and industry-focused talks, especially the last panel on Data Science for an Effective Workforce. It’s nice that we looked at areas beyond generative AI and explored where else data is being applied. These are areas that don’t get much attention like big tech projects, so it was nice to hear about these projects that tell you that data science is not all about big data necessarily. There are many fields where Data Scientists are needed to solve meaningful problems.”

The Data Sciences Institute extends heartfelt thanks to all of its funding partners, including our gold sponsor Amazon Web Services, for their support in making this event possible. The day was a testament to the collective power of data science to shape a better tomorrow.  

To watch the video recordings, click here 

Photos captured by Harry Choi

Data Sciences Institute Researchers are Revolutionizing Financial Risk Management with Data-Driven Strategies

by Sara Elhawash

How can innovative data-driven approaches like reinforcement learning revolutionize risk management for financial institutions? 

Financial institutions constantly deal with the challenge of managing risks tied to factors like interest rates, stock prices, and more. These risks, often unpredictable in nature, add complexity to the financial landscape. Managing risk is simpler when dealing with straightforward assets like stocks, where risks are typically linked to the asset’s price. However, complexities arise when dealing with financial derivatives like options, where risks are shaped by intricate non-linear relationships and unpredictable market changes. 

Historically, the financial industry has relied on parametric models to understand financial variable dynamics. The Black-Scholes model, introduced in 1973 by Black, Scholes, and Merton, is the best-known example; it assumes that volatility is constant.
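To make the constant-volatility assumption concrete, here is a minimal sketch of the standard Black-Scholes formula for a European call option on a non-dividend-paying asset. The function and parameter names are illustrative choices, not taken from the research described in this article; the point is that a single volatility number is used for every strike and maturity.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, maturity: float,
                       rate: float, volatility: float) -> float:
    """European call price under Black-Scholes (no dividends).

    `volatility` is one constant for the whole life of the option,
    which is the assumption the text refers to.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * volatility ** 2) * maturity) / (
        volatility * sqrt(maturity)
    )
    d2 = d1 - volatility * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    # Illustrative inputs only: at-the-money call, one year to expiry.
    print(round(black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.20), 4))
```

In practice, the volatility implied by market prices differs across strikes and maturities, which is why the volatility surface discussed below is studied as an object in its own right.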

DSI members and University of Toronto Professors Sebastian Jaimungal (Department of Statistical Sciences, Faculty of Arts & Science) and John Hull (Joseph L. Rotman School of Management) propose a new frontier in financial risk management. Their aim is to develop alternative methods for quantifying and managing risk within financial institutions, utilizing reinforcement learning, a data-driven approach. Their strategies prioritize robustness to model misspecification and dynamic time consistency.

Professor Sebastian Jaimungal explains, “Thanks to the invaluable support from DSI, our team has achieved a significant milestone with the development of ‘FuNVol: A Multi-Asset Implied Volatility Market Simulator using Functional Principal Components and Neural SDEs.’ This research employs Legendre polynomials to represent the surface and employs neural stochastic differential equations, a form of stochastic evolution driven by neural networks, to capture its intricate dynamics. With DSI’s support, we’ve been able to delve deeper into understanding volatility surface dynamics and its implications for risk management.”
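As a rough illustration of what representing a curve with Legendre polynomials means, the sketch below projects a toy one-dimensional volatility smile onto the first few Legendre polynomials and reconstructs it from the coefficients. This is not the FuNVol method: the smile function, the rescaling of the horizontal axis to [-1, 1], and all function names are assumptions made for the example, and the neural SDE component is not covered.

```python
from typing import Callable, List


def legendre_values(x: float, max_degree: int) -> List[float]:
    """Return [P_0(x), ..., P_max_degree(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, max_degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: max_degree + 1]


def legendre_coefficients(func: Callable[[float], float], max_degree: int,
                          num_points: int = 2000) -> List[float]:
    """Project func on [-1, 1] onto Legendre polynomials (midpoint rule)."""
    width = 2.0 / num_points
    coeffs = [0.0] * (max_degree + 1)
    for i in range(num_points):
        x = -1.0 + (i + 0.5) * width
        fx = func(x)
        for n, p in enumerate(legendre_values(x, max_degree)):
            # Orthogonality gives c_n = (2n + 1) / 2 * integral of f(x) P_n(x) dx
            coeffs[n] += (2 * n + 1) / 2.0 * fx * p * width
    return coeffs


def reconstruct(coeffs: List[float], x: float) -> float:
    """Evaluate the Legendre expansion with the given coefficients at x."""
    basis = legendre_values(x, len(coeffs) - 1)
    return sum(c * p for c, p in zip(coeffs, basis))


def toy_smile(x: float) -> float:
    """Hypothetical implied-volatility smile; x is rescaled to [-1, 1]."""
    return 0.20 + 0.10 * x ** 2


if __name__ == "__main__":
    coeffs = legendre_coefficients(toy_smile, max_degree=2)
    print([round(c, 4) for c in coeffs])
    print(round(reconstruct(coeffs, 0.5), 4), round(toy_smile(0.5), 4))
```

The appeal of such a representation is that a whole curve, or with a two-dimensional basis a whole surface, is summarized by a short vector of coefficients, so modelling how the surface moves reduces to modelling how those coefficients move.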

With the support of a DSI Catalyst Grant, this collaborative research team is working to better understand how volatility surfaces change using generative models. Their research has significant implications for risk assessment, risk management, and portfolio valuation, primarily benefiting financial institutions. Understanding the various ways volatility surfaces can evolve promises to enhance portfolio hedging strategies for financial institutions.  

“The DSI Catalyst Grant program underscores our commitment to advancing the frontiers of data-driven research, and we are delighted to witness the significant progress it has facilitated,” says Gary Bader, Associate Director, Research and Software, Data Sciences Institute.  

John Hull’s group is looking at the same challenge using variational autoencoders (VAEs), a model where latent factors determine option price location and spread. 

What sets this research apart is the goal of incorporating these generative models as inputs into reinforcement learning algorithms. Their aim is to develop sophisticated strategies for mitigating risks within portfolios of financial options, moving toward more robust and effective risk mitigation in an ever-evolving financial landscape.

The impact of data science on workforce efficiency – A discussion with senior leaders

by Sara Elhawash

Amidst the whirlwind of rapid digital transformation sweeping across industries, we are shining a spotlight on the pivotal role that data science plays in building an effective workforce. Workforce strategies will be showcased at the Data Sciences Institute Research Day on September 27, 2023. 

The panel, titled “Data Science for an Effective Workforce,” will feature data science leaders from the private sector, non-profit organizations and the government. The panel will include David Campbell, Assistant Director, Data Science Applications at the Bank of Canada; Yves Jaques, Chief, Frontier Data & Tech Unit at UNICEF; Ann Meyer, Director, BioInnovation Scientist Program at adMare BioInnovations; Marc Fiume, Co-Founder and CEO, DNA Stack; and Dana Ohab, Associate Partner, Digital & Emerging Technology at EY. Each of the panelists will bring a wealth of insight on the topic. The event presents an exciting opportunity to explore the synergy of data science and modern workforce development in a world that’s becoming more data-driven.

As industries become more complex and interconnected, the ability to harness and interpret data has become essential for making informed choices that drive growth, efficiency, and innovation. 

Yves Jaques, Chief of the Frontier Data and Tech Unit at UNICEF, extends the perspective: “Data science is borderless. It defies geographical constraints, knitting together a digitally connected workforce that is not bound by location. It gives us the possibility to bridge the digital divide by creating resilient networks that empower our national partners to scale and sustain local solutions with local talent, capitalizing on the collective intelligence of a global workforce.”   

“The Data Sciences Institute Research Day serves as a platform for delving into the intricate interplay of data science and workforce strategies. It’s an opportunity to explore how these two domains coalesce to define the future of work,” says Lisa Strug, Director of the Data Sciences Institute. 

As a multi-divisional, tri-campus, multidisciplinary hub for data science activity at the University of Toronto, DSI brings together researchers and trainees from across the University, its affiliated research institutes, industry and beyond to support data sciences research, innovation, collaboration, and training, to translate promising ideas into real-world solutions, and to advance the data sciences themselves.

Research Day (#DataSciencesDay) serves as a platform for this discussion. Attendees can expect to learn from these insights through the panel discussion, lightning talks from DSI researchers, poster sessions and the invaluable networking sessions that promise to enrich understanding.

For those interested in joining in on the DSI Research Day and gaining new insights, the countdown has commenced. Register here to secure your spot.