Cormac Rea

Breaking New Ground: Data Sciences Institute and Ernst & Young Collaborate on Data-Driven Consulting Project with Mitacs Support

Photo (L-R): SUDS scholars Tejas Balaji, Minh Dang, Farah Mikati; Sumaiya Hossain, DSI Partnership & Business Development Officer

By: Cormac Rea

A team of Summer Undergraduate Data Science (SUDS) scholars recently embarked on a transformative journey in their careers through a forward-thinking initiative between the University of Toronto’s Data Sciences Institute (DSI) and Ernst & Young (EY), with funding support from Mitacs.

The data-driven EY Project, Leveraging Data Science to Build Consulting Specific Solution Offering, brought together three SUDS scholars to work on the project at EY during the summer. The project’s focus is developing data science go-to-market solutions for some of EY’s target industries, aiming to build AI-based assets intended for wider commercial application.

EY is a leader in assurance, tax, transaction, and advisory services, harnessing data analytics and artificial intelligence to offer innovative consulting solutions. EY’s focus on industry-specific solutions is meant to generate broad economic benefits, improving market competitiveness and efficiency across sectors beyond those targeted in this project.

The SUDS EY collaboration showcases EY’s commitment to innovation through data science and also serves as a model for other companies considering similar collaborations.

“The quality and calibre of work by the SUDS scholars was outstanding,” said Shawn Sigesmund, EY Canada SAP National Practice Co-Leader.

“They hit the ground running with regards to their positive attitude and how they were able to so quickly and seamlessly fit into the EY workplace. The students immediately started adding value and contributing to delivering client work.”

Throughout the summer, the SUDS scholars applied their data science skills to EY’s project, gaining hands-on experience while simultaneously refining their technical and professional competencies. The SUDS scholars’ journey began with a data science bootcamp in May, providing them with essential technical skills, such as data analysis, machine learning, and visualization techniques. In addition to these foundational tools, they received professional development through the SUDS Cohort programming.

“The internship has greatly enriched my learning and career aspirations,” says SUDS scholar, Farah Mikati.

“The experience I gained has enhanced my skills and solidified my commitment to pursuing a successful career in the data science field.”

“I joked with everyone that the opportunity to work with EY and DSI felt like a gift from the sky,” said SUDS scholar, Minh Dang.

“It has significantly helped me grow my technical, interpersonal, and business skills in a professional environment.”

This collaboration is part of the larger DSI Mitacs initiative for Data-driven Decisions & Discovery: Innovation for Transformative Impact. This initiative highlights the growing importance of data science in industry and demonstrates the significant potential of industry-academic collaboration.

In addition to funds from Mitacs, one of the key advantages of the SUDS-Mitacs partnership opportunity is the streamlined application process for fast-tracked projects. With DSI’s assistance, project approvals arrive within just two weeks of proposal submission to Mitacs. This efficiency allows companies to focus on innovation rather than administrative hurdles.

DSI also plays a critical role in finding a faculty supervisor for each Mitacs project, ensuring that interns receive the academic guidance necessary for success. These faculty connections can lead to long-term partnerships between companies and U of T researchers, opening the door to larger projects.

“This successful collaboration highlights the benefits of partnering with DSI for Mitacs projects. Companies gain access to top-tier talent through the SUDS, which offers rigorous technical and professional training,” said Sumaiya Hossain, DSI Partnership & Business Development Officer.

Is your company interested in leveraging data science for innovative solutions? Partnering with DSI through Mitacs is a strategic move that capitalizes on government funding to lower costs, which can yield immediate results and foster long-term growth.

Interested in learning more? Contact DSI today to explore how a SUDS-Mitacs collaboration can benefit your organization.

Mapping the Disconnectome: DSI Research Software Development Support Office Helps Develop New Software for Brain Health

By: Cormac Rea

A complex entity full of grey matter structures that perform distinct functions and the white matter tracts that scaffold them, the human brain contains a vast puzzle of regions, interconnections and, sometimes, disconnections.

When we injure our brains at any age – or when breakdowns in connectivity occur in the brain – it is often a race against the clock for doctors, medical professionals and scientists to identify a root cause and initiate effective treatment.

For parents of newborn infants, this is an especially terrifying ordeal. Brain injury that occurs during birth or in immature newborn infants requires early and precise treatment to secure positive outcomes and avoid neurodevelopmental impairment.

A challenge for medical practitioners is to quickly connect the dots between specific patient scans (e.g., MRI) and the broad existing data sets on various types of neonatal brain injuries.

Disconnectome – a Data Sciences Institute supported research software development project – is a handy desktop application that could be installed on a clinician’s computer and help speed up the process of identifying injury and appropriate treatment.

“One of the typical challenges in research software development is being able to package the software developed by the researchers in a way that makes it easy for others to use and run in a user-friendly way,” said Data Sciences Institute Senior Software Developer, Wisam Al Abed.

“My main task was to take the algorithms developed by the Disconnectome research team and build a desktop application with them that can be run with a click of a button. I built a simple user-friendly interface that is browser based that allows users to specify which MRI images they want to run the algorithm on as well as where to store the results.”

Co-led by scientists Steven Miller and Vann Chau (Neurosciences & Mental Health, SickKids Research Institute; Department of Paediatrics, University of Toronto), the Disconnectome project was provided with access to a DSI professional research software developer.

“DSI funding offers an important opportunity to bring the research advances from quantitative MRI to the bedside – making the wealth of data generated from these images accessible to clinicians to promote the best possible outcomes for their patients,” said Miller.

The DSI Research Software Development Support Program supports researchers to refine existing software tools to improve usability and robustness, or to build new tools, disseminate research software beyond the research space in which it is created, and enhance existing functionality.

“We have had tremendous response from a wide range of research teams for our competitive calls for DSI support,” said Gary Bader, DSI Associate Director, Research and Software.

“This is clearly an important program to better support cutting-edge research, while fostering the collaboration, equitable and open science principles at the DSI.”

Learn more about the Research Software Development Support Program and how to apply for support. The deadline for 2024-2025 applications is October 18, 2024.

Cataloguing Deep Space: DSI Research Software Development Support Office Seeds Zoobot Project

By: Cormac Rea

Astronomers and aerospace engineers are continuously driven to design and build better tools with which to monitor and explore outer space. Recent breakthroughs have resulted in new billion-dollar telescopes (e.g., Euclid and Rubin) that can provide reams of detailed photographs from distant reaches of the universe.

But with each breakthrough arrive new problems; for instance, how can astronomers accurately organize, label, measure, catalogue and eventually make use of this seemingly infinite cache of images?

Enter Zoobot3D, a cutting-edge new DSI-funded software development project that connects AI industry with human ingenuity, efficiently measuring, labelling, annotating and cataloguing images of deep space. Zoobot3D will be the first and only software tool for galaxy feature segmentation, underpinning a new field of research that will help researchers answer questions that would otherwise be impossible to answer.

Essentially, Zoobot3D will help researchers develop maps to millions of previously unknown galaxies… and who knows what we might find there?

Photo: Euclid’s view of the Perseus cluster of galaxies

Co-led by Professors Jo Bovy (David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, University of Toronto) and Joshua Speagle (Department of Statistical Sciences, Faculty of Arts & Science, University of Toronto), the Zoobot project was awarded funding under the DSI Research Software Development Support Program.

“From the dawn of humanity, people have looked at the sky and classified the phenomena that can be observed on the celestial tapestry,” says Bovy. “This has led to fundamental insights, such as that the Earth is not at the centre of the Universe and that the Milky Way galaxy is but one of an enormous number of galaxies.”

“Understanding this ‘zoo’ of galaxies across time allows us to piece together how galaxies form and evolve and how our own Milky Way fits into this picture. By partnering with the DSI, we are able to bring the power of modern software development and data science to bear on this problem.”

Photo: Euclid’s view of spiral galaxy IC 342

“Historically, astronomers have looked through every image of galaxies – and they have looked through many thousands and tens of thousands – and then they divided them into different buckets,” explains Zoobot Team Lead and postdoctoral fellow, Mike Walmsley (David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, University of Toronto).

“But as telescopes have become much more powerful, it’s impossible to do that for the millions of images each telescope now collects.”

“We’ve been running a citizen science project named Galaxy Zoo, showing galaxies to hundreds of thousands of people and asking them to annotate those images – partly to get those same measurements that astronomers are used to, and partly to see what might be there that we didn’t expect,” adds Walmsley.

“Zoobot adds to the picture by helping to really focus on the first of those goals – the making of measurements at scale.”

Certain technical challenges with the Zoobot3D project required a research software engineer who could package the custom annotation tools so that other researchers could create their own labels and seamlessly retrain the model on their own data.

“This has been a very interesting project,” says DSI Senior Software Developer, Conor Klamann. “Its purpose—the creation of maps of outer space—is undeniably fascinating, and developing the software itself has given us the opportunity to evaluate, select, and integrate several cutting-edge open-source tools.” 

“It’s always amazing to see what the open-source community has created, and it’s gratifying to think that citizen-scientists will be using our software to advance our knowledge of the world (and beyond!).”

Empowering Global Talent Through Partnership: DSI and KAUST Academy’s Collaborative Summer Undergraduate Data Sciences Research

Photo: 14 KAUST scholars at 2024 SUDS Showcase with Kingdom of Saudi Arabia’s Ambassador to Canada, Her Excellency Amal Yahya al-Moallimi

By: Cormac Rea

Photos: Harry Choi Photography

In a groundbreaking initiative that bridges continents and fosters global collaboration, the University of Toronto’s Data Sciences Institute (DSI) partnered with the King Abdullah University of Science and Technology (KAUST) Academy to provide 14 exceptional Saudi scholars with the opportunity to engage in cutting-edge data science research in the 2024 cohort of the Summer Undergraduate Data Science (SUDS) Research Program.

Not simply an educational exchange, the DSI-KAUST partnership is an investment in the future of Saudi Arabia, aimed at cultivating a new generation of leaders equipped with the skills and knowledge to drive innovation and national development. 

“The DSI-KAUST partnership is a catalyst for innovation, empowering students from both institutions to lead in data science and solve real-world challenges,” said KAUST Academy Director, Sultan Albarakati.

The scholars, recipients of prestigious awards from KAUST, were selected through a highly competitive process, ensuring that only the highest performing students were chosen to represent the Kingdom on a global stage.

KAUST specifically sought out the University of Toronto for this collaboration due to its world-renowned ranking in data science.

“The DSI SUDS program at U of T selects data science research opportunities, providing a comprehensive cohort program that includes both data science and professional development skills,” explained DSI Executive Director, Lisa Strug.

“KAUST requested that its scholars focus on data science and bioinformatics research projects, aligning with the strategic needs of Saudi Arabia. The DSI-KAUST partnership underscores a mutual commitment to nurturing the next generation of global data scientists.”

The work of KAUST Academy and SUDS scholar, Fatemah Alsolaiman, focused on evaluating transcript-guided cell segmentation in GBM-derived single-molecule spatial transcriptomics data. The main goal was to enhance understanding of glioblastoma (GBM) by analyzing gene expression, cell patterns, and the structure and organization of tumor cells through advanced cell segmentation methods.

“As an international SUDS scholar, I feel incredibly fortunate to be part of this institute and the University of Toronto,” said Alsolaiman.

“I have had the opportunity to work with expert researchers such as Dr. Bader and Dr. Shamini at the Bader Lab within the Terrence Donnelly Centre for Cellular & Biomolecular Research. Their support and encouragement have been invaluable, motivating me to push the boundaries of my research.”

Scholar Faisal Alkulaib’s project, entitled Enhancing Named Entity Normalization in Biofactoid Using NLP Techniques, aimed to improve the accuracy of bioentity normalization in the Biofactoid web tool by leveraging natural language processing (NLP) to reduce common errors. This enhancement is crucial for creating reliable, curated biomedical data, which in turn can provide deeper insights into cellular processes and potential therapeutic opportunities.

“Participating in this program as a SUDS Scholar from KAUST has been incredibly enriching,” said Alkulaib. “The diverse perspectives boosted my research skills and network—and yes, my caffeine addiction too!”

Rakan Alsallum’s research project focused on the search for novel DNA viruses in Alzheimer’s disease brains, particularly within the Circoviridae family. His work involves identifying the jelly roll hallmark structure, one of the most conserved structures in DNA viruses, as a primary indicator for novel Circoviridae. By screening all publicly available sequencing data, Alsallum aims to test the hypothesis that a previously unidentified Circovirus may be the cause of Alzheimer’s disease.

“As an international SUDS Scholar from KAUST, I initially thought it would take a long time to adapt, especially since it was my first time traveling outside of Saudi Arabia,” said Alsallum.

“However, the supportive community at DSI and everyone in the RNAlab made me feel just like home.”

“The KAUST Academy/ DSI SUDS students have been excellent,” reflected DSI supervisor, Gary Bader.

“Many of them are experiencing their first research internship and they are learning diverse skills, including in data science and communicating their work to others.”

The successful 2024 collaboration between KAUST and U of T sets a strong foundation for future partnerships.

KAUST is already looking ahead to 2025, with plans to send another group of scholars to continue this impactful program at DSI – further solidifying a shared commitment to fostering global talent in the data sciences industry.

DSI Celebrates SUDS Cohort of 2024 with Annual Showcase

By: Cormac Rea

Photos: Harry Choi Photography

The Data Sciences Institute’s (DSI) Summer Undergraduate Data Science (SUDS) Opportunities Program recently celebrated the achievements of its 2024 cohort with the annual SUDS Showcase – an exciting full day of research project presentations and poster sessions by 57 undergraduate students.  

Designed as an annual marquee event to close the SUDS program, the Showcase provides a forum for SUDS Scholars and Supervisors to share their data science research. A highlight of the day was keynote speaker Prof. Shion Guha (Faculty of Information and Department of Computer Science, Faculty of Arts & Science, University of Toronto; Director, Human-Centered Data Science Lab), who spoke on the topic of Deconstructing Risk in Predictive Risk Models.

“A vibrant exhibition of the talent, the SUDS Showcase is always a great opportunity for students, supervisors and academic peers alike to connect over the many outstanding presentations, cutting-edge data science methods across a wide range of topic areas,” said Professor Laura Rosella, DSI Associate Director of Education and Training. 

“SUDS offers an excellent bridge from academic study to a professional career in data sciences and machine learning.”

David Carter’s research project, entitled Mass extinctions and nocturnal behaviour: an analysis of the cryptic activity patterns of arthropod, explored the link between nocturnality and evolutionary advantage in arthropods during mass extinction events. Carter is using data science methods and tools to construct a database of arthropod circadian rhythms.

“I recommend SUDS to anybody who wants to get quality research experience and develop their technical skills,” said Carter.

“I started SUDS being relatively new to coding and data analysis, and I was unconfident about my data science skills. The trainings helped me develop my skills, and I learned a lot and really enjoyed applying data science techniques to my research project.”

Ola Alyazidi’s oral presentation, Investigating the role of the X-chromosome in autoimmune diseases: A Bioinformatics Approach Leveraging Single-Cell Genomics, delved into the intricate relationship between genetics and autoimmune diseases.

“Being a SUDS scholar is a defining moment for me,” said Alyazidi. “My research journey into the role of the X chromosome in autoimmune diseases has been challenging yet incredibly rewarding.”

“This opportunity helps me to follow my dreams in bioinformatics. It is also a pivotal point in my educational journey, offering me the chance to connect with experts and advance my academic, professional, and personal growth.” 

SUDS provides a rich summer training experience in which students from a wide variety of academic backgrounds are exposed to data science techniques and apply them in their work. In this year’s SUDS, under the supervision of U of T and affiliated external partner researchers, students applied data science methods and tools to research on topics ranging from the robustness of machine learning models and commercial determinants of health in online gaming to mass extinctions, nocturnal behaviour, sleep modeling, neural systems, and the search for stellar streams in the Milky Way.

This summer, three SUDS Scholars from the University of Toronto had the opportunity to intern at Ernst & Young (EY), thanks to Mitacs funding. Through such strategic partnerships, the DSI collaborates with organizations on transformative data-driven projects. By partnering with DSI and leveraging Mitacs funding, organizations can bring on talented U of T undergraduate students to advance their data science initiatives, driving innovation and impact over the summer.

In addition, 14 students from the King Abdullah University of Science and Technology (KAUST) Academy, recipients of prestigious awards from KAUST, were selected through a highly competitive process to participate in SUDS. KAUST specifically sought out the University of Toronto for this collaboration due to its world-renowned ranking in data science.

Along with their research projects, SUDS Scholars take part in the SUDS Cohort programming for networking, academic and professional development. This includes the Data Science@Work Series, where representatives from the private sector and government organizations share data science applications in the workplace. The scholars began their studies in May with the DSI Data Science Bootcamp, gaining proficiency in data science skills including Unix Shell, R, Python, and machine learning.

“I had a really excellent time with my student, and it was exciting to see them take advantage of all of the opportunities, workshops, and networking events within the program, and for them to become an accomplished and productive [data] scientist in just three months,” said DSI supervisor Max Shafer, Department of Cell and Systems Biology, Faculty of Arts & Science, University of Toronto.

Distinction in the poster category was given to scholars Zeke Weng, Shivesh Prakash and Waleed Adel Alsarhani, while Elliot Sicheri and David Carter were also recognized for their standout presentations.