Cataloguing Deep Space: DSI Research Software Development Support Office Seeds Zoobot Project

By: Cormac Rea

Astronomers and aerospace engineers are continuously driven to design and build better tools with which to monitor and explore outer space. Recent breakthroughs have resulted in new billion-dollar telescopes (e.g., Euclid and Rubin) that can provide reams of detailed photographs from distant reaches of the universe.

But with each breakthrough arrive new problems; for instance, how can astronomers accurately organize, label, measure, catalogue and eventually make use of this seemingly infinite cache of images?

Enter Zoobot3D, a cutting-edge new DSI-funded software development project that pairs AI with human ingenuity to efficiently measure, label, annotate and catalogue images of deep space. Zoobot3D will be the first and only software tool for galaxy feature segmentation, underpinning a new field of research that will help researchers answer questions that would otherwise be impossible to answer.

Essentially, Zoobot3D will help researchers develop maps to millions of previously unknown galaxies… and who knows what we might find there?

Photo: Euclid’s view of the Perseus cluster of galaxies

Co-led by Professors Jo Bovy (David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, University of Toronto) and Joshua Speagle (Department of Statistical Sciences, Faculty of Arts & Science, University of Toronto), the Zoobot project was awarded funding under the DSI Research Software Development Program.

“From the dawn of humanity, people have looked at the sky and classified the phenomena that can be observed on the celestial tapestry,” says Bovy. “This has led to fundamental insights, such as that the Earth is not at the centre of the Universe and that the Milky Way galaxy is but one of an enormous number of galaxies.”

“Understanding this ‘zoo’ of galaxies across time allows us to piece together how galaxies form and evolve and how our own Milky Way fits into this picture. By partnering with the DSI, we are able to bring the power of modern software development and data science to bear on this problem.”

Photo: Euclid’s view of spiral galaxy IC 342

“Historically, astronomers have looked through every image of galaxies – and they have looked through many thousands and tens of thousands – and then they divided them into different buckets,” explains Zoobot Team Lead and postdoctoral fellow, Mike Walmsley (David A. Dunlap Department of Astronomy and Astrophysics, Faculty of Arts & Science, University of Toronto).

“But as telescopes have become much more powerful, it’s impossible to do that for the millions of images each telescope now collects.”

“We’ve been running a citizen science project named Galaxy Zoo, showing galaxies to hundreds of thousands of people and asking them to annotate those images – partly to get those same measurements that astronomers are used to, and partly to see what might be there that we didn’t expect,” adds Walmsley.

“Zoobot adds to the picture by helping to really focus on the first of those goals – the making of measurements at scale.”

Certain technical challenges with the Zoobot3D project required a research software engineer who could package the custom annotation tools so that other researchers could create their own labels and seamlessly retrain the model on their own data.

“This has been a very interesting project,” says DSI Senior Software Developer, Conor Klamann. “Its purpose—the creation of maps of outer space—is undeniably fascinating, and developing the software itself has given us the opportunity to evaluate, select, and integrate several cutting-edge open-source tools.” 

“It’s always amazing to see what the open-source community has created, and it’s gratifying to think that citizen-scientists will be using our software to advance our knowledge of the world (and beyond!).”

Empowering Global Talent Through Partnership: DSI and KAUST Academy’s Collaborative Summer Undergraduate Data Sciences Research

Photo: 14 KAUST scholars at 2024 SUDS Showcase with Kingdom of Saudi Arabia’s Ambassador to Canada, Her Excellency Amal Yahya al-Moallimi

By: Cormac Rea

Photos: Harry Choi Photography

In a groundbreaking initiative that bridges continents and fosters global collaboration, the University of Toronto’s Data Sciences Institute (DSI) partnered with the King Abdullah University of Science and Technology (KAUST) Academy to provide 14 exceptional Saudi scholars with the opportunity to engage in cutting-edge data science research in the 2024 cohort of the Summer Undergraduate Data Science (SUDS) Research Program.

Not simply an educational exchange, the DSI-KAUST partnership is an investment in the future of Saudi Arabia, aimed at cultivating a new generation of leaders equipped with the skills and knowledge to drive innovation and national development. 

“The DSI-KAUST partnership is a catalyst for innovation, empowering students from both institutions to lead in data science and solve real-world challenges,” said KAUST Academy Director, Sultan Albarakati.

The scholars, recipients of prestigious awards from KAUST, were selected through a highly competitive process, ensuring that only the highest performing students were chosen to represent the Kingdom on a global stage.

KAUST specifically sought out the University of Toronto for this collaboration due to its world-renowned ranking in data science.

“The DSI SUDS program at U of T selects data science research opportunities, providing a comprehensive cohort program that includes both data science and professional development skills,” explained DSI Executive Director, Lisa Strug.

“KAUST requested that its scholars focus on data science and bioinformatics research projects, aligning with the strategic needs of Saudi Arabia. The DSI-KAUST partnership underscores a mutual commitment to nurturing the next generation of global data scientists.”

The work of KAUST Academy and SUDS scholar Fatemah Alsolaiman focused on evaluating transcript-guided cell segmentation in GBM-derived single-molecule spatial transcriptomics data. The main goal was to enhance understanding of glioblastoma (GBM) by analyzing gene expression, cell patterns, and the structure and organization of tumor cells through advanced cell segmentation methods.

“As an international SUDS scholar, I feel incredibly fortunate to be part of this institute and the University of Toronto,” said Alsolaiman.

“I have had the opportunity to work with expert researchers such as Dr. Bader and Dr. Shamini at the Bader Lab within the Terrence Donnelly Centre for Cellular & Biomolecular Research. Their support and encouragement have been invaluable, motivating me to push the boundaries of my research.”

Scholar Faisal Alkulaib’s project, entitled Enhancing Named Entity Normalization in Biofactoid Using NLP Techniques, aimed to improve the accuracy of bioentity normalization in the Biofactoid web tool by leveraging natural language processing (NLP) to reduce common errors. This enhancement is crucial for creating reliable, curated biomedical data, which in turn can provide deeper insights into cellular processes and potential therapeutic opportunities.

“Participating in this program as a SUDS Scholar from KAUST has been incredibly enriching,” said Alkulaib. “The diverse perspectives boosted my research skills and network—and yes, my caffeine addiction too!”

Rakan Alsallum’s research project focused on the search for novel DNA viruses in Alzheimer’s disease brains, particularly within the Circoviridae family. His work involves identifying the jelly roll hallmark structure, one of the most conserved structures in DNA viruses, as a primary indicator for novel Circoviridae. By screening all publicly available sequencing data, Rakan aims to test the hypothesis that a previously unidentified Circovirus may be the cause of Alzheimer’s Disease.

“As an international SUDS Scholar from KAUST, I initially thought it would take a long time to adapt, especially since it was my first time traveling outside of Saudi Arabia,” said Alsallum.

“However, the supportive community at DSI and everyone in the RNAlab made me feel just like home.”

“The KAUST Academy/DSI SUDS students have been excellent,” reflected DSI supervisor, Gary Bader.

“Many of them are experiencing their first research internship and they are learning diverse skills, including in data science and communicating their work to others.”

The successful 2024 collaboration between KAUST and U of T sets a strong foundation for future partnerships.

KAUST is already looking ahead to 2025, with plans to send another group of scholars to continue this impactful program at DSI – further solidifying a shared commitment to fostering global talent in the data sciences industry.

DSI Celebrates SUDS Cohort of 2024 with Annual Showcase

By: Cormac Rea

Photos: Harry Choi Photography

The Data Sciences Institute’s (DSI) Summer Undergraduate Data Science (SUDS) Opportunities Program recently celebrated the achievements of its 2024 cohort with the annual SUDS Showcase – an exciting full day of research project presentations and poster sessions by 57 undergraduate students.  

Designed as an annual marquee event to close the SUDS program, the Showcase provides a forum for SUDS Scholars and Supervisors to share their data science research. A highlight of the day was keynote speaker, Prof. Shion Guha (Faculty of Information and Department of Computer Science, Faculty of Arts & Science, University of Toronto, Director, Human-Centered Data Science Lab), who spoke on the topic of Deconstructing Risk in Predictive Risk Models.

“A vibrant exhibition of the talent, the SUDS Showcase is always a great opportunity for students, supervisors and academic peers alike to connect over the many outstanding presentations, cutting-edge data science methods across a wide range of topic areas,” said Professor Laura Rosella, DSI Associate Director of Education and Training. 

“SUDS offers an excellent bridge from academic study to a professional career in data sciences and machine learning.”

David Carter, whose research, entitled Mass extinctions and nocturnal behaviour: an analysis of the cryptic activity patterns of arthropod, explored the link between nocturnality and evolutionary advantage in arthropods during mass extinction events, is using data science methods and tools to construct a database of arthropod circadian rhythms.

“I recommend SUDS to anybody who wants to get quality research experience and develop their technical skills,” said Carter.

“I started SUDS being relatively new to coding and data analysis, and I was unconfident about my data science skills. The trainings helped me develop my skills, and I learned a lot and really enjoyed applying data science techniques to my research project.”

Ola Alyazidi’s oral presentation, Investigating the role of the X-chromosome in autoimmune diseases: A Bioinformatics Approach Leveraging Single-Cell Genomics, delved into the intricate relationship between genetics and autoimmune diseases.

“Being a SUDS scholar is a defining moment for me,” said Alyazidi. “My research journey into the role of the X chromosome in autoimmune diseases has been challenging yet incredibly rewarding.”

“This opportunity helps me to follow my dreams in bioinformatics. It is also a pivotal point in my educational journey, offering me the chance to connect with experts and advance my academic, professional, and personal growth.” 

SUDS provides a rich summer training experience in which students from a wide variety of academic backgrounds are exposed to data science techniques and apply them in their work. In this year’s SUDS, under the supervision of U of T and affiliated external partner researchers, students applied data science methods and tools to research on the robustness of machine learning models, commercial determinants of health in online gaming, mass extinctions, nocturnal behaviour, sleep modeling, neural systems, and the search for stellar streams in the Milky Way.

This summer, three SUDS Scholars from the University of Toronto had the opportunity to intern at Ernst & Young (EY), thanks to Mitacs funding. Through such strategic partnerships, the DSI collaborates with organizations on transformative data-driven projects. By partnering with DSI and leveraging Mitacs funding, organizations can bring on talented U of T undergraduate students to advance their data science initiatives, driving innovation and impact over the summer.

In addition, 14 students from the King Abdullah University of Science and Technology (KAUST) Academy, recipients of prestigious awards from KAUST, were selected through a highly competitive process to participate in SUDS. KAUST specifically sought out the University of Toronto for this collaboration due to its world-renowned ranking in data science.

Along with their research projects, SUDS Scholars take part in the SUDS Cohort programming for networking and for academic and professional development. This includes the Data Science@Work Series, where representatives from the private sector and government organizations share data science applications in the workplace. The scholars began their studies in May with the DSI Data Science Bootcamp, gaining proficiency in data science skills including Unix Shell, R, Python, and machine learning.

“I had a really excellent time with my student, and it was exciting to see them take advantage of all of the opportunities, workshops, and networking events within the program, and for them to become an accomplished and productive [data] scientist in just three months,” said DSI supervisor Max Shafer, Department of Cell and Systems Biology, Faculty of Arts & Science, University of Toronto.

Distinction in the poster category was given to scholars Zeke Wong, Shivesh Prakash and Waleed Adel Alsarhani, while Elliot Sicheri and David Carter were also recognized for their standout presentations.  

Data Sciences Institute announces Doctoral Student Fellows for 2024

By: Cormac Rea

The Data Sciences Institute (DSI) is pleased to announce its 2024 Doctoral Student Fellowship recipients.

The DSI Doctoral Student Fellowship supports multi-disciplinary training and collaborative research in data sciences that include faculty from the University of Toronto and external funding partners. Fellows will engage in exciting research projects with a data sciences focus, developing novel methodologies or applying existing approaches innovatively. Each fellow has at least two co-supervisors from complementary disciplinary backgrounds to guide the multidisciplinary aspects of their research project. In addition to their research, Fellows engage in DSI professional development and data skills programming and networking. 

“It is a pleasure to announce the selection of our 14 new fellows for the DSI Doctoral Student Fellowship,” says Laura Rosella, DSI Associate Director of Education and Training.

“Each fellow is an outstanding scholar and we’re excited to see how their research in data sciences can both address important societal questions and drive change. The impact of their work and contributions to the DSI community impacting several sectors of society will be eagerly anticipated by many.”

Optimizing Patient Safety Reporting and Providing Improved Patient Centred Care 

Rob (Hongbo) Chen is working with his supervisors, Profs. Myrtede Alfred and Eldan Cohen (University of Toronto, Faculty of Applied Science and Engineering, Department of Mechanical and Industrial Engineering), on research that leverages data science to optimize the efficiency, equity and user-interaction of patient safety event report classification.

Chen, a PhD student with the Faculty of Engineering’s Department of Mechanical & Industrial Engineering, recently described his work in an interview with MIE’s digital news.

“Adverse events attributed to patient safety challenges are the third leading cause of death in the world, resulting in 251,454 deaths annually in the United States alone,” says Chen, whose research aims to synthesize human-centered design principles with artificial intelligence (AI) to improve patient safety. 

“The possibility of using AI to create more reliable and user-friendly incident reporting systems is a very real solution to the current challenges healthcare professionals face with complex classification taxonomy and the consequences of misclassified incident reports.”

“Accurate classification of patient safety event reports is crucial to analyzing trends, prioritizing measures to reduce such adverse events, and supporting organizational learning,” adds Prof. Cohen.

“Rob is combining state-of-the-art machine learning with human factors engineering principles to build tools that can significantly improve healthcare quality.”

Measuring the public health benefits from Zero Emissions Vehicles 

Harshit Gujral, a PhD student in Computer Science (C/S Environment & Health), is researching how to measure the public health benefits of Zero Emissions Vehicles.

“Transitioning to electric vehicles (EVs) is crucial in reducing our reliance on fossil fuels, but it’s not without its challenges—from disparities in adoption rates to increased non-tailpipe emissions as more vehicles hit the roads,” explains Gujral. “If not managed properly, EV transition can exacerbate existing health disparities, particularly for marginalized communities.”

“My research taps into data science to quantify the health benefits and inequities associated with Zero-Emission Vehicle mandates, advocating for data-driven, evidence-based policies that support a rapid and equitable EV transition,” he says. 

“At DSI, I’m excited to engage with experts and policymakers to refine our understanding of the health impacts of the EV transition. Working together, we can ensure our environmental policies do more—not only mitigate environmental harm but also promote health equity.”

Gujral is collaborating with supervisors Profs. Steve Easterbrook (University of Toronto, Faculty of Arts & Science, Department of Computer Science), Meredith Franklin (University of Toronto, Faculty of Arts & Science, Department of Statistical Sciences), and Paul Kushner (University of Toronto, Faculty of Arts & Science, Department of Physics) on his research.

“Harshit will leverage big data and computational skills to tackle an important question at the intersection of climate science and public health — how can zero emission vehicle policies promote an equitable shift to electric vehicle adoption that will benefit public health?” explains Prof. Franklin.

“His results have the potential to have significant impact on how zero emission vehicle policies are effectively implemented in the US and Canada, which in turn could result in substantive impact to our climate and health.”

Congratulations to all the 2024 DSI Doctoral Student Fellows. Learn more about each of them below:  

Dorothy Apedaile – Using Machine Learning to Investigate Homelessness and HIV Vulnerability Among Transgender Women in the United States 

Supervisors: Amaya Perez-Brumer and Susan Bondy, University of Toronto, Dalla Lana School of Public Health

Samantha Berek – Understanding galaxy evolution using star cluster populations with statistical models 

Supervisors: Gwendolyn Eadie, University of Toronto, Faculty of Arts & Science, David A. Dunlap Department of Astronomy and Astrophysics; Joshua Speagle and Monica Alexander, University of Toronto, Faculty of Arts & Science, Department of Statistical Sciences  
 
Duncan Carruthers-Lay – Understanding the fundamental biology of Neisseria gonorrhoeae through an integrated omics approach and metabolic modelling 

Supervisors: John Parkinson, The Hospital for Sick Children; Scott Gray-Owen, University of Toronto, Temerty Faculty of Medicine, Department of Molecular Genetics 
 
Hongbo Chen – Leveraging data science approaches to optimize the efficiency, equity, and user-interaction of patient safety event report classification 

Supervisors: Myrtede Alfred and Eldan Cohen, University of Toronto, Faculty of Applied Science and Engineering, Department of Mechanical and Industrial Engineering  

Chaoran Dong – Estimating the Value of Reducing Geographical Disparities in Pediatric Cancer Care using Health Administrative Data 

Supervisors: Petros Pechlivanoglou, University Health Network, Princess Margaret Cancer Centre; Linbo Wang, University of Toronto Scarborough, Department of Computer and Mathematical Sciences 
 
Mei Dong – Advancing Mendelian Randomization Methods for Lung Cancer Research 

Supervisors: Wei Xu, The Hospital for Sick Children, Child Health Evaluative Sciences; Linbo Wang, University of Toronto Scarborough, Department of Computer and Mathematical Sciences

Harshit Gujral – Towards Equitable ZEV mandates: Measuring the public health benefits from Zero Emissions Vehicles 

Supervisors: Steve Easterbrook, University of Toronto, Faculty of Arts & Science, Department of Computer Science; Meredith Franklin, University of Toronto, Faculty of Arts & Science, Department of Statistical Sciences; Paul Kushner, University of Toronto, Faculty of Arts & Science, Department of Physics 

Ramaravind Kommiyamothilal – Reducing Online Toxicity through Exposure to Diverse Opinions 

Supervisors: Shion Guha, University of Toronto, Faculty of Information; Syed Ishtiaque Ahmed, University of Toronto, Faculty of Arts & Science, Department of Computer Science 

Alexander Laroche – Discovering middle-aged massive binaries throughout the Milky Way with deep learning 

Supervisors: Joshua Speagle, University of Toronto, Faculty of Arts & Science, Department of Statistical Sciences; Maria Drout, University of Toronto, Faculty of Arts & Science, David A. Dunlap Department of Astronomy and Astrophysics 

Eric Sanders – Statistical genetics of sex-dependent phenotypes: Applications in the common genetic epilepsies

Supervisors: Lisa Strug, University of Toronto, Faculty of Arts & Science, Department of Statistical Sciences; Linbo Wang, University of Toronto Scarborough, Department of Computer and Mathematical Sciences 

Yixiong Sun – Realistic neuron network modelling to examine network dysfunctions causing memory impairments in Alzheimer’s disease 

Supervisors: Kaori Takehara-Nishiuchi, University of Toronto, Faculty of Arts & Science, Department of Psychology; Jiannis Taxidis, The Hospital for Sick Children, Neuroscience and Mental Health

Farzan Taj – A Deep Learning Foundation Model for Predicting Responses to Genetic and Chemical Perturbations in Single Cancer Cells 

Supervisors: Lincoln Stein, University of Toronto, Temerty Faculty of Medicine, Department of Molecular Genetics; Benjamin Haibe-Kains, University of Toronto, Temerty Faculty of Medicine, Department of Medical Biophysics

Mete Yuksel – Risk roadmap: using genome sequence data and simulation-based inference to identify dangerous viral reservoirs – and predict pandemic risk

Supervisors: Nicole Mideo and Matthew Osmond, University of Toronto, Faculty of Arts & Science, Department of Ecology and Evolutionary Biology

Xindi Zhang – A Deep Learning System Classifies Tumour Origins Using Somatic Mutation Patterns From Circulating Tumour DNA 

Supervisors: Lincoln Stein, University of Toronto, Temerty Faculty of Medicine, Department of Molecular Genetics; Trevor Pugh, University Health Network, Princess Margaret Cancer Centre 

DSI welcomes the Centre for Addiction and Mental Health (CAMH) as a partner

By: Cormac Rea

The Data Sciences Institute (DSI) aspires to build meaningful collaborations with organizations that share mutual goals of engaging and supporting world-class data science research and training, across all sectors. We are excited to announce a new partnership with the Centre for Addiction and Mental Health (CAMH).

CAMH is Canada’s largest mental health teaching hospital and one of the world’s leading research centres in its field. CAMH is fully affiliated with the University of Toronto and is a Pan American Health Organization/World Health Organization Collaborating Centre.

With a dedicated staff of more than 5,000 physicians, clinicians, researchers, educators and support staff, CAMH offers outstanding clinical care to more than 38,000 patients each year. The organization conducts groundbreaking research, provides expert training to health care professionals and scientists, develops innovative health promotion and prevention strategies, and advocates on public policy issues at all levels of government. And through its Foundation, CAMH is working to raise tens of millions of additional dollars to fund new programs and research and augment services.

“CAMH is already a leader in data science and best practices, particularly through our Krembil Centre for Neuroinformatics,” said Dr. Aristotle Voineskos, Vice President Research & Director Campbell Family Mental Health Research Institute at CAMH. “A partnership with the DSI will better connect our organizations and the larger Toronto Academic Health Sciences Network, fostering new approaches and collaborations among hospitals and the University of Toronto. Additionally, it provides CAMH scientists and trainees with access to vital seed funding and scholarships, further advancing our mission to transform mental health research and care.”

The DSI fuels innovation and fosters the exchange of ideas, connecting a diverse community of researchers and trainees that represent a wide array of disciplines. By connecting data science researchers, data and computational platforms, and external partners, the DSI advances research and nurtures the next generation of data science researchers. Because CAMH is one of our external funding partners, its researchers can apply for research grants, support and training, and can participate in networking opportunities at the DSI.

“The DSI is very proud to announce this partnership, expanding our research community to include such a widely recognized and reputable leader in social responsibility and mental health issues as CAMH. Our commitment to building a hub of data science researchers that can accelerate the impact of data across disciplines and affect positive social change will be significantly enriched by having researchers from CAMH join our community,” said Lisa Strug, DSI Director.