Data Sciences Institute (DSI)

Oct 03 2022

A summer of learning, fun and community for 2022 DSI SUDS Scholars

The Data Sciences Institute (DSI) welcomed 35 carefully selected undergraduate students from across Canada for a rich data sciences research experience. The Summer Undergraduate Data Science (SUDS) Opportunities Program is a great way for undergraduate students to engage in hands-on research led by DSI member faculty and scientists. The program has a cross-disciplinary approach, applying data science skills in various fields including the humanities, life science, engineering, public health, and more.

“The DSI SUDS program is about inspiring the next generation of data scientists and giving undergraduate students the chance to explore data science as a career. In addition to their research projects, SUDS Scholars are provided with data science skills and professional development opportunities. We couldn’t be more thrilled to have the chance to inspire them and hopefully kickstart their careers in this exciting field. They are truly an exceptional bunch!” says Laura Rosella, DSI associate director of education and training.

Interested in proposing a SUDS research opportunity? Applications are now open!

Student applications open on December 19, 2022, for the 2023 SUDS opportunities.

SUDS Scholars praise the program

SUDS Scholars participated in weekly speaker seminars, data science skills training, and professional development opportunities. They had numerous networking events where they got to know more about the wide variety of SUDS projects. They also requested an easier way to stay in touch and connect. To accommodate this request, the DSI set up a Zoom Chat channel.

“We were so excited to see that the Scholars wanted to stay in touch, network, and learn from each other. It was so great to see a community form. We look forward to our 2023 SUDS program and continue to support this community of students from diverse backgrounds,” says Wenzhe Xu, DSI’s programming coordinator and SUDS officer.

Scholars presented their research projects and data methods at SUDS Research Day, where students also voted for the best presentation. This year’s winner was Lauren Gill from the University of British Columbia, who studied Data Science for White Shark Conservation with Vianey Leos Barajas, an assistant professor in the Department of Statistical Sciences at the Faculty of Arts & Science.

Other 2022 SUDS scholars included Yingke Wang, who worked with Rahul Krishnan, an assistant professor in the Department of Computer Science and the Department of Laboratory Medicine & Pathobiology within the Temerty Faculty of Medicine, on machine learning for chronic disease management.

“Thanks to SUDS, I had the opportunity to learn how to combine machine learning algorithms in the healthcare industry as well as explore survival analysis. Plus, the self-learning skills I gained will be essential to me for approaching graduate study,” says Wang, a member of St. Michael’s College.

“It’s been wonderful to see the support that SUDS provides to young scholars like Yingke,” says Krishnan. “Introducing students to research early is an important step for them to see the opportunities that graduate study can provide.”

SUDS scholar and Innis College member Tina Tsan worked with Ulrich Wortmann, an associate professor in the Department of Earth Sciences at the Faculty of Arts & Science, on an analysis of why the last ice age came to a sudden end.

“For me, the biggest reward from the SUDS program has been how it’s broadened my perspective and understanding of what data science is and how it’s used in different fields,” Tsan says.

SUDS scholar Anthony McCanny, a member of Victoria College, where he was a Northrop Frye Centre Undergraduate Fellow, worked with Felix Cheung, an assistant professor in the Department of Psychology at the Faculty of Arts & Science. They explored whether gross domestic product (GDP) is a good measure of economic and societal success, and what type of government spending improves the lives of citizens.

“The SUDS program filled my summer with an unbelievable amount of learning, fun, joy and community,” says McCanny. “I’ve been very lucky in Professor Cheung’s lab to have the freedom to conduct my own research, paired with great guidance. It’s hard not to feel like this summer has redefined my path in life, filling me with enthusiasm for a career in research, and connecting me with people that I hope I get to keep working with.”

Sep 22 2022

Building data science software to help the fight against cancer

Tumors, much like people, are different from one another. In fact, not only can the same type of tumor vary from person to person, but there can also be variations within the tumor itself, as a single tumor is composed of a diverse population of cells. This tumor heterogeneity makes it difficult for researchers to create effective treatment plans. This is where Dr. Gregory Schwartz and his team at the University Health Network and in Medical Biophysics at the University of Toronto come in, with the help of the Data Sciences Institute’s (DSI) research software development support program.

Interested in applying for the DSI’s research software development support program? Apply by October 21, 2022, for our next round of applications.

The DSI’s software development program is designed to support faculty and scientists by providing access to highly skilled software developers to refine or enhance existing software and improve usability and robustness, build new tools, and disseminate research software. The DSI has supported six projects since its first call. The DSI’s senior software developer, Dr. Conor Klamann, worked with Schwartz and his team.

Helping understand cellular heterogeneity in cancer with TooManyCells

Genetic heterogeneity within a tumor occurs due to imperfect DNA replication. When healthy cells divide to create new cells, mutations can arise. When cancerous cells divide, mutations can also occur, causing tumor heterogeneity. However, these diverse populations of cells can also exhibit non-genetic heterogeneity in response to treatment, changing their behaviour based on their surrounding environment independent of mutation. To measure cell behaviour at the resolution of individual cells, researchers are using new single-cell technologies. These technologies produce massive amounts of detailed data that require sophisticated computational tools to interpret.

To better understand heterogeneity and drug resistance in cancer, Schwartz and his team developed TooManyCells, a suite of tools designed for clustering and visualizing single-cell data. The visualization component of TooManyCells is custom-made and presents cell relationships as a tree. By using TooManyCells, the team could identify rare cancer cells that were contributing to disease progression.

However, the software had some limitations.

“The limitation of TooManyCells was that it took time to build a tree. These trees can be quite large, so to visualize major cell populations you would have to prune the tree several different ways and rerun the program repeatedly. You also didn't really know which way was the right way to prune the tree and colour it until you saw the output,” says Schwartz. “So that’s where this opportunity to work with the DSI’s research software development support program came in.”
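The pruning trade-off Schwartz describes can be illustrated with a toy sketch. This is not the TooManyCells algorithm or its interface: the `Node` class, the gap-based `build_tree` splitter, the `min_size` pruning rule, and the one-dimensional "cell" values are all hypothetical stand-ins chosen for brevity. The sketch only shows why building the tree once and re-pruning it interactively is cheaper than rerunning the whole analysis, and how an aggressive pruning threshold can hide a rare population.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a toy cluster tree; `cells` holds made-up 1-D measurements."""
    cells: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_tree(cells: List[float], min_gap: float = 1.0) -> Node:
    """Hypothetical splitter: recursively cut sorted values at their widest gap.

    Stands in for the expensive clustering step; it is NOT how TooManyCells
    clusters cells.
    """
    cells = sorted(cells)
    node = Node(cells)
    if len(cells) < 2:
        return node
    gaps = [b - a for a, b in zip(cells, cells[1:])]
    widest = max(gaps)
    if widest < min_gap:
        return node  # values are too close together to split further
    cut = gaps.index(widest) + 1
    node.left = build_tree(cells[:cut], min_gap)
    node.right = build_tree(cells[cut:], min_gap)
    return node


def prune(node: Node, min_size: int) -> Node:
    """Return a pruned copy: undo any split that creates a child below min_size."""
    if node.is_leaf():
        return Node(node.cells)
    if len(node.left.cells) < min_size or len(node.right.cells) < min_size:
        return Node(node.cells)
    return Node(node.cells, prune(node.left, min_size), prune(node.right, min_size))


def leaves(node: Node) -> List[List[float]]:
    """Collect the cell groups at the leaves, left to right."""
    if node.is_leaf():
        return [node.cells]
    return leaves(node.left) + leaves(node.right)


# Build the tree once (the slow step), then try several prunings cheaply.
toy_cells = [0.1, 0.2, 0.3, 5.0, 5.1, 9.0]  # 9.0 plays the role of a rare cell
full_tree = build_tree(toy_cells)
for size in (1, 2, 4):
    print(size, leaves(prune(full_tree, size)))
```

In this toy data, pruning with `min_size=2` merges the lone value 9.0 back into its neighbouring group, which mirrors how the choice of pruning can determine whether a rare population remains visible.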

“It's wonderful to have a fantastic software developer like Conor devoting his time to facilitating these kinds of projects, which are not easy to get off the ground. They are absolutely necessary and required in these fields but have surprisingly few funding opportunities. So, it's fantastic that these kinds of avenues exist,” says Schwartz about the program.

How is the project developing?

The goal of this project was to provide a graphical user interface for the analysis tools that Schwartz and his team developed. The details have evolved over time, but creating an interactive tool to speed up analyses and improve user experience has always been at the heart of the project. Currently, the software development team at the DSI has a prototype in place and is working on collecting user feedback. The research team is also preparing an article describing the software, and once it has been completed, the source code will be made public on the Schwartz Lab GitHub page so that other researchers may access it.

“It's been a pleasure working on TooManyCells! It's given me the opportunity to combine various programming frameworks in ways I haven't done before while supporting some very interesting research,” says Conor Klamann, DSI senior software developer.

Sep 06 2022

Conquer the world of data science with the DSI Data Science Certificate

The world runs on data — and a new certificate is set to help people develop the skills they need to become leaders in the field.

The Data Sciences Institute (DSI) at the University of Toronto has launched a Data Science Certificate to help professionals gain essential job-ready skills that can open doors to new advancement and employment opportunities.

“The University of Toronto is a global leader in data sciences,” says Lisa Strug, director of the DSI, professor of Statistical Sciences, Computer Science and Biostatistics and senior scientist at The Hospital for Sick Children. “The demand for skilled, fluent and adaptable data science expertise is expanding. To keep pace with the scale of change, the DSI has an opportunity to lead in the shift from a knowledge-based to a learning-based model where upskilling is an ongoing opportunity for learners and no job opportunity is ever out of reach.”

Estimates suggest that 2.5 quintillion bytes of data are generated every day. It’s not surprising that professionals increasingly find that data science skills are in demand. Society is experiencing a transformative shift in the production, collection and use of data. As a result, organizations need skilled professionals capable of analyzing large amounts of data, uncovering valuable insights and defining the story hidden in the numbers.

Previous experience with data science isn’t needed to apply. The only prerequisite for the certificate is a degree in a field outside of computer science or statistics.

Why is the DSI offering this certificate?

The DSI is a central hub and incubator for data science research, training and partnerships at U of T. The DSI is accelerating the impact of data across disciplines to address pressing societal issues and drive positive social change. Training is an integral component of the DSI’s mission, aligned with the University’s aim to support life-long learning.

Learn from private-sector experts

The DSI Data Science Certificate offers the unique opportunity to learn from private-sector experts during the case studies in each course. The case study component provides learners with important insights into the professional world of data science analytics.

“The DSI Data Science Certificate is built around a series of core courses essential to establishing a strong foundation in data science. These courses are designed to take someone without data science expertise and give them the confidence to excel in any data-driven field. It also includes case studies from leading experts. We are very excited to be launching this certificate and have big plans to expand our offerings in the future,” says Rohan Alexander, assistant professor in the Faculty of Information and Department of Statistical Sciences.

In addition, the certificate offers busy professionals flexibility. The certificate is fully online, and learners can choose a single course to improve their skills in a specific area or earn a full certificate by taking six of the eight courses offered. The courses are designed to ensure learners master the core competencies in foundational data science, including SQL, R and Python, and gain hands-on experience through real-world case studies.

What pilot participants are saying

The DSI ran a successful set of course pilots with more than 100 learners over the summer.

“For a beginner, I found that it provided an amazing overview! The flow was well-paced. It was a lot of information at once sometimes, but I was able to manage as I could go back and review items when off class time. The sequence of the course material makes complete sense as you move forward in the course. It all tied in together,” says one participant.

“Instructors were very knowledgeable, helpful and engaging! Good class size; also attracted collaborative and enthusiastic students with a variety of competencies. It was very helpful to be asked to ask questions in the public chat, which encouraged collegiality,” says another participant.

Jun 13 2022

DSI welcomes Unity Health Toronto as a partner

The Data Sciences Institute (DSI) strives to collaborate with organizations that want to engage and support world-class researchers, educators, and trainees working to advance data science. We are excited to announce a new partnership with Unity Health Toronto.

Unity Health Toronto, which comprises Providence Healthcare, St. Joseph’s Health Centre, and St. Michael’s Hospital, works to advance the health of everyone in its urban communities and beyond. The health network serves patients, residents and clients across the full spectrum of care, from primary care, secondary community care, and tertiary and quaternary care services to post-acute care through rehabilitation, palliative care and long-term care, while investing in world-class research and education.

“The pandemic has helped shine an important light on how data science can help us plan, understand and evaluate responses to global health crises and ultimately create the best care experiences for our patients and those beyond our walls. The value of health research has never been clearer than it is now. At Unity Health, we are a leader in the use of data and advanced analytics in healthcare delivery and research. Partnering with the DSI will enable Unity Health to continue to harness the power of data science to improve care. This collaboration will enhance our work with our partners to apply big data to advance the health of our communities locally, nationally and globally,” says Dr. Ori Rotstein, vice-president of Research and Innovation at Unity Health Toronto.

The DSI fuels innovation and fosters the exchange of ideas, connecting a diverse community of researchers and trainees that represent a wide array of disciplines. By connecting data science researchers, data and computational platforms, and external partners, the DSI advances research and nurtures the next generation of data science researchers. Because Unity Health is one of our external funding partners, its researchers can apply for research grants and support, training, and networking opportunities at the DSI.

“The DSI is thrilled to announce this partnership. We are very excited to be expanding our research community. We are committed to building a hub of data science researchers that can accelerate the impact of data across disciplines to address pressing societal issues and forward positive social change. We are ecstatic to have researchers from Unity Health join our data science community,” says Lisa Strug, DSI Director.

May 26 2022

Bringing together the hammer and the nails – encouraging collaborations between methodologists and applied researchers

The Data Sciences Institute recently held a competition for Seed Funding for Methodologists. This funding is designed to catalyze new Collaborative Research Teams and encourage new partnerships between data science methodologists or theorists and applied researchers. Data science is inherently interdisciplinary and building capacity in data science has the potential to advance research frontiers across a broad spectrum of fields.

“This competition was about uniting cutting-edge methodologists with applied researchers to form new collaborations. By presenting and bringing to the fore innovative methodological and theoretical work, our goal is to ensure that new Collaborative Research Teams are forged with new and unexpected connections,” says Michael Brudno, professor at the Department of Computer Science, Faculty of Arts & Science, and chief data scientist at the University Health Network.

“Imagine that you have this amazing new hammer that you spent ages perfecting. But you are missing the nails on which to use your hammer. This seed funding is about finding those nails,” says Eyal de Lara, professor at the Department of Computer Science, Faculty of Arts & Science.

Presenting the three inaugural methodologists

Aya Mitani, from the Dalla Lana School of Public Health, is developing a methodology that applies multilevel matrix-variate analysis to longitudinally collected dental data while accounting for correlation. The unique correlation structure of teeth provides an excellent application area, and Mitani aims to connect with researchers and oral health practitioners to prevent and manage oral diseases with greater precision, improving oral and general health outcomes across populations by applying these new methods and tools.

Linbo Wang, from the Department of Computer and Mathematical Sciences at the University of Toronto Scarborough, is developing innovative tools to find causal relationships in observational and/or experimental datasets. These new tools will allow researchers to better understand the underlying causal mechanisms and help decision-makers make more informed decisions. There is broad and impactful potential for the application of these methods.

Murat Erdogdu, from the Departments of Computer Science and Statistical Sciences in the Faculty of Arts & Science, is developing theoretical tools to compute the asymptotic generalization error of certain overparameterized estimators and characterize the convergence rate of overparameterized neural networks beyond the kernel regime. These new theoretical tools will enable researchers to more carefully develop machine learning models that take their limitations into account, across many application areas.

Showcasing innovative data science methodologies

One key deliverable for this award is that recipients present their methodology or theory with a focus on building new applied collaborations.

Join us on June 16 for a discussion on potential application areas as Mitani, Wang and Erdogdu present their innovative methodological techniques. We welcome applied researchers from any discipline interested in learning more about how these methodologies might be applicable to their research.