Data Sciences Institute (DSI)

New approaches to data unlocking benefits for social scientists

The Data Sciences Institute (DSI) and the University of Toronto Scarborough, DSI@UTSC, are leading a tri-campus initiative to encourage research in Computational and Quantitative Social Science (CQSS). The initiative includes community-building, research funding, and training.

The social sciences are undergoing a data sciences revolution spurred on by new statistical and algorithmic techniques, rapid advances in high-performance computing, as well as the proliferation of large, complex, and heterogeneous data structures. These developments present exciting opportunities as well as new challenges for social scientists.

Assistant Professor of Sociology Ethan Fosse is an Associate Director of the DSI. “The Institute came about when a number of scholars across all three campuses recognized that there is an emerging area at the intersection of technical fields such as computer science, theoretical statistics, and mathematics and domain-specific fields such as biology, physics, political science, and sociology,” says Prof. Fosse. “With new data, novel algorithms, and rapid increases in computing power, there is the potential to truly accelerate knowledge production in a wide range of subject areas.”

But what does this mean, practically, for researchers in the social sciences? The answer is that there are new, powerful ways of analyzing complex data structures commonly used by social scientists, such as temporal, textual, spatial, and network-based data. For example, many social scientists are experts at examining unstructured text data, such as open-ended survey responses, in-depth interviews, or ethnographic field notes. However, new computational techniques further allow social scientists to automatically summarize and group the data, significantly reducing the amount of labour, time, and monetary cost required to conduct interpretative analyses.
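To make the idea of automatically grouping text concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a method attributed to Prof. Fosse or the DSI: the sample responses, the small stop-word list, the similarity threshold, and the function names (`tokenize`, `cosine`, `group_responses`, `top_terms`) are all hypothetical, and real projects would more likely use established text-analysis libraries.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and",
              "my", "i", "in", "too", "for", "it"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def group_responses(responses, threshold=0.3):
    """Greedy grouping: add each response to its most similar existing group
    if the similarity reaches the threshold, otherwise start a new group."""
    groups = []  # each group: {"centroid": Counter, "members": [str]}
    for text in responses:
        vec = Counter(tokenize(text))
        best = max(groups, key=lambda g: cosine(vec, g["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(text)
            best["centroid"].update(vec)  # accumulate word counts for the group
        else:
            groups.append({"centroid": vec, "members": [text]})
    return groups


def top_terms(group, n=3):
    """The n most frequent words in a group, as a crude summary label."""
    return [word for word, _ in group["centroid"].most_common(n)]


if __name__ == "__main__":
    # Made-up open-ended survey responses, for illustration only.
    sample = [
        "Rent is too expensive in my neighbourhood",
        "The rent keeps rising and housing is expensive",
        "Buses are late and transit is unreliable",
        "Transit service is unreliable, buses never come",
    ]
    for g in group_responses(sample):
        print(top_terms(g), len(g["members"]))
```

The greedy, threshold-based grouping is chosen here only because it fits in a few lines; the result depends on the order of the responses and on the threshold, which is one reason interpretive judgment remains part of the analysis.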

Prof. Fosse has written on the usefulness of these techniques for social scientists and has used them in his own research. For him, the next step is to expand training in these methods and to help social scientists become aware of the benefits such techniques can bring to their work. He is also developing programs to foster research collaborations between data scientists and social scientists. The DSI currently provides a number of training opportunities for social scientists, as well as grants for projects that use the data sciences in innovative ways.

The Institute aims to build an interdisciplinary network of researchers who can advocate for the importance of the data sciences in social science research. DSI membership is free and open to faculty, staff, and students of the University of Toronto and its affiliates (for example, affiliated hospitals and research institutes).

Article by: David Blackwood, Dept. of Anthropology, Dept. of Health and Society, Dept. of Sociology, University of Toronto Scarborough

Gift from Schmidt Futures to spark a revolution in AI-based STEM research at the University of Toronto

The Data Sciences Institute (DSI) is excited to co-lead the prestigious Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a program of Schmidt Futures.

With the goal of accelerating scientific research through the application of artificial intelligence, Schmidt Futures is investing $148-million in nine global universities, including the University of Toronto.

The announcement launches the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a program of Schmidt Futures. A large-scale initiative supporting the work of early-career scholars in engineering and the natural sciences, such as mathematics, chemistry or physics, the program fosters their uptake of vital tools in artificial intelligence.

Artificial intelligence is not just a field in its own right but also an important tool for research. It can find patterns to enable research that solves important challenges—across fields from climate change to human health and beyond—more quickly and more efficiently. To accelerate the adoption of AI into scientific methodologies, the Schmidt AI in Science Postdocs initiative aims to spark a significant increase globally in the number of scientists working with cutting-edge AI tools.

A wide-ranging vision for solving global challenges

Schmidt Futures is a philanthropic initiative, founded by Eric and Wendy Schmidt, that brings talented people together in networks to prove out their ideas and solve hard problems in science and society.

The CEO of Google from 2001 to 2011, Eric Schmidt has hands-on experience with the transformative power of finding and supporting innovative minds—at scale. Wendy Schmidt, a journalist and a competitive sailor, has created multiple non-profits in the areas of global sustainability and human rights. With Schmidt Futures, their focus is on building networks of visionary minds with the talent to solve society’s problems.

The University of Toronto is Canada’s leading research university and the home of seminal work in artificial intelligence, from deep learning and neural networks to the interfaces between AI and the natural sciences.

“As the home of deep learning, the University of Toronto is proud to partner with Schmidt Futures on this forward-looking program, which will accelerate humanity’s ability to meet some of the most important challenges of our time,” said Meric Gertler, president of U of T. “The Schmidt AI in Science Postdocs program provides tremendous opportunities for the emerging generation of STEM researchers. On behalf of the U of T community, I would like to thank Schmidt Futures for their vision and generosity.”

The University of Toronto is the only Canadian university chosen for the program. Its highly diverse community—its existing postdoctoral fellows come from 89 countries—and global links make it an ideal centre to support the Schmidt AI in Science Postdocs global network.

“The Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a program of Schmidt Futures, will create an immediate acceleration of AI applications across several disciplines. We are proud to partner with these exceptional universities, especially the University of Toronto, on this important initiative,” said Stu Feldman, chief scientist at Schmidt Futures. “The Fellowship will provide these postdoctoral fellows with the advanced tools to increase the scope and speed of their research while discovering new and innovative use cases for AI within their field. U of T’s thoughtfully crafted program design, strong base of alumni in the scientific world, high volume of leading-edge scientific research, and deep history of important AI research give us full confidence in an impactful outcome.”

Creating a cohort of AI-fluent researchers

The Schmidt AI in Science Postdocs program will support nearly 300 postdoctoral fellows each year for six years. U of T will host 10 fellows in the first year of the program and 20 annually thereafter. The support includes networking and research collaborations between participating universities; a robust series of workshops, conferences and lectures; and training in how to apply AI techniques.

The fellows will not only expand the scope of their own research but will also establish their careers as AI-fluent scientists, ready to expand new research methodologies across a range of fields through their future work.

At U of T, the Schmidt AI in Science Postdocs becomes one of the university’s most prestigious postdoctoral programs. Working closely with the Vector Institute for Artificial Intelligence, two senior faculty members lead the initiative. Alán Aspuru-Guzik is the director of U of T’s Acceleration Consortium, a global network of researchers, industry and government that is leading a convergence of materials science with AI and robotics. Lisa Strug is the director of U of T’s Data Sciences Institute, one of the world’s largest clusters of scientists working on innovative approaches to data that drive actionable research insights.

“The Data Sciences Institute (DSI) is excited to co-lead the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship. This large-scale initiative supports postdoctoral researchers in engineering and the natural sciences by giving them vital tools in artificial intelligence. The DSI is thrilled to have the opportunity to support this new prestigious global program and help early career researchers innovate in their fields,” says Lisa Strug, director of the DSI and Associate Director of The Centre for Applied Genomics at The Hospital for Sick Children.

Canada Research Chair in Genome Data Science, Lisa Strug is a statistical geneticist in the Faculty of Arts & Science who develops novel approaches to identifying the genetic contributors to complex human disease. She is cross-appointed to the Dalla Lana School of Public Health and the Hospital for Sick Children and is also the director of the Canadian Statistical Sciences Institute, Ontario Region.

As a CIFAR AI Chair at the Vector Institute for Artificial Intelligence and the Canada 150 Research Chair in Theoretical and Quantum Chemistry, Aspuru-Guzik works to accelerate the discovery of new molecules and materials needed for a sustainable future, using novel, disruptive approaches. He is also a Google Industrial Research Chair in Quantum Computing and is the founder of two startups.

“Thank you, Schmidt Futures, for this generous vote of confidence in U of T programming and in the exceptional talents who thrive in our postdoctoral programs,” said Leah Cowen, U of T’s vice-president for research, innovation, and strategic initiatives. “The prestigious Schmidt AI in Science Postdoc program will help catalyze novel solutions to tough challenges. It is the kind of thoughtful support that powers real innovation.”

A summer of learning, fun and community for 2022 DSI SUDS Scholars

The Data Sciences Institute (DSI) welcomed 35 carefully selected undergraduate students from across Canada for a rich data sciences research experience. The Summer Undergraduate Data Science (SUDS) Opportunities Program is a great way for undergraduate students to engage in hands-on research led by DSI member faculty and scientists. The program has a cross-disciplinary approach, applying data science skills in various fields including the humanities, life science, engineering, public health, and more.

“The DSI SUDS program is about inspiring the next generation of data scientists and giving undergraduate students the chance to explore data science as a career. In addition to their research projects, SUDS Scholars are provided with data science skills and professional development opportunities. We couldn’t be more thrilled to have the chance to inspire them and hopefully kickstart their careers in this exciting field. They are truly an exceptional bunch!” says Laura Rosella, DSI associate director of education and training.

Interested in proposing a SUDS research opportunity? Applications are now open!

Student applications open on December 19, 2022, for the 2023 SUDS opportunities.

SUDS Scholars praise the program

SUDS Scholars took part in weekly speaker seminars, data science skills training, and professional development opportunities. Numerous networking events let them learn more about the wide variety of SUDS projects. They also asked for an easier way to stay in touch and connect, so the DSI set up a Zoom Chat channel.

“We were so excited to see that the Scholars wanted to stay in touch, network, and learn from each other. It was so great to see a community form. We look forward to our 2023 SUDS program and continue to support this community of students from diverse backgrounds,” says Wenzhe Xu, DSI’s programming coordinator and SUDS officer.

Scholars presented their research projects and data methods at SUDS Research Day, where students also voted for the best presentation. This year’s winner was Lauren Gill from the University of British Columbia, who was studying Data Science for White Shark Conservation with Vianey Leos Barajas, an assistant professor in the Department of Statistical Sciences at the Faculty of Arts & Science.

Group photo from SUDS Research Day.

Other 2022 SUDS scholars included Yingke Wang, who worked with Rahul Krishnan, an assistant professor in the Department of Computer Science and the Department of Laboratory Medicine & Pathobiology within the Temerty Faculty of Medicine, on machine learning for chronic disease management.

“Thanks to SUDS, I had the opportunity to learn how to combine machine learning algorithms in the healthcare industry as well as explore survival analysis. Plus, the self-learning skills I gained will be essential to me for approaching graduate study,” says Wang, a member of St. Michael’s College.

“It’s been wonderful to see the support that SUDS provides to young scholars like Yingke,” says Krishnan. “Introducing students to research early is an important step for them to see the opportunities that graduate study can provide.”

SUDS scholar and Innis College member Tina Tsan worked with Ulrich Wortmann, an associate professor in the Department of Earth Sciences at the Faculty of Arts & Science, on an analysis of why the last ice age came to a sudden end.

“For me, the biggest reward from the SUDS program has been how it’s broadened my perspective and understanding of what data science is and how it's used in different fields,” Tsan says.

SUDS scholar Anthony McCanny, a member of Victoria College where he was a Northrop Frye Centre Undergraduate Fellow, worked with Felix Cheung, an assistant professor in the Department of Psychology at the Faculty of Arts & Science. They explored whether gross domestic product (GDP) is a good measure of economic and societal success, and what type of government spending improves the lives of citizens.  

“The SUDS program filled my summer with an unbelievable amount of learning, fun, joy and community,” says McCanny. “I’ve been very lucky in Professor Cheung’s lab to have the freedom to conduct my own research, paired with great guidance. It’s hard not to feel like this summer has redefined my path in life, filling me with enthusiasm for a career in research, and connecting me with people that I hope I get to keep working with.”

Building data science software to help the fight against cancer

Tumors, much like people, are different from one another. Not only can the same type of tumor vary from person to person, but there can also be variation within a single tumor, which is composed of a diverse population of cells. This tumor heterogeneity makes it difficult for researchers to create effective treatment plans. This is where Dr. Gregory Schwartz and his team at the University Health Network and Medical Biophysics at the University of Toronto come in, with the help of the Data Sciences Institute’s (DSI) research software development support program.

Interested in applying for the DSI’s research software development support program? Apply by October 21, 2022, for our next round of applications. 

The DSI’s software development program supports faculty and scientists by providing access to highly skilled software developers who can refine or enhance existing software, improve its usability and robustness, build new tools, and disseminate research software. The DSI has supported six projects since its first call. The DSI’s senior software developer, Dr. Conor Klamann, worked with Schwartz and his team.

Helping understand cellular heterogeneity in cancer with TooManyCells

Genetic heterogeneity within a tumor arises from imperfect DNA replication. When healthy cells divide to create new cells, mutations can occur. Mutations can also occur when cancerous cells divide, causing tumor heterogeneity. However, these diverse populations of cells can also exhibit non-genetic heterogeneity in response to treatment, changing their behaviour based on their surrounding environment independent of mutation. To measure cell behaviour at the resolution of individual cells, researchers are using new single-cell technologies. These technologies produce massive amounts of detailed data, which require sophisticated computational tools to interpret.

To better understand heterogeneity and drug resistance in cancer, Schwartz and his team developed TooManyCells, a suite of tools designed for clustering and visualizing single-cell data. The visualization component of TooManyCells is custom-made and presents cell relationships as a tree. By using TooManyCells, the team could identify rare cancer cells that were contributing to disease progression.

However, the software had some limitations.  

“The limitation of TooManyCells was that it took time to build a tree. These trees can be quite large, so to visualize major cell populations you would have to prune the tree several different ways and rerun the program repeatedly. You also didn't really know which way was the right way to prune the tree and colour it until you saw the output,” says Schwartz. “So that’s where this opportunity to work with the DSI’s research software development support program came in.”

“It's wonderful to have a fantastic software developer like Conor devoting his time to facilitating these kinds of projects, which are not easy to get off the ground. They are absolutely necessary and required in these fields but have surprisingly few funding opportunities. So, it's fantastic that these kinds of avenues exist,” says Schwartz about the program.
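The pruning Schwartz describes can be pictured with a small, generic sketch. The Python code below is not TooManyCells, its algorithm, or its interface; the `Node` class, the `prune` rule (collapse any split that would leave a branch with fewer than `min_size` cells), and the toy tree are all assumptions made for illustration. It shows why trying several pruning settings means rebuilding the view each time, which is the kind of repetition an interactive tool can remove.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical cluster tree; leaves hold cell identifiers."""
    cells: list = field(default_factory=list)     # cells stored directly at this node
    children: list = field(default_factory=list)  # child Node objects


def size(node):
    """Total number of cells at or below this node."""
    return len(node.cells) + sum(size(child) for child in node.children)


def all_cells(node):
    """Every cell identifier at or below this node."""
    found = list(node.cells)
    for child in node.children:
        found.extend(all_cells(child))
    return found


def count_leaves(node):
    """Number of leaf clusters in the tree."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def prune(node, min_size):
    """Return a pruned copy: a split is collapsed into one leaf whenever
    any of its branches would contain fewer than min_size cells."""
    if not node.children:
        return Node(cells=list(node.cells))
    if any(size(child) < min_size for child in node.children):
        return Node(cells=all_cells(node))
    return Node(cells=list(node.cells),
                children=[prune(child, min_size) for child in node.children])


if __name__ == "__main__":
    # Toy tree: one branch splits into clusters of 4 and 1 cells, the other holds 4.
    tree = Node(children=[
        Node(children=[Node(cells=["c1", "c2", "c3", "c4"]), Node(cells=["c5"])]),
        Node(cells=["c6", "c7", "c8", "c9"]),
    ])
    # Each threshold gives a different picture, so each one is a separate run.
    for min_size in (1, 2, 5):
        print(min_size, count_leaves(prune(tree, min_size)))
```

No cells are discarded in this sketch: pruning only merges small branches into their parent, so the total cell count is the same at every threshold.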

How is the project developing?

TooManyCells tree.

The goal of this project was to provide a graphical user interface for the analysis tools that Schwartz and his team developed. The details have evolved with time but creating an interactive tool to speed up analyses and improve user experience has always been at the heart of the project. Currently, the software development team at the DSI has a prototype in place and is working on collecting user feedback. The research team is also preparing an article describing the software, and once it has been completed, the source code will be made public on the Schwartz Lab GitHub page so that other researchers may access it.

“It's been a pleasure working on TooManyCells! It's given me the opportunity to combine various programming frameworks in ways I haven't done before while supporting some very interesting research,” says Conor Klamann, DSI senior software developer.

Conquer the world of data science with the DSI Data Science Certificate

The world runs on data — and a new certificate is set to help people develop the skills they need to become leaders in the field.

The Data Sciences Institute (DSI) at the University of Toronto has launched a Data Science Certificate to help professionals gain essential job-ready skills that can open doors to career advancement and new employment opportunities.

“The University of Toronto is a global leader in data sciences,” says Lisa Strug, director of the DSI, Professor of Statistical Sciences, Computer Science and Biostatistics and senior scientist at The Hospital for Sick Children. “The demand for skilled, fluent and adaptable data science expertise is expanding. To keep pace with the scale of change, the DSI has an opportunity to lead in the shift from a knowledge-based to a learning-based model where upskilling is an ongoing opportunity for learners and no job opportunity is ever out of reach.”

Estimates suggest that 2.5 quintillion bytes of data are generated every day. It’s not surprising that professionals increasingly find that data science skills are in demand. Society is experiencing a transformative shift in the production, collection and use of data. As a result, organizations need skilled professionals capable of analyzing large amounts of data, uncovering valuable insights and defining the story hidden in the numbers.

Previous experience with data science isn’t needed to apply. The only prerequisite for the certificate is a degree in a field outside of computer science or statistics.

Why is the DSI offering this certificate?

The DSI is a central hub and incubator for data science research, training and partnerships at U of T. The DSI is accelerating the impact of data across disciplines to address pressing societal issues and drive positive social change. Training is an integral component of the DSI’s mission, aligned with the University’s aim to support life-long learning.

Learn from private-sector experts

The DSI Data Science Certificate offers the unique opportunity to learn from private-sector experts during the case studies in each course. The case study component provides learners with important insights into the professional world of data science analytics.

“The DSI Data Science Certificate is built around a series of core courses essential to establishing a strong foundation in data science. These courses are designed to take someone without data science expertise and give them the confidence to excel in any data-driven field. It also includes case studies from leading experts. We are very excited to be launching this certificate and have big plans to expand our offerings in the future,” says Rohan Alexander, assistant professor in the Faculty of Information and Department of Statistical Sciences.

In addition, the certificate offers busy professionals flexibility. The certificate is fully online, and learners can choose a single course to improve their skills in a specific area or earn a full certificate by taking six of the eight courses offered. The courses are designed to ensure learners master the core competencies in foundational data science, including SQL, R and Python, and gain hands-on experience through real-world case studies.

What pilot participants are saying

The DSI ran a successful set of course pilots with over 100 learners over the summer.

“For a beginner, I found that it provided an amazing overview! The flow was well-paced. It was a lot of information at once sometimes, but I was able to manage as I could go back and review items when off class time. The sequence of the course material makes complete sense as you move forward in the course. It all tied in together,” says one participant.

“Instructors were very knowledgeable, helpful and engaging! Good class size; also attracted collaborative and enthusiastic students with a variety of competencies. It was very helpful to be asked to ask questions in the public chat, which encouraged collegiality,” says another participant.