Archives for April 4, 2022

Advancing data science discovery via software development support

Data Sciences Institute (DSI) announces its first round of software development support

Data science research increasingly relies on complex computer programming, but many researchers lack the training or experience in software engineering needed to develop effective, reliable software. The DSI's software development program helps University of Toronto faculty and scientists, as well as external funding partners, accelerate their research. It provides access to highly skilled software developers who can refine or enhance existing software to improve its usability and robustness, build new tools, and disseminate research software. The DSI aims to help researchers develop software that can be used across disciplines and that supports reproducible research.

Six research teams selected through the first call for this competitive program will work with a DSI software developer to build high-quality, adaptable software. The projects span a wide range of fields, including the humanities, social sciences, and life sciences.

With more than 25 applications, we had a tremendous response to this first competitive call for DSI support. It was exciting to learn about the wide range of research projects at UofT that need software support. We are working to increase capacity for this important program to better support cutting-edge research while upholding the DSI's principles of collaboration, equity, and open science,

says Gary Bader, DSI associate director of data management, research software and advanced research computing.

A key part of creating collaborative, reusable software is ensuring that source code is available to the broader research community. To that end, DSI-supported research software will be made publicly available and documented on GitHub, which will also be used to track each project's progress toward its milestones.

There is so much data in the world now. This is a transformational change. Some researchers are very savvy with it, but others are just discovering it, and we are here to support them. I see myself as more of a technician; it's really about the researchers and their teams and what they want to achieve. It has been exciting to be part of these projects,

says Conor Klamann, DSI senior software developer.

The next call for DSI research software developer support will be announced later this year.

Developing a web interface to help speech researchers

Ewan Dunbar and his team from the Department of French in the Faculty of Arts & Science are working with the DSI to create a web interface that allows speech researchers to upload audio files and download "speech features" useful for speech processing. Software that extracts these features is valuable to many experimental and clinical speech researchers. However, installing it currently requires not only Python but also dependencies that do not work on Windows. Once completed, Speech Features Online (SFO) will let users upload large audio datasets and easily choose among the available speech features.

We are very excited about this project and thrilled to work with the DSI. We want to have a tool, but we also want to make it accessible, by taking research code and bundling it, so researchers know that it’s usable and understand what it’s doing. That takes a lot of work, and it’s really a software development task,

says Ewan Dunbar.

Professor Dunbar's research focuses on human speech perception, automatic speech processing, and the cognitive processes at work in the human brain. Dunbar is also tackling a major problem in the field: speech technology is currently limited to a few languages, such as English, for which researchers have access to large amounts of transcribed audio data.

The full list of projects from the DSI’s Research Software Development Support Program

Alan Moses from the Department of Cell & Systems Biology, Faculty of Arts & Science, and Julie Forman-Kay from The Hospital for Sick Children will work with the DSI to create a software program that helps the research community study intrinsically disordered regions: protein segments that do not adopt a stable secondary or tertiary structure.

Dorothea Kullmann from the Department of French, Faculty of Arts & Science, will work with the DSI to develop a database with two interrelated parts: 1) a catalogue of the late medieval manuscripts of this type kept in Canada; and 2) a text corpus of the French texts contained in these and in other manuscripts of the same type kept anywhere in the world.

Eunice Eunhee Jang from the Department of Applied Psychology and Human Development, OISE (Ontario Institute for Studies in Education), is working on curriculum-based learning tools that assess and track children's emergent literacy and language development. Most standardized assessments are designed only to measure exceptionalities and are often inaccessible to parents and teachers. Built in collaboration with DSI developers, the BalanceAI Discovery digital assessment tool addresses this gap.

Ewan Dunbar from the Department of French, Faculty of Arts & Science, is working with DSI software developers to create a web interface that allows speech researchers to upload audio files and download "speech features" useful for speech processing.

Gregory Schwartz of the University Health Network and his team have identified rare cancer cells that may contribute to disease progression. To better understand cellular heterogeneity, he will work with DSI developers on TooManyCells, a suite of tools for clustering and visualizing single-cell data.

Laura C. Rosella of the Dalla Lana School of Public Health and Birsen Donmez of the Department of Mechanical and Industrial Engineering, Faculty of Applied Science and Engineering, will work with DSI developers to apply Human Factors Engineering methods to build a user-friendly decision support tool for the Chronic Disease Population Risk Tool (CDPoRT). CDPoRT was developed and validated using population-level health system data to predict the future burden of chronic diseases.