Rohan K's Senior Project Blog

Project Title: Improving Speech Recognition Technology for Patients with Voice Disorders
BASIS Advisor: Bryn Sharp
Internship Location: Dystonia and Speech Motor Control Laboratory, Harvard Medical School
Onsite Mentor: Kristina Simonyan

Project Abstract

Speech recognition is an AI technology impacting our day-to-day lives. However, current models are biased towards healthy speech and do not perform well on tasks specific to disordered speech. The goal of my project is to piece together existing speech recognition models and fine-tune them to accommodate disordered speech. I will be analyzing audio data from the Dystonia and Speech Motor Control Laboratory at Harvard Medical School, where speech recordings of patients with laryngeal dystonia and voice tremor are collected.

    My Posts:

  • Final Product

    For my final post, I wanted to share my final product: an interactive demo where you can actually run my model! I recorded myself saying 20 sentences, and the model will extract each of them using a Voice Activity Detector (VAD) and a Speech-to-Text model. To watch a recording of the demo, scan... Read More

  • Back Home!

    Welcome back! My final day in the lab was last Tuesday, when I presented in front of the entire team of researchers. It went really well, and it was nice to get some feedback from experts in the field who have given these presentations several times in their careers. After my presentation we ate... Read More

  • The Presentation before The Presentation

    Hello all! Today is my last day working at the lab! Tomorrow I am coming in for the last time to give a presentation to all the lab researchers... This is even scarier than the actual presentation because everyone working here is an expert in the field. It will be a good experience though, because... Read More

  • RohansModel.extract_sentences

    Hello! Right now I am rewriting my model into a clean, easy-to-use package named RohansModel (I'm taking name suggestions in the comments!). For a user to implement my model, all they have to do is import it from a Python file. Then, they can input a custom set of sentences for my model... Read More

  • Finalizing Results

    Welcome back! Last week I discussed my goal for the end result. I mentioned I wanted 90-95% accuracy for sentences found, ideally greater than 95% so that most of the time it would find all 20 sentences. After running my most recent model, I achieved 93.98% accuracy! We are very close to the end which... Read More

  • Measuring the Problem: Clear Speech vs. Disordered Speech

    Hello and welcome back! Last week I introduced the prospect of a new model called Whisper. As I continue to compare models and run optimization algorithms, I am also starting to prepare results. The first step is measuring the initial problem, showing why this work is important in the first place. This week I... Read More

  • Using OpenAI but not ChatGPT? Exploring Speech Recognition Systems

    Hello all! There are two components of this project: the VAD model and the Speech-to-Text (STT) model. The VAD feeds time clips recognized as human speech into the STT model, and then the STT model outputs transcribed sentences. However, working with disordered speech leads to a higher error rate. For the past two weeks, I have been... Read More
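The two-stage flow described in this excerpt (VAD finds speech clips, STT transcribes each clip) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_vad` and `toy_stt` are hypothetical stand-ins, not the actual VAD or STT models used in the project, and the "audio" is a toy list of strings rather than real samples.

```python
def transcribe_recording(audio, detect_speech, transcribe):
    """Run a VAD, then pass each detected speech clip to an STT model.

    detect_speech and transcribe are placeholders for whatever VAD and
    STT models are plugged in; their names and signatures are assumptions.
    """
    sentences = []
    for start, end in detect_speech(audio):
        clip = audio[start:end]              # cut out the detected speech
        sentences.append(transcribe(clip))   # one transcript per clip
    return sentences


def toy_vad(audio):
    """Hypothetical VAD: treat every run of non-empty samples as speech."""
    segments, start = [], None
    for i, sample in enumerate(audio):
        if sample and start is None:
            start = i                        # speech begins
        elif not sample and start is not None:
            segments.append((start, i))      # speech ends at silence
            start = None
    if start is not None:
        segments.append((start, len(audio)))
    return segments


def toy_stt(clip):
    """Hypothetical STT: the toy 'audio' already holds words, so join them."""
    return " ".join(clip)


# Toy recording: empty strings stand for silence between sentences.
toy_audio = ["", "the", "dog", "", "", "ran", "home", ""]
result = transcribe_recording(toy_audio, toy_vad, toy_stt)
print(result)
```

Keeping the two stages behind plain function arguments makes it easy to swap in a different VAD or STT model and compare them on the same recordings.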

  • $50,000 Computer!?

    Hello everyone! I finished writing my random search algorithm to find the best parameters for the Voice Activity Detector. However... on the iMac I'm working on, searching 1,000 parameter combinations would take an estimated 330,000 HOURS! That's more than 37 years. Thankfully, the lab has a $50,000 Lambda Vector computer that can run my program... Read More
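For readers unfamiliar with random search, here is a minimal sketch of the general technique. Everything specific in it is an assumption for illustration: the parameter names, their ranges, and the `toy_error` objective are hypothetical and are not the actual VAD settings or scoring used in the lab.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions.
SEARCH_SPACE = {
    "threshold": (0.1, 0.9),
    "min_speech_ms": (50, 500),
    "min_silence_ms": (50, 1000),
}


def sample_params(rng):
    """Draw one random parameter combination from the search space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials, seed=0):
    """Try n_trials random combinations and keep the lowest-error one."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params):
    """Stand-in objective. A real run would score the VAD on labeled audio,
    which is the expensive step that dominates the runtime."""
    return (abs(params["threshold"] - 0.5)
            + abs(params["min_silence_ms"] - 300) / 1000)


best_params, best_error = random_search(toy_error, n_trials=1000)
print(best_params, best_error)
```

The loop itself is cheap; the cost comes from `evaluate`, which in a real setting has to rerun the detector over every recording for each combination.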

  • Calculating the Error of a Speech Recognition AI

    Status Update: Manual correction of the Voice Activity Detector (VAD) labels was completed on Tuesday. We reviewed 323 audio recordings and over 6,000 sentences. After editing the labels, we have two files for each recording: the VAD's labels (not perfectly accurate) and the corrected labels (accurate). Our goal is to make the VAD more accurate. The following... Read More
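One possible way to compare a recording's VAD labels against its corrected labels is a frame-level mismatch rate. This is a sketch of one common approach, not necessarily the error measure used in this project; the segment format (start and end times in seconds) and the 10 ms frame step are assumptions.

```python
def to_frames(segments, duration, step=0.01):
    """Convert (start, end) speech segments in seconds to per-frame booleans."""
    n = int(round(duration / step))
    frames = [False] * n
    for start, end in segments:
        first = max(0, int(round(start / step)))
        last = min(n, int(round(end / step)))
        for i in range(first, last):
            frames[i] = True
    return frames


def frame_error_rate(vad_segments, corrected_segments, duration, step=0.01):
    """Fraction of frames where the VAD labels disagree with the corrected labels."""
    vad = to_frames(vad_segments, duration, step)
    ref = to_frames(corrected_segments, duration, step)
    if not ref:
        raise ValueError("duration must cover at least one frame")
    mismatches = sum(v != r for v, r in zip(vad, ref))
    return mismatches / len(ref)


# Toy example: the VAD cut the first sentence off 0.2 s too early.
vad_labels = [(0.0, 1.0), (2.0, 3.0)]
corrected_labels = [(0.0, 1.2), (2.0, 3.0)]
rate = frame_error_rate(vad_labels, corrected_labels, duration=4.0)
print(rate)
```

A score like this could serve as the value that a parameter search tries to minimize, since it drops to zero only when the VAD labels match the corrected ones frame for frame.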

  • Labeling the Start of My Project: Voice Activity Detection

    Hello all! I'm excited to finally share with you a rundown of what my project consists of with regard to clinical applications. At my site placement, the Dystonia and Speech Motor Control Laboratory, patients with laryngeal dystonia participate in studies where they record their speech before and after treatment. The patient is given 20 sentences... Read More

  • It’s Storming Outside and Inside: Organizing the Chaos of Speech Processing

    8:57 am. High winds. Current temp: 26°. Feels like: 19°. Looks like: 72°. The wind almost knocked me down as I was walking across the Charles River from CrossFit to the lab this morning. The day after a snowstorm is apparently worse than the actual snowstorm because it looks nice but the temperature drops... Read More

  • Introduction

    Hello! My name is Rohan Kshatriya. Welcome to my Senior Project blog. After working on several AI + Biology projects last summer, I discovered the field of computational biology. For my project, I will be researching computational methods for studying laryngeal dystonia (LD), a rare neurological speech disorder. On Friday, February 9th, I will be... Read More