Rohan K's Senior Project Blog
Project Title: Improving Speech Recognition Technology for Patients with Voice Disorders
BASIS Advisor: Bryn Sharp
Internship Location: Dystonia and Speech Motor Control Laboratory, Harvard
Onsite Mentor: Kristina Simonyan
Project Abstract
Speech recognition is an AI technology that affects our day-to-day lives. However, current models are biased towards healthy speech and do not perform well on tasks specific to disordered speech. The goal of my project is to piece together existing speech recognition models and fine-tune them to accommodate disordered speech. I will be analyzing audio data from the Dystonia and Speech Motor Control Laboratory at Harvard Medical School, where speech recordings of patients with laryngeal dystonia and voice tremor are collected.
Final Product
For my final post, I wanted to share my final product: an interactive demo where you can actually run my model! I recorded myself saying 20 sentences, and the model extracts each of them using a Voice Activity Detector (VAD) and a Speech-to-Text model. To watch a recording of the demo, scan...
Back Home!
Welcome back! My final day in the lab was last Tuesday, when I presented in front of the entire team of researchers. It went really well, and it was nice to get feedback from experts in the field who have given these presentations many times in their careers. After my presentation we ate...
The Presentation before The Presentation
Hello all! Today is my last day working at the lab! Tomorrow I am coming in for the last time to give a presentation to all the lab researchers... This is even scarier than the actual presentation because everyone working here is an expert in the field. It will be a good experience though, because...
RohansModel.extract_sentences
Hello! Right now I am rewriting my model as a clean, easy-to-use package named RohansModel (I'm taking name suggestions in the comments!). To use my model, all a user has to do is import it from a Python file. Then, they can input a custom set of sentences for my model...
Finalizing Results
Welcome back! Last week I discussed my goal for the end result. I mentioned I wanted 90-95% accuracy for sentences found, ideally greater than 95% so that most of the time it would find all 20 sentences. After running my most recent model, I achieved 93.98% accuracy! We are very close to the end which...
Measuring the Problem: Clear Speech vs. Disordered Speech
Hello and welcome back! Last week I introduced the prospect of a new model called Whisper. As I continue to compare models and run optimization algorithms, I am also preparing my results. The first step is measuring the initial problem to show why this work is important in the first place. This week I...
Using OpenAI but not ChatGPT? Exploring Speech Recognition Systems
Hello all! There are two components of this project: the VAD model and the Speech-to-Text (STT) model. The VAD feeds time clips recognized as human speech into the STT model, and then the STT model outputs transcribed sentences. However, disordered speech leads to a higher error rate. For the past two weeks, I have been...
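A minimal sketch of how the two components described above might fit together. It is not the project's actual code: the function name `transcribe_recording`, the `(start, end)` segment format, and the toy stand-in models are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # assumed format: (start_seconds, end_seconds)


def transcribe_recording(
    audio: List[float],
    sample_rate: int,
    detect_speech: Callable[[List[float], int], List[Segment]],
    speech_to_text: Callable[[List[float], int], str],
) -> List[str]:
    """Run a VAD, then pass each detected speech clip to an STT model."""
    sentences = []
    for start, end in detect_speech(audio, sample_rate):
        # Convert the time clip into sample indices and cut it out.
        clip = audio[int(start * sample_rate): int(end * sample_rate)]
        sentences.append(speech_to_text(clip, sample_rate))
    return sentences


# Toy stand-ins so the sketch runs without any real model.
def fake_vad(audio, sample_rate):
    return [(0.0, 1.0), (2.0, 3.0)]


def fake_stt(clip, sample_rate):
    return f"clip of {len(clip)} samples"


audio = [0.0] * 40  # 4 seconds at 10 samples per second
sentences = transcribe_recording(audio, 10, fake_vad, fake_stt)
```

Passing the two models in as plain callables keeps the pipeline independent of any particular VAD or STT library, so either one can be swapped out when comparing models.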
$50,000 Computer!?
Hello everyone! I finished writing my random search algorithm to find the best parameters for the Voice Activity Detector. However... on the iMac that I'm working on, searching 1,000 parameter combinations would take an estimated 330,000 HOURS! That's more than 37 years. Thankfully, the lab has a $50,000 Lambda Vector computer that can run my program...
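The arithmetic behind that estimate, followed by a generic random search loop, can be sketched as below. Only the 330,000 hours and 1,000 combinations come from the post. The parameter names and ranges in `search_space` are hypothetical, and `score` is a placeholder for whatever evaluation the real search uses.

```python
import random

# Figures quoted in the post.
total_hours = 330_000
combinations = 1_000

hours_per_combination = total_hours / combinations  # hours for one combination
years = total_hours / (24 * 365)                    # total time in years

# Hypothetical VAD parameters and ranges, for illustration only.
search_space = {
    "threshold": (0.1, 0.9),
    "min_speech_ms": (50, 500),
    "min_silence_ms": (50, 1000),
}


def sample_parameters(rng):
    """Draw one random combination from the search space."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()}


def random_search(score, n_trials, seed=0):
    """Try n_trials random combinations and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_parameters(rng)
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score
```

At 330 hours per combination, the cost is dominated by evaluating each combination rather than by the search loop itself, which is why faster hardware makes such a difference.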
Calculating the Error of a Speech Recognition AI
Status Update: Manual correction of the Voice Activity Detector (VAD) labels was completed on Tuesday. We reviewed 323 audio recordings and over 6,000 sentences. After editing the labels, we have two files for each recording: the VAD's labels (not perfectly accurate) and the corrected labels (accurate). Our goal is to make the VAD more accurate. The following...
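One way to compare the two label files for a recording is sketched below. The post's own error calculation is not shown in this excerpt, so this is an assumed metric, not the project's: it treats each label as a `(start, end)` interval in seconds and reports the fraction of 10 ms frames where the VAD's labels and the corrected labels disagree.

```python
def to_frames(segments, duration, step=0.01):
    """Mark each frame of length `step` seconds as speech (True) or not."""
    n = int(round(duration / step))
    frames = [False] * n
    for start, end in segments:
        first = int(round(start / step))
        last = min(n, int(round(end / step)))
        for i in range(first, last):
            frames[i] = True
    return frames


def frame_error_rate(vad_segments, corrected_segments, duration, step=0.01):
    """Fraction of frames where the VAD disagrees with the corrected labels."""
    predicted = to_frames(vad_segments, duration, step)
    reference = to_frames(corrected_segments, duration, step)
    if not reference:
        raise ValueError("duration must cover at least one frame")
    mismatches = sum(p != r for p, r in zip(predicted, reference))
    return mismatches / len(reference)


# Example: the VAD cut a sentence off half a second early
# in a 2-second recording.
error = frame_error_rate([(0.0, 1.0)], [(0.0, 1.5)], duration=2.0)
```

A lower value means the VAD's labels sit closer to the manually corrected ones, so a number like this could serve as the score when tuning VAD parameters.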
Labeling the Start of My Project: Voice Activity Detection
Hello all! I'm excited to finally share with you a rundown of what my project consists of with regard to clinical applications. At my site placement, the Dystonia and Speech Motor Control Laboratory, patients with laryngeal dystonia participate in studies where they record their speech before and after treatment. The patient is given 20 sentences...
It’s Storming Outside and Inside: Organizing the Chaos of Speech Processing
8:57 am. High winds. Current temp: 26°. Feels like: 19°. Looks like: 72°. The wind almost knocked me down as I was walking across the Charles River from CrossFit to the lab this morning. The day after a snowstorm is apparently worse than the actual snowstorm because it looks nice but the temperature drops...
Introduction
Hello! My name is Rohan Kshatriya. Welcome to my Senior Project blog. After working on several AI + Biology projects last summer, I discovered the field of computational biology. For my project, I will be researching computational methods for studying laryngeal dystonia (LD), a rare neurological speech disorder. On Friday, February 9th, I will be...