Calculating the Error of a Speech Recognition AI

Rohan K -

Status Update: Manual correction for Voice Activity Detector (VAD) was completed on Tuesday. We reviewed 323 audio recordings and over 6000 sentences. After editing the labels, we have two files for each recording: the VAD’s labels (not perfectly accurate) and the corrected labels (accurate). Our goal is to make the VAD more accurate. The following is how I am approaching this task:

A VAD model is pre-trained (it already comes trained on hours of clear speech) and customizable (you can edit the ‘settings’ to perform better on your specific data). We adjusted the settings to accommodate disordered speech. After manually reviewing the timestamps, the accuracy was low. But how low is low? We need a quantifiable way to measure accuracy. This week, I wrote code that goes through the sentences in the file with correct timestamps, matches each one to the corresponding sentence in the VAD’s file with not-so-correct timestamps, and calculates the mean absolute error between the timestamps. Ultimately, we have the VAD’s error for each audio recording, and if we take the average, we obtain one important number: the error of the VAD.
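The matching-and-averaging step looks roughly like this. This is a minimal sketch, not the actual script: the timestamp format (start, end pairs in seconds), the example values, and the function name are all illustrative, and it assumes the sentences have already been matched one-to-one.

```python
def mean_absolute_error(vad_labels, corrected_labels):
    """Average absolute difference (in seconds) between the VAD's
    timestamps and the manually corrected ones.

    Each argument is a list of (start, end) pairs, one per sentence,
    already matched in the same order.
    """
    total = 0.0
    count = 0
    for (vad_start, vad_end), (true_start, true_end) in zip(vad_labels, corrected_labels):
        total += abs(vad_start - true_start) + abs(vad_end - true_end)
        count += 2  # each sentence contributes a start and an end error
    return total / count

# Toy example: two sentences from one recording.
vad = [(0.10, 1.95), (2.40, 4.10)]        # the VAD's (not perfectly accurate) labels
corrected = [(0.25, 2.00), (2.50, 4.00)]  # the manually corrected labels
print(mean_absolute_error(vad, corrected))  # average error in seconds
```

Running this per recording gives the VAD's error for that file; averaging those numbers across all 323 recordings gives the single overall error figure.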

Now that we can calculate how well the VAD performs, we can adjust the settings and see whether the accuracy goes up or down. There are two techniques for finding the best-performing VAD settings: brute force and optimization. The brute force technique simply means calculating the error for every possible combination of the VAD’s settings and selecting the one with the least error. This takes an enormous amount of computational power and time. A more efficient method is optimization (hill climbing), where you make small adjustments to the settings and keep only the ones that reduce the error, following a more direct path to the top of the hill (highest accuracy). This way the search learns which settings produce the least error without having to try every combination.
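The hill-climbing idea can be sketched as follows. Everything here is a stand-in: the two settings (`threshold`, `min_silence`) and the `error` function are toy placeholders for the real VAD settings and the per-recording mean absolute error, chosen so the example runs on its own.

```python
def error(settings):
    # Placeholder for "run the VAD with these settings and compute the
    # mean absolute error". Here we pretend the best settings are
    # threshold=0.5 and min_silence=0.3.
    return (settings["threshold"] - 0.5) ** 2 + (settings["min_silence"] - 0.3) ** 2

def hill_climb(settings, step=0.05, max_iters=100):
    """Nudge each setting up or down by `step`, keeping any change
    that lowers the error, until no nudge helps anymore."""
    best = dict(settings)
    best_err = error(best)
    for _ in range(max_iters):
        improved = False
        for key in best:
            for delta in (-step, step):
                candidate = dict(best)
                candidate[key] += delta
                if error(candidate) < best_err:  # keep only improvements
                    best, best_err = candidate, error(candidate)
                    improved = True
        if not improved:
            break  # reached the top of the hill
    return best, best_err

best, err = hill_climb({"threshold": 0.9, "min_silence": 0.1})
print(best, err)  # settles near threshold=0.5, min_silence=0.3
```

Instead of testing every combination, the search only ever evaluates a handful of neighboring settings per step, which is what makes it so much cheaper than brute force. The trade-off is that it can get stuck on a local hill, which is one reason real optimizers add randomness or gradients.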

Right now I have the error calculation ready. I am working on creating the machine learning model to train the VAD to give more accurate timestamps of the start and end of each sentence. After that, we can begin training the larger model to rate the sentences out of 10 on harshness, breathiness, and tremor. Progress is being made faster than expected, and I am now confident in my Python skills. I have found that the best way to learn how to code is to throw yourself in the middle of a project that works with real data!


    Hi Rohan, this is so interesting to read about! I am super impressed that you are diving so deep into the process and have gained such a strong understanding of determining accuracy with machine learning. What has been most useful for you to grasp these difficult concepts? Have you been doing a great deal of research independently or has your mentor/research team been the most helpful?
    Ms. Bennett - Trial and error is a huge part of coding. If I get an error after running code, I usually start by reviewing the code by myself to see if there is a quick fix. If I can't find anything, I will look on the internet to see if someone had the same issue. Most often I find a solution there. If I'm still not quite sure, I ask the post-doc, and we work through the code together to break down the source of the error.
    Moksha Dalal
    Hi Rohan! This is super interesting research. Is there a specific reason you are using Python over other languages?
    Hi Moksha! The post-doc I'm working with uses Python for his scripts. I have found it's very readable and comes with many useful packages such as librosa, which can read audio files.
    Hi Rohan! How much have you looked into the statistical side of machine learning? It looks like there is a lot of analysis that could be done.
    Hi Nick! Machine learning techniques require a ton of statistics that I do not fully understand yet. Luckily, a lot of open-source ML code takes care of these, so I don't have to worry so much about inventing my own statistical equations. Throughout the process, I'm still trying my best to understand the calculations. One example is the gradient descent calculation for optimization, which uses multivariable calculus.
