It’s Storming Outside and Inside: Organizing the Chaos of Speech Processing

Rohan K


Charles River, Boston, MA. The arrow points to Mass Eye and Ear, where the Dystonia and Speech Motor Control Laboratory is on the 4th floor.


8:57 am. High winds. Current temp: 26°. Feels like: 19°. Looks like: 72°

The wind almost knocked me down as I walked across the Charles River from CrossFit to the lab this morning. Apparently, the day after a snowstorm is worse than the storm itself: it looks nice out, but the temperature drops 20 degrees and wind gusts reach 36 mph. I did not know this.

The revolving doors at the entrance of Mass Eye and Ear were very welcoming. Today, I met the director of the lab and my advisor, Dr. Kristina Simonyan. I completed all of the health and safety training, so now I can be added to the IRB protocol. By the end of the week, I should be added to the lab server and database, which will give me access to speech samples and existing programs. Until then, I have been reading Intro to Speech Processing, which breaks down the phonetics of speech and explains how to analyze speech samples computationally. Since I'm not yet sure which parts of the current research I can share publicly, for now I'll show you what I have been exploring on my own:

I set myself a small task: converting my voice into an image in Python. I opened Audacity, a sound-recording program, and recorded a two-second sample of my voice saying "this is a test."
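Before anything can become an image, the recording has to be loaded into Python as a list of numbers. Here is a minimal sketch using only Python's built-in wave module and NumPy, assuming the Audacity recording was exported as a 16-bit PCM WAV file. The function name load_wav is hypothetical, not code from the lab.

```python
import wave

import numpy as np


def load_wav(path):
    """Read a 16-bit PCM WAV file into floats in [-1, 1], plus its sample rate."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()          # samples per second
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
        raw = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Little-endian signed 16-bit integers, scaled to the range [-1, 1).
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Channels are interleaved; average them down to a single mono track.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sr
```

Calling something like load_wav("this_is_a_test.wav") (hypothetical filename) on a two-second recording returns roughly 2 × sr samples. In practice, librosa.load does this in one call, though by default it also resamples the audio to 22,050 Hz.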

A spectrogram is an image that shows the amplitude (color) of each frequency (y-axis, in Hz) over time (x-axis, in seconds). Here is what my spectrogram looked like:
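To make the definition concrete, here is a minimal NumPy sketch of how a linear-frequency spectrogram is computed: slice the signal into short, overlapping, windowed chunks and take a Fourier transform of each one. The window size and hop length (n_fft=512, hop_length=128) are arbitrary choices for illustration, not settings from this project. librosa.stft and librosa.feature.melspectrogram do the same job with more options (centered frames, mel-spaced frequency bands), so their output will not match this sketch exactly.

```python
import numpy as np


def spectrogram(y, sr, n_fft=512, hop_length=128):
    """Magnitude spectrogram via a short-time Fourier transform (STFT).

    Returns (mag, freqs, times): mag has one row per frequency bin and
    one column per time frame.
    """
    if len(y) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Each column is one windowed slice of the signal.
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    mag = np.abs(np.fft.rfft(frames, axis=0))       # (n_fft // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)      # y-axis values, in Hz
    times = (np.arange(n_frames) * hop_length + n_fft / 2) / sr  # frame centers, in s
    return mag, freqs, times


def to_db(mag, floor=1e-10):
    """Convert magnitudes to decibels relative to the loudest cell (max = 0 dB)."""
    return 20 * np.log10(np.maximum(mag, floor) / mag.max())
```

For a two-second, 1000 Hz test tone sampled at 16,000 Hz, the loudest row in every column is the bin at 1000 Hz, because each bin is 16000 / 512 = 31.25 Hz wide. Speech produces many such bright rows at once, which is what the colored bands in the figure are.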

Mel spectrogram of “this is a test”.


Next, I labeled parts of the sentence I recognized in the spectrogram:

NOTE: These labels do not follow any standard phonetic labeling system. I am currently learning how to label different phones (the smallest discrete segments of sound).


Next came the challenge of getting this data into Python. When you export the labels to a text file, Audacity saves the start and end times along with the min/max frequency for each label:
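Here is a minimal sketch of reading that label file in Python. It assumes Audacity's tab-separated layout: each label is a line of start time, end time, and label text, and when a frequency range is exported it appears on the following line, which starts with a backslash. The Label class and parse_audacity_labels function are hypothetical names, not from the original project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    start: float                    # seconds
    end: float                      # seconds
    text: str
    f_low: Optional[float] = None   # Hz, only present if a frequency range was exported
    f_high: Optional[float] = None  # Hz


def parse_audacity_labels(content: str) -> List[Label]:
    """Parse the text of an exported Audacity label file into Label objects."""
    labels: List[Label] = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at most twice so label text containing tabs stays intact.
        fields = line.split("\t", 2)
        if fields[0] == "\\":
            # Frequency line: it belongs to the label on the previous line.
            labels[-1].f_low = float(fields[1])
            labels[-1].f_high = float(fields[2])
        else:
            text = fields[2] if len(fields) > 2 else ""
            labels.append(Label(float(fields[0]), float(fields[1]), text))
    return labels
```

To use it on a real export, read the file first, for example with open("labels.txt", encoding="utf-8") (hypothetical filename), and pass f.read() to the parser. Labels exported without a frequency range simply keep f_low and f_high as None.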

After loading the audio file and the label text file in Python and running some additional code with the Python audio library librosa, I got the following image:
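The post doesn't include the plotting code, so this is not the script behind the figure. It is only a sketch of one step such a script needs: converting a label's time bounds (seconds) and frequency bounds (Hz) into the row and column indices of a spectrogram array, so the box lands on the right pixels. It assumes a linear-frequency STFT with no frame centering. A mel spectrogram like the one in the figure would need the frequencies converted to mel bands instead. The function name label_to_bins is hypothetical. librosa offers related helpers such as librosa.time_to_frames and librosa.fft_frequencies.

```python
import numpy as np


def label_to_bins(start, end, f_low, f_high, sr, n_fft, hop_length, n_frames):
    """Map a label's time/frequency bounds to (row range, column range) of a
    linear-frequency STFT spectrogram: rows = frequency bins, cols = frames."""
    # Frame i starts at sample i * hop_length (no centering/padding assumed).
    col0 = int(np.floor(start * sr / hop_length))
    col1 = int(np.ceil(end * sr / hop_length))
    # Frequency bin k sits at k * sr / n_fft Hz.
    row0 = int(np.floor(f_low * n_fft / sr))
    row1 = int(np.ceil(f_high * n_fft / sr))
    n_bins = n_fft // 2 + 1
    # Clamp to the array's bounds so the box never indexes outside the image.
    rows = (max(row0, 0), min(row1, n_bins - 1))
    cols = (max(col0, 0), min(col1, n_frames - 1))
    return rows, cols
```

For example, at 16,000 Hz with n_fft=512 and hop_length=128, a label from 0.10 s to 0.35 s spanning 150 to 4000 Hz covers rows 4 through 128 and columns 12 through 44. spec[4:129, 12:45] then selects exactly the cells inside that label's box.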

Success! (After a lot of troubleshooting.) The image above is now numerical data that a computer can read. Each pixel holds three numbers for its red, green, and blue (RGB) values, for example pixel_x = [22 98 14]. To a computer, the whole image is just a grid of these number triples, which it can measure, compare, and compute with.
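Here is a minimal sketch of how spectrogram values turn into those RGB triples. Each decibel value is scaled to the range 0 to 1 and blended between two colors. The two-color gradient and the function name to_rgb are hypothetical choices for illustration. Real plots use richer colormaps, such as the ones matplotlib provides.

```python
import numpy as np


def to_rgb(spec_db, low=(0, 0, 64), high=(255, 255, 0)):
    """Map a dB spectrogram onto RGB pixels with a simple two-color gradient.

    Returns a uint8 array of shape (rows, cols, 3): one [R G B] triple per pixel.
    """
    lo, hi = spec_db.min(), spec_db.max()
    # Scale every value to 0..1 (quietest -> 0, loudest -> 1).
    t = (spec_db - lo) / (hi - lo) if hi > lo else np.zeros_like(spec_db, dtype=float)
    low = np.array(low, dtype=float)
    high = np.array(high, dtype=float)
    # Blend the two colors per pixel; t[..., None] broadcasts over the 3 channels.
    rgb = low + t[..., None] * (high - low)
    return np.round(rgb).astype(np.uint8)
```

Indexing the result, e.g. img[row, col], gives a triple like the pixel_x example above. Spectrogram row 0 is the lowest frequency, while image row 0 is the top, so np.flipud(img) puts low frequencies at the bottom as in the figure. Reading a saved PNG back, for instance with Pillow's Image.open wrapped in np.asarray, produces the same kind of height × width × 3 (or × 4 with transparency) array.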

This was a very simplified project, but it taught me how to use Audacity and Python together. I am also learning how to code a model that determines whether an image shows a dog or a road, but I will save that for the next post. Again, I am not sure how much of the laryngeal dystonia project I can share with you, but I will try my best.




    Thank you for sharing your experience with data and the transferring of data betwixt programs; visual representations of data are effective vectors for a faraway audience - and also for helping us imagine what it is to step into a new (frigid!) environment for mentorship. Your bit of bravery is an example of possibility. Keep it up! Super cool.
    Sounds so cold in Boston!! I also love the images you have included to help us understand your work. What were some of the difficulties that you ran into with this self-directed task?
    Ms. Bennett - There were several instances where I had to troubleshoot. The first was when I was trying to export the labels: my text file wasn't including the min/max frequencies for each label, only the time bounds. After an hour of looking up solutions, it turned out to be a simple fix of changing a setting in Audacity. Another issue was locating the audio and text file paths and importing them into Python. It was my first time doing this, so I asked Eddie, and he walked me through it. There is a lot of trial and error in coding, but I'm learning something new every day.
    Elizabeth V
    This project is so cool Rohan! I love that you shared your process in detail with us, as the work you're doing is something I'm betting not many people (including me) are very familiar with. Do you have experience working with tools like spectrograms and coding or is this all new to you? I can't wait to hear more about your project!
    Great post, Rohan! It seems that the East Coast has given you a warm welcome! Glad to see that you are learning more and more every day. Can't wait to see more!
    Hi Libby! Like you, I'm not familiar with spectrograms or coding, but one of the postdocs in the lab is helping me out, which makes the learning process a lot easier.
