VizAudi
Vizaudi means Visualizing the audio! Speech-to-text is great, but why stop there? There's a lot more to audio than just words Just upload an audio file, and watch your screen come to life.
Created on 20th October 2019
•
VizAudi
Vizaudi means Visualizing the audio! Speech-to-text is great, but why stop there? There's a lot more to audio than just words Just upload an audio file, and watch your screen come to life.
The problem VizAudi solves
This can be used by people with hearing impairments, helping them not just understand what is being said, but also showing them what's happening around them visually (cars passing by, dogs barking, police/ambulance sirens, and even gunshots). We hope that it may bring them a step forward to experience the world as we do. It can also be used to enhance any audio-only form of media by augmenting it with relevant images, GIFs and text.
Challenges we ran into
We did not have any prior knowledge of working with audio. It took a lot of time just to understand how to process an audio file, but at the end of the day, we got to learn a lot.
The speech dataset was not clear enough and the amount of dataset that we managed to scope out of the internet was not enough to the train the model so we scraped youtube to get relevant speech data for which we made our python script and we also recorded the audio clips by ourselves using the mic.