VizAudi

Vizaudi means Visualizing the audio! Speech-to-text is great, but why stop there? There's a lot more to audio than just words Just upload an audio file, and watch your screen come to life.

Built at Hack-A-BIT

Created on 20th October 2019

•

VizAudi

Vizaudi means Visualizing the audio! Speech-to-text is great, but why stop there? There's a lot more to audio than just words Just upload an audio file, and watch your screen come to life.

The problem VizAudi solves

This can be used by people with hearing impairments, helping them not just understand what is being said, but also showing them what's happening around them visually (cars passing by, dogs barking, police/ambulance sirens, and even gunshots). We hope that it may bring them a step forward to experience the world as we do. It can also be used to enhance any audio-only form of media by augmenting it with relevant images, GIFs and text.

Challenges we ran into

We did not have any prior knowledge of working with audio. It took a lot of time just to understand how to process an audio file, but at the end of the day, we got to learn a lot.
The speech dataset was not clear enough and the amount of dataset that we managed to scope out of the internet was not enough to the train the model so we scraped youtube to get relevant speech data for which we made our python script and we also recorded the audio clips by ourselves using the mic.

Technologies used

Keras

moviepy

LibROSA

FFmpeg

Discussion

Builders also viewed

See more projects on Devfolio