Vision Voice

Enhancing Accessibility through AI and Voice Technology


The problem Vision Voice solves

Challenge: Over 285 million people worldwide are visually impaired.
Need: Many videos lack descriptive audio, making it difficult for visually impaired viewers to understand visual content.
Impact: Limited access to educational, entertainment, and informative content.

Challenges we ran into

Understanding YouTube's API:

  • The YouTube API documentation is extensive, and it took a lot of time to understand how to use it properly. Managing API rate limits was another headache: I kept hitting the limits and had to wait before I could continue testing.
  • The authentication and authorization process was cumbersome. It required frequent tweaks and careful token handling to keep requests authorized.
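
One common way to soften the rate-limit problem described above is to retry with exponential backoff instead of waiting by hand. The sketch below is only an illustration under stated assumptions, not the code Vision Voice uses: `backoffDelayMs`, `withBackoff`, and the `isRateLimited` predicate are hypothetical names, and deciding which API responses count as rate-limit errors is left to the caller.

```typescript
// Hypothetical helpers for illustration; not part of the YouTube API or the project source.

/** Delay before retry `attempt` (0-based): baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/**
 * Runs `request`, retrying only when `isRateLimited(err)` is true.
 * Gives up and rethrows after `maxRetries` retries.
 */
async function withBackoff<T>(
  request: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Anything that is not a rate-limit error is rethrown immediately.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Backoff only helps with short-term throttling. If a daily quota is exhausted, retrying will not help, so caching responses during testing is likely the more useful fix in that case.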

Navigating YouTube's DOM Structure:

  • YouTube's DOM structure changes frequently, making it a moving target. Just when I thought I had figured it out, an update would come along and break everything.
  • Selecting and manipulating the right elements was a delicate task, and one wrong move could disrupt the entire page.
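
Because the markup keeps shifting, one defensive pattern is to try several selectors in order and fall back gracefully when none match. This is a minimal sketch, not the project's actual code: `findFirst` and `PLAYER_SELECTORS` are hypothetical names, and the selectors themselves are illustrative guesses that may not match YouTube's current markup.

```typescript
// Minimal shape shared by `document` and DOM elements, so the helper can also
// be exercised with a fake root outside a browser.
interface QueryRoot<E> {
  querySelector(selector: string): E | null;
}

// Assumed candidate selectors, most specific first (illustrative only).
const PLAYER_SELECTORS: readonly string[] = [
  "#movie_player video",
  "video.html5-main-video",
  "video",
];

/** Returns the first element matched by any selector, or null if none match. */
function findFirst<E>(root: QueryRoot<E>, selectors: readonly string[]): E | null {
  for (const selector of selectors) {
    const match = root.querySelector(selector);
    if (match !== null) return match;
  }
  return null;
}
```

In a content script, `document` would be passed as `root`. Keeping the selector list in one place means a YouTube update usually requires changing one array rather than hunting through the code, and a `null` result lets the extension skip its work instead of disrupting the page.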

Handling Asynchronous Data Loading:

  • YouTube loads a lot of content asynchronously, which meant that my extension needed to be highly responsive to changes. MutationObservers were helpful but added complexity.
  • Ensuring that my code executed at the right time and place was a constant battle, especially with the dynamic nature of YouTube's content.
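
The timing problem above is often handled by waiting for an element to appear rather than assuming it exists when the script runs. The following is a hedged sketch of that idea, not the extension's real implementation: `waitForElement`, `ObserverLike`, and `ObserverCtor` are hypothetical names, and the observer class is passed in as a parameter so the helper does not depend on browser globals.

```typescript
// Subset of the MutationObserver interface that this helper relies on.
interface ObserverLike {
  observe(target: unknown, options: { childList: boolean; subtree: boolean }): void;
  disconnect(): void;
}
type ObserverCtor = new (callback: () => void) => ObserverLike;

/**
 * Resolves with the first non-null result of `find`.
 * Checks once immediately, then re-checks on every observed mutation.
 */
function waitForElement<T>(
  find: () => T | null,
  target: unknown,
  Observer: ObserverCtor,
): Promise<T> {
  return new Promise<T>((resolve) => {
    const existing = find();
    if (existing !== null) {
      resolve(existing);
      return;
    }
    const observer = new Observer(() => {
      const found = find();
      if (found !== null) {
        observer.disconnect(); // Stop observing once the element exists.
        resolve(found);
      }
    });
    observer.observe(target, { childList: true, subtree: true });
  });
}

// Intended use in a content script (assumes a browser environment):
//   waitForElement(() => document.querySelector("video"),
//                  document.documentElement, MutationObserver)
//     .then((video) => { /* attach listeners here */ });
```

Disconnecting as soon as the element is found keeps the observer from firing on every later page update, which is one way to limit the extra complexity that MutationObservers introduce.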
