TalkingTom

Empowerment through Voice: Our innovative project harnesses the power of voice recognition and automation to assist differently-abled individuals in navigating their digital world effortlessly.

Created on 25th February 2024


Barrier-Free Access: The script makes digital content accessible to all users, including those with disabilities.
Independence and Empowerment: By providing access to digital content, the script fosters independence and empowers differently-abled individuals to navigate the digital world with ease.

Streamlined Processes: The script automates tasks related to digital content, reducing manual efforts and improving efficiency.
Time-Saving: Automated processes save time for users, allowing them to focus on other important tasks.

One Script, Many Lives: With just one script, countless lives can be positively impacted, making it a powerful tool for change.
Inclusivity: The script promotes inclusivity by ensuring that digital content is accessible to everyone, regardless of their abilities.

Handling Different File Types: Managing various file types (e.g., PDFs, text files, images) and ensuring the script can read, write, and manipulate them appropriately.
Speech Recognition Accuracy: Achieving high accuracy in speech recognition, especially for users with speech impediments or in noisy environments.
Text-to-Speech Quality: Ensuring the text-to-speech functionality provides clear and understandable output for users with visual impairments.
User Interface Design: Designing an intuitive user interface that is accessible and easy to use for all users, including those with disabilities.
Error Handling: Implementing robust error handling to gracefully manage unexpected situations and provide helpful feedback to users.

Handling Different File Types: We used dedicated functions and libraries for each file type, such as PyPDF2 for PDF files and PyMuPDF for the image descriptor. This let the script read and manipulate the different file formats effectively.
Speech Recognition Accuracy: To improve speech recognition accuracy, we experimented with different speech recognition engines and adjusted parameters for better performance. Additionally, we provided users with the option to correct recognized text, enhancing the overall accuracy of the system.
Text-to-Speech Quality: We used the pyttsx3 library for text-to-speech conversion and fine-tuned the voice settings to improve clarity, so that users with visual impairments receive clear, understandable audio output.
Error Handling: Robust error handling was implemented throughout the script to catch and gracefully handle unexpected situations. This included informative error messages and prompts for users to retry actions if necessary.
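The file-type handling and retry-based error handling described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names read_document, READERS, and notify are hypothetical, and the PDF reader assumes the PdfReader API available in recent PyPDF2 releases. In the real script, notify would most likely speak the message through the text-to-speech engine instead of printing it.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical reader type: takes a path and returns the extracted text.
Reader = Callable[[Path], str]


def read_text_file(path: Path) -> str:
    """Read a plain-text file as UTF-8."""
    return path.read_text(encoding="utf-8")


def read_pdf_file(path: Path) -> str:
    """Extract text from every page of a PDF using PyPDF2."""
    # Imported lazily so the rest of the sketch runs without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Map file extensions to reader functions (assumed layout, for illustration).
READERS: Dict[str, Reader] = {".txt": read_text_file, ".pdf": read_pdf_file}


def read_document(
    path_str: str,
    readers: Dict[str, Reader] = READERS,
    notify: Callable[[str], None] = print,
    max_attempts: int = 3,
) -> Optional[str]:
    """Pick a reader by extension, retry on failure, and report problems."""
    path = Path(path_str)
    reader = readers.get(path.suffix.lower())
    if reader is None:
        notify(f"Sorry, {path.suffix or 'extensionless'} files are not supported.")
        return None

    for attempt in range(1, max_attempts + 1):
        try:
            return reader(path)
        except FileNotFoundError:
            # Retrying will not help if the file does not exist.
            notify(f"I could not find {path.name}. Please check the file name.")
            return None
        except Exception as exc:
            notify(f"Attempt {attempt} of {max_attempts} failed: {exc}")

    notify("I could not read the file. Please try again.")
    return None
```

Passing the readers and the notify callback as parameters keeps the dispatch logic independent of any particular PDF or speech library, which makes it straightforward to add handlers for further file types.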

Technologies used

Machine Learning

Pygame

Python

Speech Recognition

Natural Language Toolkit (NLTK)

Pillow

Text-to-Speech

sumy

PyPDF2

PyAutoGUI