Nearly 50% of pregnancies in India are high-risk, with major challenges including: Limited access to medical information
, Educational barriers (22.5% higher risk for uneducated women), Regional disparities in healthcare access and lack of mental health support. Our project is an Android-based conversational app aimed at providing physical and mental health support for them. It is built with React Native, aimed at delivering a conversational experience that feels natural and culturally resonant for Indian English speakers. When a user speaks to the app, it captures the audio, converts it to a 16kHz WAV format for optimal clarity, and then sends it to the backend through a RESTful WebSocket connection. Here, an Automatic Speech Recognition (ASR) model—tuned specifically for Indian English—transcribes the spoken content into text. This ASR model is designed to accurately pick up on the varied accents and unique expressions common in Indian English, making it well-suited for capturing the full intent of the user.
The transcribed text is then processed by the Llama 3.2 1B LLM, which interprets the meaning, considers context, and crafts a response that’s not only accurate but also conversationally fluid. This text response is then passed to a Text-to-Speech (TTS) model, also tailored for Indian English, which transforms it into audio with intonation and expression that aligns with local nuances. Finally, this audio is sent back to the app and played for the user, creating a conversational flow that feels both seamless and personal, all while retaining a distinctly Indian touch.
Dependency Management: We isolated and standardized dependencies using virtual environments and a requirements.txt file to avoid conflicts.
Version Control: A structured branching strategy and .gitignore helped manage merge conflicts and large files.
FFmpeg to PyAudio Migration: We adapted PyAudio for real-time processing with additional preprocessing to handle format compatibility.
Docker Configuration: docker-compose simplified network setups, though real-time data handling in Docker required additional management.
Audio Processing: Balancing audio clarity with low latency was achieved using librosa, pydub, and PyAudio optimizations.
Debugging: Structured logging and network tuning were essential for resolving issues in complex, multi-component interactions.
Network Issues: Network configuration adjustments improved bandwidth, port forwarding, and latency for stable communication.
WebSocket Deadlock: Managing thread lifecycles and concurrency settings minimized risks of deadlocks in real-time communication.
Data Flow Pipeline: Streamlined data transfer across ML models for improved efficiency.
Custom Text-to-Speech Model: Built and fine-tuned for enhanced speech clarity and user adaptation.
Prompt Engineering for Llama3.2 1B: Optimized prompts for precise, context-aware responses.
Real-Time Communication: Established a RESTful API for interaction between the Android app and server.
Interface Updates: Revamped the application’s interface and features for better usability.
Technologies used
Discussion