NavVision
"Guiding the way, Empowering every step" (AI-powered Augmented Reality (AR) navigation system designed to assist visually impaired individuals in navigating their surroundings.)
Created on 8th February 2025
The problem NavVision solves
Getting around is a major challenge for millions of visually impaired people. Aids such as white canes, guide dogs, and GPS-enabled devices help with basic mobility, but they cannot sense the surrounding environment, such as obstacles at a distance, or provide real-time navigation. This leaves users overly dependent on others and erodes their confidence in unfamiliar situations.
Many assistive technologies that could overcome these issues are expensive or fail to provide instant, responsive feedback. GPS systems, for instance, give turn-by-turn directions but cannot warn a user who is about to walk into a pole, a staircase, or a moving vehicle. Most AI systems are also impractical in constantly changing environments because they require continuous interaction with the user.
NavVision addresses these problems by combining AI, AR, and computer vision for instant object recognition and real-time audio guidance. A camera feed is processed with YOLO for object detection and OpenCV for distance estimation; the system detects obstacles, evaluates risk, and issues immediate voice warnings describing each obstacle's distance, direction, and threat level. Beyond voice alerts, its frame design keeps the user's central field of view unobstructed, and it guides users toward safe routes rather than merely warning them of obstacles.
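As a rough illustration of the alert logic described above, the detection-to-warning step can be sketched as follows. The distance thresholds, function names, and message format here are our own assumptions, not NavVision's actual values:

```python
def threat_level(distance_m: float) -> str:
    """Map an estimated obstacle distance to a coarse threat level.
    The 1 m / 3 m cutoffs are illustrative assumptions."""
    if distance_m < 1.0:
        return "danger"
    if distance_m < 3.0:
        return "warning"
    return "notice"

def compose_alert(label: str, distance_m: float, direction: str) -> str:
    """Build the spoken warning: threat level, object, distance, direction."""
    level = threat_level(distance_m)
    return f"{level}: {label} {distance_m:.1f} meters {direction}"

print(compose_alert("pole", 0.8, "ahead"))  # "danger: pole 0.8 meters ahead"
```

In the real system the resulting string would be handed to the text-to-speech pipeline rather than printed.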
NavVision's voice guidance is instantaneous, so users receive real-time feedback. The device is hands-free: it responds to voice and motion commands, so users never need to look at it to operate it. Text recognition also helps users read public signs and labels.
NavVision combines affordability, ease of use, and AI-driven intelligence into a sophisticated and powerful mobility aid, enabling visually impaired individuals to move through the world with confidence and independence.
Challenges we ran into
The development of NavVision presented a number of practical and technical difficulties, each of which called for creative solutions to guarantee the system's dependability and efficiency. Latency was one of the main issues: processing real-time visual data while maintaining quick reaction times was essential. At first, delays in object detection and distance estimation made real-time navigation unfeasible. We overcame this by streamlining computations in OpenCV and optimizing the YOLO model for efficiency, which reduced processing time and improved responsiveness.
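One common latency-reduction pattern, shown here purely as an illustration of the kind of streamlining involved (the actual optimizations were inside the YOLO/OpenCV pipeline), is to run the expensive detector only on every Nth frame and reuse the most recent result in between:

```python
class FrameSkipper:
    """Run the expensive detector only every `stride` frames; reuse stale results."""
    def __init__(self, detector, stride: int = 3):
        self.detector = detector  # function: frame -> detections
        self.stride = stride
        self.count = 0
        self.last = None

    def process(self, frame):
        if self.count % self.stride == 0:
            self.last = self.detector(frame)  # fresh, expensive detection
        self.count += 1
        return self.last                      # possibly a reused result

# Stub detector standing in for a real YOLO inference call.
runs = []
skipper = FrameSkipper(lambda f: runs.append(f) or f"objects@{f}", stride=3)
results = [skipper.process(i) for i in range(6)]
# The detector actually ran on frames 0 and 3 only.
```

The trade-off is staleness: with a 30 fps camera and a stride of 3, results lag the world by at most ~100 ms, which is usually acceptable for walking-speed navigation.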
Accurate distance estimation was another significant obstacle. The system needed to measure an object's distance accurately in order to judge whether it posed a risk to the user; simply recognizing objects was insufficient. Because depth sensors were not an option, we used apparent object size as a depth cue, but this initially produced inconsistent results.
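The size-based depth cue can be sketched with the pinhole camera model, distance = (real width x focal length) / pixel width, and a short moving average over recent frames is one way to tame the frame-to-frame jitter that makes raw size-based estimates inconsistent. The focal length and object widths below are made-up calibration values, not NavVision's:

```python
from collections import deque

FOCAL_LENGTH_PX = 700.0  # assumed camera calibration value, in pixels
KNOWN_WIDTHS_M = {"person": 0.5, "car": 1.8, "pole": 0.1}  # rough real widths

def estimate_distance(label: str, bbox_width_px: float) -> float:
    """Pinhole model: distance = real_width * focal_length / pixel_width."""
    return KNOWN_WIDTHS_M[label] * FOCAL_LENGTH_PX / bbox_width_px

class SmoothedDistance:
    """Moving average over the last few frames to reduce jitter."""
    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)

    def update(self, value: float) -> float:
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

d = estimate_distance("person", 175.0)  # 0.5 * 700 / 175 = 2.0 m
```

A person whose bounding box spans 175 pixels would thus be estimated at about 2 m, and the smoother keeps that estimate from jumping when the box width flickers between frames.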
Another challenge we encountered was voice latency. At first, the text-to-speech (TTS) engine introduced a discernible lag in the auditory feedback, which delayed user responses and was a real problem for real-time navigation. We fixed this by improving the audio pipeline, preloading frequently used phrases, and fine-tuning the TTS engine to guarantee timely delivery of navigation instructions.
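The phrase-preloading idea can be sketched as a simple synthesis cache. The `synthesize` function here is a stand-in for a real TTS call (e.g. a pyttsx3 or cloud TTS invocation), and the class name is our own, not NavVision's actual pipeline:

```python
class PhraseCache:
    """Cache synthesized audio for common phrases so playback skips TTS latency."""
    def __init__(self, synthesize):
        self.synthesize = synthesize  # function: phrase -> audio data
        self.cache = {}

    def preload(self, phrases):
        """Synthesize common phrases up front, e.g. at application start."""
        for p in phrases:
            self.cache[p] = self.synthesize(p)

    def get(self, phrase):
        """Cache hit: instant playback. Miss: synthesize once, then reuse."""
        if phrase not in self.cache:
            self.cache[phrase] = self.synthesize(phrase)
        return self.cache[phrase]

# Stub synthesizer standing in for a real TTS engine.
calls = []
def fake_tts(phrase):
    calls.append(phrase)
    return f"<audio:{phrase}>"

tts = PhraseCache(fake_tts)
tts.preload(["obstacle ahead", "turn left"])
tts.get("obstacle ahead")  # served from cache, no new synthesis call
```

Since navigation warnings draw on a small, repetitive vocabulary, most alerts hit the cache and only novel text (such as recognized signs) pays the synthesis cost.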
Hardware imposed limits as well. Running AI models on edge devices with constrained processing resources made real-time inference difficult. We addressed this by optimizing the model to run efficiently on less powerful hardware without compromising accuracy.
Furthermore, accurate object recognition was difficult in complex environments. Crowded spaces, moving objects, and poor lighting frequently reduced accuracy. To combat this, we trained the model on a variety of datasets and applied adaptive thresholding for better detection.
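Adaptive thresholding compares each pixel against the mean of its local neighbourhood rather than a single global cutoff, which is what keeps binarization usable under uneven lighting. In a real pipeline this would be OpenCV's `cv2.adaptiveThreshold`; the pure-Python sketch below shows only the idea, on a tiny list-of-lists "image":

```python
def adaptive_threshold(img, block: int = 3, c: float = 2.0):
    """Binarize: pixel is 1 if it exceeds (local mean - c) over a block x block window."""
    h, w = len(img), len(img[0])
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local mean over the neighbourhood, clipped at the image border.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            mean = sum(vals) / len(vals)
            out[y][x] = 1 if img[y][x] > mean - c else 0
    return out
```

Because the threshold follows the local brightness, a shadowed corner and a sunlit wall are each binarized relative to their own surroundings instead of being wiped out by one global value.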
Technologies used
