VisionNodes

Audio-visual control system for blind

Built at brAInwave 2.0

Created on 17th January 2026

The problem VisionNodes solves

VisionNodes is a set of assistive models and accessibility tools that give visually impaired users more capabilities. It offers higher accuracy, better context understanding, gesture or voice commands, multi-app control, and faster interaction. It processes visual inputs (hand gestures) and audio commands in parallel.
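A minimal sketch of handling gesture and audio inputs in parallel, assuming one worker thread per input source and a shared queue. The worker functions and the sample commands are hypothetical stand-ins for the real microphone and camera loops, not the project's actual code.

```python
import queue
import threading


def audio_worker(events):
    # Stand-in for a microphone loop; real code would capture and transcribe audio.
    for command in ["open mail", "read aloud"]:
        events.put(("audio", command))


def gesture_worker(events):
    # Stand-in for a camera loop; real code would classify hand gestures.
    for gesture in ["swipe_left"]:
        events.put(("gesture", gesture))


def collect_events():
    """Run both input workers in parallel and gather everything they emit."""
    events = queue.Queue()  # thread-safe hand-off between workers and consumer
    workers = [
        threading.Thread(target=audio_worker, args=(events,)),
        threading.Thread(target=gesture_worker, args=(events,)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # both finite workers have finished after this loop

    collected = []
    while not events.empty():
        collected.append(events.get())
    return collected


if __name__ == "__main__":
    for source, payload in collect_events():
        print(source, payload)
```

Using a single queue keeps the number of threads small (one per input source) and gives the rest of the system one place to read events from.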

It uses 8 OnDemand agents, each differentiated by its core responsibility, for automation. We used system prompts to make each agent's output more specialized:

1. Wikipedia search summary
2. Air quality
3. Video summary
4. Send emails
5. Health chat
6. Google Books agent (for educational purposes, for blind people)
7. Chat bot
8. Web summary

All these OnDemand agents are triggered by the user's voice: spoken words -> sentence transformer -> cosine similarity -> the matching agent is triggered when the similarity crosses a threshold. The response is then read aloud to the user so the interaction feels real-time.
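A minimal sketch of this trigger flow, under stated assumptions: the agent names, example phrases, and the 0.5 threshold are hypothetical, and `embed` is a bag-of-words stand-in so the example runs without downloading a model. In the real system a sentence transformer would produce the embeddings.

```python
import math
from collections import Counter

# Hypothetical example phrase per agent; the real project would define its own.
AGENT_EXAMPLES = {
    "wikipedia_summary": "search wikipedia and summarise the topic",
    "air_quality": "what is the air quality today",
    "send_email": "send an email to my contact",
}


def embed(text):
    """Stand-in embedding: word counts. Swap in a sentence transformer here."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(utterance, threshold=0.5):
    """Return (agent, score) for the best match, or (None, score) below threshold."""
    query = embed(utterance)
    best_agent, best_score = None, 0.0
    for agent, example in AGENT_EXAMPLES.items():
        score = cosine_similarity(query, embed(example))
        if score > best_score:
            best_agent, best_score = agent, score
    if best_score >= threshold:
        return best_agent, best_score
    return None, best_score


if __name__ == "__main__":
    print(route("what is the air quality today"))
    print(route("play some music"))
```

Returning `None` below the threshold lets the caller ask the user to repeat the command instead of triggering the wrong agent.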

These models support workflows and integrate with the modern web.

In the future, we can integrate all of our local models and inference endpoints (kept local for now for privacy reasons) and our workflow into the agentic workflow provided by OnDemand. As of now, we have integrated 8 agentic workflows for users' convenience.

Audio‑Visual control system for Blind Users
VisionNodes = multi-modal control + reasoning + safety + execution.

Core 5 layers (MVP stack)

Layer 1 — Perception inputs: Captures audio + visual signals together for robustness.
Layer 2 — Classification & processing: Converts raw signals into structured understanding locally (fast + private).
Layer 3 — Reasoning / intent: Maps signals into actionable intents so the system behaves predictably.
Layer 4 — Safety gates + fallbacks: Prevents accidental actions and handles uncertainty safely.
Layer 5 — Execution + triggers: Turns intent into real web actions and gives feedback.
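The five layers above can be sketched as a chain of small functions. This is an illustrative sketch only: the class names, confidence values, the 0.7 cut-off, and the `ask_user_to_repeat` fallback are assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    """Layer 1 output: raw perception from microphone and camera."""
    audio_text: Optional[str]
    gesture: Optional[str]


@dataclass
class Intent:
    """Layer 3 output: an action plus how confident the system is in it."""
    action: str
    confidence: float


def classify(signal):
    """Layer 2: turn raw signals into (label, confidence) pairs. Scores are made up."""
    labels = []
    if signal.audio_text:
        labels.append((signal.audio_text.strip().lower(), 0.9))
    if signal.gesture:
        labels.append((signal.gesture, 0.6))
    return labels


def reason(labels):
    """Layer 3: pick the most confident label as the intent."""
    if not labels:
        return None
    action, confidence = max(labels, key=lambda item: item[1])
    return Intent(action, confidence)


def safety_gate(intent, min_confidence=0.7):
    """Layer 4: block uncertain intents and fall back to asking the user."""
    if intent is None or intent.confidence < min_confidence:
        return Intent("ask_user_to_repeat", 1.0)
    return intent


def execute(intent):
    """Layer 5: perform the action; here it only returns the spoken feedback."""
    return f"Executing: {intent.action}"


def run_pipeline(signal):
    return execute(safety_gate(reason(classify(signal))))


if __name__ == "__main__":
    print(run_pipeline(Signal("Open Mail", None)))
    print(run_pipeline(Signal(None, "swipe_left")))
```

Keeping the safety gate as its own step means every intent passes through the same check before anything is executed.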

Challenges we ran into

1. Playwright automation execution errors
2. Static and dynamic integration errors
3. Threading issues
4. Gating logic issues (similarity and confidence logic)
5. OnDemand integration issues

How we addressed each one:

1. We used the browser inspector to select elements by their outerHTML and inner class names.
2. We used sequential integration logic along with UI integration.
3. We reduced the number of threads to cut computation and blocking.
4. We used min-max normalisation.
5. We referred to the OnDemand docs and worked out its requirements.
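A minimal sketch of how min-max normalisation could feed the gating logic. The source does not say which values were normalised, so this assumes it was applied to the per-agent similarity scores; the `gate` function, `raw_floor`, and `margin` are hypothetical.

```python
def min_max_normalise(scores):
    """Rescale scores to [0, 1]: (s - min) / (max - min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def gate(scores, raw_floor=0.3, margin=0.2):
    """Return the index of the winning candidate, or None if the match is unclear."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    # The top normalised score is 1.0 whenever scores differ,
    # so absolute confidence has to be checked on the raw score.
    if scores[best] < raw_floor:
        return None
    normalised = min_max_normalise(scores)
    runner_up = max((n for i, n in enumerate(normalised) if i != best), default=0.0)
    # Require a clear lead over the second-best candidate.
    if normalised[best] - runner_up < margin:
        return None
    return best


if __name__ == "__main__":
    print(min_max_normalise([2.0, 4.0, 6.0]))
    print(gate([0.8, 0.2, 0.1]))
    print(gate([0.8, 0.78, 0.1]))
```

Because min-max maps the best candidate to 1.0 whenever the scores differ, a threshold on the normalised score alone would nearly always fire. The sketch therefore combines a floor on the raw score with a margin on the normalised scores.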

Tracks Applied (4)

Open Innovation

Core innovation lies in bridging the accessibility gap for visually impaired people with the help of AI integrati...

SCAILE Track

Local-first, visual disability barrier reduction, potential large scale for visually impaired people, future potentia...

SCAILE

OnDemand Track

VisionNodes uses 8 OnDemand agents differentiated by core responsibilities for automation like Wikipedia search summary ...

Airev

Best Freshers Team

All our team members are in their first year.

Technologies used

Flask

PyTorch

scikit-learn

OpenCV

Deep Learning

PyAudio

Python

PyAutoGUI

Mediapipe
