VisionNodes
Audio-visual control system for blind
Created on 17th January 2026
The problem VisionNodes solves
VisionNodes combines assistive models and accessibility tools to give visually impaired users more capability: higher accuracy, better context understanding, control through gestures or voice commands, multi-app control, and faster interaction. It processes visual input (hand gestures) and audio commands in parallel.
It uses 8 OnDemand agents, each with its own core automation responsibility. We gave each agent a system prompt to make its output more specialized. The agents are:
1. Wikipedia search summary
2. Air quality
3. Video summary
4. Send emails
5. Health chat
6. Google Books agent (for educational use by blind people)
7. Chatbot
8. Web summary
All these OnDemand agents are triggered from the user's voice: the spoken words are embedded with a sentence transformer, scored by cosine similarity, and an agent is triggered when its score crosses a threshold. The response is then read aloud to the user so the interaction feels real-time.
These models support workflows and integrate with the modern web.
In the future, we can move our local models and inference endpoints (kept local for now for privacy reasons) and the rest of our workflow into the agentic workflows provided by OnDemand. So far we have integrated 8 agentic workflows to make everyday tasks easier for users.
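The voice-to-agent routing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for the sentence transformer, and the agent names, example phrases, and the 0.5 threshold are hypothetical, not the project's actual values.

```python
import math
from collections import Counter

# Hypothetical example phrase per agent; the project's real phrases are not given.
AGENT_PHRASES = {
    "air_quality": "what is the air quality today",
    "send_email": "send an email to someone",
    "wikipedia_summary": "search wikipedia and summarize",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-transformer embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(utterance, threshold=0.5):
    """Return the best-matching agent name, or None if no score reaches the threshold."""
    query = embed(utterance)
    scores = {name: cosine(query, embed(phrase)) for name, phrase in AGENT_PHRASES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Calling `route("what is the air quality")` selects the air quality agent, while an unrelated utterance such as `route("play some music")` returns `None`, so nothing is triggered. With real sentence embeddings the scores would reflect meaning rather than word overlap, and the threshold would need tuning.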
Audio‑Visual control system for Blind Users
VisionNodes = multi-modal control + reasoning + safety + execution.
Core 5 layers (MVP stack)
Layer 1 — Perception inputs: Captures audio + visual signals together for robustness.
Layer 2 — Classification & processing: Converts raw signals into structured understanding locally (fast + private).
Layer 3 — Reasoning / intent: Maps signals into actionable intents so the system behaves predictably.
Layer 4 — Safety gates + fallbacks: Prevents accidental actions and handles uncertainty safely.
Layer 5 — Execution + triggers: Turns intent into real web actions and gives feedback.
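As a rough illustration of how Layer 4 might sit between Layer 3 and Layer 5, the sketch below maps an intent's confidence to one of three outcomes. The `Intent` type, the outcome names, and both thresholds are assumptions made for this example; the source does not specify how the gate is implemented.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    name: str
    confidence: float  # assumed to lie in [0, 1]

def safety_gate(intent, execute_at=0.8, confirm_at=0.5):
    """Decide what to do with an intent based on its confidence.

    Thresholds are hypothetical defaults, not values from the project.
    """
    if intent.confidence >= execute_at:
        return "execute"   # confident enough to act directly
    if intent.confidence >= confirm_at:
        return "confirm"   # ask the user to confirm before acting
    return "fallback"      # too uncertain: do nothing and ask the user to repeat
```

For example, `safety_gate(Intent("send_email", 0.6))` returns `"confirm"`, which for a blind user could mean reading the intended action aloud and waiting for a spoken yes or no before executing.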
Challenges we ran into
1. Playwright automation execution errors
2. Static and dynamic integration errors
3. Threading issues
4. Gating logic: similarity and confidence issues
5. OnDemand integration issues
How we addressed them:
1. We inspected the pages and selected elements by their outerHTML and inner class names.
2. We used sequential integration logic alongside the UI integration.
3. We reduced the number of threads to cut computation and blocking.
4. We used min-max normalisation.
5. We referred to the OnDemand docs and worked out its requirements.
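The min-max normalisation mentioned above rescales a set of scores to the range [0, 1], which makes similarity and confidence values comparable against a single threshold. A minimal sketch, assuming the scores arrive as a plain list of floats (the function name is hypothetical):

```python
def min_max_normalise(scores):
    """Rescale scores linearly so the smallest maps to 0.0 and the largest to 1.0."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

For instance, `min_max_normalise([1.0, 2.0, 3.0])` gives `[0.0, 0.5, 1.0]`. Note that this normalisation is relative to the current batch of scores, so the best candidate always maps to 1.0 even when every raw score is low.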
Tracks Applied (4)
Open Innovation
SCAILE Track
SCAILE
OnDemand Track
Airev