
AutVid AI

Video emotion detection for Autistic Individuals

Created on 7th September 2025


The problem AutVid AI solves

This project, the Real-Time Multi-Modal Emotion Analyzer, solves the problem of understanding complex emotional cues in social interactions, particularly for individuals who find this challenging.

It acts as a "social interpreter," providing a combined analysis of facial expressions, vocal tone, and spoken language in real time. This gives users, especially autistic individuals, a clear, concise summary of the emotional context, which can be difficult to interpret from verbal and non-verbal cues alone.

In essence, it simplifies a complex social interaction into an easy-to-understand emotional summary, bridging a gap in communication and social understanding.
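The description above does not show how the three modalities are combined, so the following is a minimal illustrative sketch, not the project's actual code: all function names, label values, and the majority-vote fusion rule are assumptions.

```python
# Hedged sketch: fuse per-modality emotion labels into one summary.
# The majority-vote rule and the facial-cue tiebreak are illustrative
# assumptions, not AutVid AI's documented fusion logic.
from collections import Counter

def fuse_emotions(face_label: str, voice_label: str, text_label: str) -> str:
    """Pick the emotion label agreed on by most modalities; when all
    three disagree, fall back to the facial cue."""
    votes = Counter([face_label, voice_label, text_label])
    label, count = votes.most_common(1)[0]
    if count == 1:  # no agreement across modalities
        label = face_label
    return label

def summarize(face_label: str, voice_label: str, text_label: str) -> str:
    """Turn the fused result into the kind of concise, easy-to-read
    summary the project aims to present to the user."""
    overall = fuse_emotions(face_label, voice_label, text_label)
    return (f"Overall emotion: {overall} "
            f"(face: {face_label}, voice: {voice_label}, text: {text_label})")

print(summarize("happy", "neutral", "happy"))
# → Overall emotion: happy (face: happy, voice: neutral, text: happy)
```

In a real pipeline, each label would come from its own model (face detector, speech-emotion classifier, transcript analyzer) before being fused.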

Challenges we ran into

A significant hurdle in developing the Real-Time Multi-Modal Emotion Analyzer was the performance of the large language model (LLM), Meta's Llama 3. The original plan was to use Unsloth for fast, efficient summarization on a GPU. However, Unsloth is designed to accelerate training and inference on GPUs, and it provided no speed benefit without one. Running Llama 3 on a CPU was therefore far too slow for the "real-time" goal of the project: the summarization step alone took several seconds, disrupting the smooth flow of the application.

The initial thought was to switch to a smaller, more CPU-friendly model like BERT, which is known for its efficiency on standard hardware. However, this would have meant sacrificing the nuanced, human-like summarization capabilities of a more powerful model like Llama 3.
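One way to handle this trade-off is to select the summarization model at startup based on whether a GPU is present. The sketch below illustrates that idea only: the `nvidia-smi` check and both model names are assumptions for illustration, not the project's actual code.

```python
# Hedged sketch of a GPU-availability fallback: prefer Llama 3 (which
# Unsloth can accelerate on a GPU) and fall back to a smaller,
# CPU-friendly BERT-family model otherwise. The detection heuristic and
# model identifiers are illustrative assumptions.
import shutil

def pick_summarizer_model() -> str:
    """Return a model identifier depending on hardware: the larger,
    more nuanced model when an NVIDIA GPU appears to be available,
    else a lighter model that runs acceptably on a CPU."""
    has_gpu = shutil.which("nvidia-smi") is not None
    if has_gpu:
        return "meta-llama/Meta-Llama-3-8B-Instruct"  # GPU path via Unsloth
    return "distilbert-base-uncased"  # CPU-friendly fallback

print(pick_summarizer_model())
```

The cost of the fallback is exactly the one described above: a BERT-class model keeps latency low on standard hardware but cannot match Llama 3's nuanced, human-like summaries.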
