Protocall
“From preparation to presence.”
Created on 3rd January 2026
The Problem Protocall Solves
Interview preparation today is largely one-dimensional. Most platforms focus on what a candidate says, but real interviews evaluate how a candidate communicates: their confidence, clarity, tone, and non-verbal presence. Access to realistic mock interviews with immediate, objective feedback is limited, expensive, or inconsistent, leaving candidates underprepared despite strong technical skills.
How Protocall Makes Interview Preparation Better
Protocall transforms interview practice into a realistic, AI-driven simulation using Google Studio AI, making preparation smarter, safer, and more effective.
🎯 What People Use Protocall For
- Practicing real-time, voice-to-voice mock interviews
- Improving communication, confidence, and clarity
- Preparing for role-specific interviews (Frontend, Backend, Leadership, System Design)
- Receiving objective, unbiased feedback without human pressure
- Tracking personal growth across multiple interview sessions
🧠 How It Makes Existing Tasks Easier & Safer
- Instant feedback instead of delayed or subjective reviews
- 24/7 availability without scheduling peers or mentors
- Privacy-first practice with no persistent data storage
- Reduced anxiety through judgment-free AI coaching
- Consistent evaluation using standardized AI metrics
🚀 What Makes It Different
Powered by Google Studio AI, Protocall uses multimodal intelligence to:
- Analyze spoken responses in real time
- Interpret non-verbal cues like eye contact and posture
- Adapt interview difficulty dynamically
- Deliver actionable insights immediately after the session
Impact
Protocall bridges the gap between preparation and performance by turning interview practice into measurable intelligence, helping candidates walk into real interviews with confidence, clarity, and control.
🛠 Tech Stack Used
Frontend
- React 19 – Leveraging concurrent rendering and modern hooks for a smooth, responsive UI.
- TypeScript – Strong typing for managing complex AI interaction states, audio streams, and UI logic.
- Tailwind CSS – Utility-first styling with a custom professional palette and Glassmorphism design.
Artificial Intelligence & AI Orchestration
- Genkit (Google AI Studio) – Used to orchestrate agent-based reasoning and manage multimodal AI workflows.
- Gemini 2.5 Flash (Native Audio) – Powers low-latency, real-time voice-to-voice interview conversations.
- Gemini APIs – Enables live transcription, reasoning, evaluation, and function calling.
- Function Calling – Allows the AI agent to update visual feedback in real time without interrupting the interview flow.
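The function-calling flow above can be sketched in TypeScript. The tool declaration follows the general shape of a Gemini function-calling schema, but `update_feedback`, its parameter names, and the dispatcher are illustrative assumptions, not Protocall's actual code:

```typescript
// Hypothetical tool the agent could invoke to push live behavioral feedback
// to the UI. The name and parameters are illustrative.
const updateFeedbackTool = {
  name: "update_feedback",
  description: "Update the on-screen behavioral feedback panel.",
  parameters: {
    type: "object",
    properties: {
      confidence: { type: "number", description: "0-100 confidence score" },
      eyeContact: { type: "number", description: "0-100 eye-contact score" },
      posture: { type: "string", description: "e.g. 'upright', 'slouched'" },
    },
    required: ["confidence"],
  },
};

type FunctionCall = { name: string; args: Record<string, unknown> };

// Route a model-issued function call to UI state without pausing the
// voice conversation itself.
function handleFunctionCall(
  call: FunctionCall,
  ui: { setFeedback: (f: object) => void }
): void {
  if (call.name === updateFeedbackTool.name) {
    ui.setFeedback(call.args);
  }
}
```

Because the tool call only mutates UI state, the audio stream keeps playing while the feedback panel updates, which is what keeps the interview flow uninterrupted.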
Multimedia & Web APIs
- Web Audio API – Real-time audio capture, PCM encoding/decoding, and streaming.
- MediaDevices API – Camera and microphone access for multimodal interaction.
- Canvas API – Video frame extraction for visual cue analysis.
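One concrete piece of the PCM pipeline mentioned above is converting the Float32 samples that the Web Audio API produces into the 16-bit integers a streaming endpoint typically expects. A minimal sketch (the function name is ours, not from the project):

```typescript
// Convert Web Audio Float32 samples (range [-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

In the browser, the input would come from an `AudioWorklet` or `AnalyserNode` fed by the microphone stream; the resulting `Int16Array` buffer is what gets chunked and streamed.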
Data Visualization
- Recharts – Interactive radar and bar charts for post-interview performance analytics.
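Recharts radar charts consume an array of objects, one per axis, so the post-interview scores need a small reshaping step. A sketch under that assumption (the `subject`/`value` key names are a common Recharts convention, chosen here for illustration):

```typescript
type Scores = Record<string, number>;

// Reshape a flat score map into the array-of-objects format a Recharts
// <RadarChart> dataset expects, one entry per chart axis.
function toRadarData(scores: Scores): { subject: string; value: number }[] {
  return Object.entries(scores).map(([subject, value]) => ({ subject, value }));
}
```

For example, `toRadarData({ Clarity: 82, Confidence: 74 })` yields two axis entries that can be passed straight to the chart's `data` prop.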
Infrastructure & Delivery
- ESM.sh – Zero-install ES module CDN for fast, serverless dependency delivery.
- HTML5 / CSS3 – Modern web standards for performance and accessibility.
Challenges We Ran Into
Building Protocall required solving several technical and design challenges to deliver a realistic, low-latency, and privacy-first interview experience.
🔊 Real-Time Audio Latency
Creating natural, voice-to-voice conversations was challenging due to strict latency requirements. We had to carefully manage raw PCM audio streaming and buffering to ensure smooth, uninterrupted dialogue using the Gemini Live API.
🎥 Multimodal Synchronization
Coordinating live audio, video frames, transcription, and AI reasoning in parallel was complex. Ensuring that visual cue analysis aligned accurately with spoken responses required precise timing and frame extraction logic.
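One way to frame the alignment problem above: each extracted video frame carries a capture timestamp, and it must be matched to the transcript segment spoken at that moment. A sketch assuming timestamped segments (an assumption; the real pipeline may align differently):

```typescript
type Segment = { text: string; start: number; end: number }; // seconds

// Find the transcript segment a video frame (captured at time t) falls
// inside, so visual cues can be attributed to the words spoken then.
function segmentForFrame(t: number, segments: Segment[]): Segment | undefined {
  return segments.find((s) => t >= s.start && t < s.end);
}
```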
🧠 Agent-Based Reasoning Design
Designing an AI agent that could perceive, reason, and act in real time without interrupting the interview flow was a key challenge. This involved implementing function calling so the agent could update UI feedback dynamically while maintaining conversational continuity.
🖥 UI Feedback Without Distraction
Delivering real-time behavioral feedback (confidence, eye contact, posture) without overwhelming or distracting the user required multiple UI iterations. We balanced visibility and subtlety to maintain interview realism.
📊 Meaningful Evaluation Metrics
Translating subjective interview qualities like confidence and clarity into structured, measurable scores and radar charts was non-trivial. We refined scoring weights and feedback logic to ensure fairness and interpretability.
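The weighting step described above can be sketched as a simple weighted average. The category names and weights below are illustrative placeholders, not Protocall's actual scoring values:

```typescript
// Hypothetical category weights; the real product's weights are refined
// iteratively and are not shown here.
const WEIGHTS = { clarity: 0.4, confidence: 0.35, structure: 0.25 };

type CategoryScores = { clarity: number; confidence: number; structure: number };

// Fold per-category scores (each 0-100) into one overall 0-100 score.
function overallScore(scores: CategoryScores): number {
  const total =
    scores.clarity * WEIGHTS.clarity +
    scores.confidence * WEIGHTS.confidence +
    scores.structure * WEIGHTS.structure;
  return Math.round(total);
}
```

Keeping the weights explicit like this is what makes the score interpretable: each radar-chart axis maps to one term of the sum.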
🛡 Privacy & Security Constraints
We deliberately avoided persistent data storage to protect user privacy. This required careful session handling and real-time-only processing, which added complexity to analytics generation.
⚙️ Browser & Device Constraints
Working with Web Audio, MediaDevices, and Canvas APIs across browsers introduced compatibility and performance challenges, especially for camera and microphone handling.
Overcoming these challenges allowed us to build a robust, scalable, and immersive AI interview coaching platform that closely mirrors real-world interview dynamics.