
SWARNAM.AI


The problem SWARNAM.AI solves

LLM-to-Agent Transformation Bottleneck
Current LLMs are conversational but not operational. Developers struggle to build truly agentic AI systems that can perceive, reason, and act in real-world environments. Existing frameworks lack real-time capabilities, orchestrate tools poorly, or lose context across complex multi-step operations.

What People Can Use This Framework For
🤖 Build Production-Ready AI Agents

Customer service agents that can access CRM, process refunds, and update tickets while talking to customers
Personal assistants that manage calendars, book travel, and coordinate across multiple platforms via voice
Healthcare agents that can access patient records, schedule appointments, and provide real-time medical guidance
E-commerce agents that process orders, check inventory, and handle returns through natural conversation

⚡ Eliminates Development Complexity

Before: Months of building custom WebSocket handlers, audio processing, tool orchestration, and context management
After: Deploy multi-modal agents in days using pre-built voice pipeline, LLM tool selection, and API integration framework

🔄 Real-Time Bidirectional Intelligence

Sub-3-second response times with complex multi-API workflows and persistent context
Intelligent tool chaining where agents autonomously decide which APIs to call in sequence
Multi-modal input/output supporting voice, text, and structured data simultaneously
Cross-platform state management maintaining context across different systems and sessions

๐ŸŒ Universal Agent Development Platform

Language-agnostic: supports any combination of speech recognition and synthesis models
API-flexible: integrates with any REST/GraphQL endpoint through standardized tool definitions
Authentication-aware: handles OAuth, API keys, and complex auth flows automatically
Deployment-ready: ships with production-grade error handling, logging, and monitoring

🔧 Framework Applications Beyond Finance

Healthcare: Voice-controlled patient management with EHR integration
Education: AI tutors with real-time access to learning management systems
Logistics: Voice-controlled inventory and shipping coordination
Real Estate: Property search and management through conversational interfaces

Challenges we ran into


1. Parallel Processing Architecture Bottleneck

Problem: The initial sequential pipeline (audio processing → STT → LLM → tool execution → TTS → audio output) caused unacceptable 8+ second end-to-end delays.
Solution: Built dual-stream architecture with concurrent audio buffering and LLM inference, reducing end-to-end response time to sub-3 seconds through parallel pipeline execution.
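The dual-stream idea can be sketched with asyncio queues: each stage consumes upstream results as soon as they arrive instead of waiting for the whole pipeline. This is an illustrative minimal sketch, not the project's actual code; the stage names, queue wiring, and sleep-based stand-ins for STT/LLM calls are all assumptions.

```python
import asyncio

async def stt_stage(audio_q: asyncio.Queue, text_q: asyncio.Queue):
    # Consume buffered audio chunks and emit transcripts as each one is
    # ready, instead of waiting for the full utterance to finish.
    while (chunk := await audio_q.get()) is not None:
        await asyncio.sleep(0.01)  # stand-in for an STT call
        await text_q.put(f"transcript({chunk})")
    await text_q.put(None)  # signal end of stream

async def llm_stage(text_q: asyncio.Queue, out: list):
    # Start LLM inference on early transcripts while later audio is
    # still being transcribed by the other stage.
    while (text := await text_q.get()) is not None:
        await asyncio.sleep(0.01)  # stand-in for LLM inference
        out.append(f"reply({text})")

async def main():
    audio_q, text_q, out = asyncio.Queue(), asyncio.Queue(), []
    for chunk in ("a", "b", "c"):
        audio_q.put_nowait(chunk)
    audio_q.put_nowait(None)
    # Both stages run concurrently, so total latency approaches the
    # slowest stage rather than the sum of all stages.
    await asyncio.gather(stt_stage(audio_q, text_q), llm_stage(text_q, out))
    return out

results = asyncio.run(main())
print(results)
```

With real STT/TTS calls the same structure overlaps network waits across stages, which is where the sequential design lost most of its 8 seconds.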

2. SarvamAI Audio Format Compatibility

Problem: SarvamAI streaming API only accepts WAV format, but Twilio streams 8 kHz µ-law encoded audio, causing complete audio processing failure.
Solution: Implemented a real-time audio conversion layer using audioop-lts for Python 3.13 compatibility, creating a seamless µ-law ↔ WAV conversion pipeline.
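A minimal sketch of such a conversion layer, using the stdlib `audioop` and `wave` modules (on Python 3.13+, `audioop` is supplied by the audioop-lts package). The sample rate and mono/16-bit assumptions follow Twilio's 8 kHz µ-law media format; the function names are illustrative.

```python
import audioop  # on Python 3.13+, provided by the audioop-lts package
import io
import wave

def mulaw_to_wav(mulaw_bytes: bytes, rate: int = 8000) -> bytes:
    """Wrap Twilio's raw 8 kHz mu-law frames in a 16-bit PCM WAV blob."""
    pcm = audioop.ulaw2lin(mulaw_bytes, 2)  # 2 = 16-bit output samples
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # Twilio media streams are mono
        wav.setsampwidth(2)      # 16-bit linear PCM
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()

def wav_to_mulaw(wav_bytes: bytes) -> bytes:
    """Convert a mono 16-bit WAV blob back to raw mu-law for Twilio."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
    return audioop.lin2ulaw(pcm, 2)
```

Each direction is a pure in-memory transform, so it can sit inline in the streaming path without touching disk.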

3. Bhindi Platform API Gateway Limitation

Problem: Bhindi doesn't provide developer API gateway access, blocking external model integration and tool execution outside their platform.
Solution: Built custom Express.js proxy server that abstracts Bhindi's internal APIs, exposing standardized REST endpoints with proper authentication for external developers.

4. Web2 Payment Gateway Constraints

Problem: Traditional payment gateways like Cashfree/Razorpay have settlement delays, KYC requirements, and limited programmability for agent-driven transactions.
Solution: Architected hybrid payment infrastructure supporting both Web2 APIs for compliance and Web3 protocols for instant, programmable payments.

5. Multi-Chunk TTS Audio Fragmentation

Problem: SarvamAI TTS returns audio in multiple chunks, causing incomplete speech playback and broken user experience during longer responses.
Solution: Developed intelligent WAV file combination algorithm that merges multiple audio chunks into single coherent audio stream before Twilio transmission.
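The chunk-combination step can be sketched with the stdlib `wave` module: strip each chunk's header, concatenate the raw frames, and write a single header for the merged stream. This is a simplified sketch assuming all chunks share the same format, which SarvamAI's TTS chunks for one response would; the function name is illustrative.

```python
import io
import wave

def combine_wav_chunks(chunks: list[bytes]) -> bytes:
    """Merge several WAV blobs into one continuous WAV stream.

    Each chunk's header is discarded and its raw frames appended, so the
    result plays as one utterance instead of several truncated clips.
    """
    params, frames = None, bytearray()
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as wav:
            if params is None:
                params = wav.getparams()  # format taken from first chunk
            frames += wav.readframes(wav.getnframes())
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setparams(params)
        out.writeframes(bytes(frames))
    return buf.getvalue()
```

The merged blob can then go through the µ-law conversion step once, rather than per chunk, before transmission to Twilio.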

6. Context Loss Across Tool Chains

Problem: The LLM lost conversation context during complex multi-step operations like "pay Daksh" → fetch expenses → calculate debt → create payment link.
Solution: Implemented persistent state management with 24KB context buffer, maintaining conversation flow and user relationships across 16+ tool executions.
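One way to enforce such a context budget is to trim the oldest non-system turns whenever the serialized history exceeds the limit. This sketch assumes an OpenAI-style message list; the 24 KB figure comes from the write-up, but the trimming policy and names are illustrative, not the project's actual implementation.

```python
import json

MAX_CONTEXT_BYTES = 24 * 1024  # 24 KB budget from the write-up

def trim_context(messages: list[dict]) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit the budget.

    Oldest non-system messages are dropped first, so facts established
    late in a tool chain (e.g. the debt just calculated) survive.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and len(json.dumps(system + rest).encode()) > MAX_CONTEXT_BYTES:
        rest.pop(0)  # drop the oldest conversational turn
    return system + rest
```

Running this before every LLM call keeps the prompt bounded even across 16+ tool executions, at the cost of forgetting the very oldest turns.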

7. Token Limit Overflow in Expense Processing

Problem: Large Splitwise expense datasets caused LLM token limit exceeded errors, breaking tool execution and user experience.
Solution: Built smart expense summarization system limiting to 15 most recent transactions with intelligent name matching using word-set algorithms for robust person identification.

Progress made before hackathon


Pre-Existing Knowledge & Experience

Prior SarvamAI Integration Experience: Had previously worked with SarvamAI's speech-to-text and text-to-speech APIs in earlier projects, understanding their model capabilities (saaras:v2.5 for STT, bulbul:v2 for TTS) and API response formats.

General AI/LLM Architecture Knowledge: Brought foundational understanding of LLM tool-calling patterns, temperature-based response generation, and basic audio processing concepts from previous development work.

Tracks Applied (2)

Sarvam AI Track

SarvamAI Speech Processing Integration Real-Time Speech Recognition: Integrated SarvamAI's saaras:v2.5 model for speech...Read More
Sarvam.ai

Best Bhindi Usecase for Developers

Bhindi Agent Integration Created Two Specialized Agents on Bhindi Platform: Splitwise Agent: Handles expense management...Read More
Bhindi AI
