SWARNAM.AI
SWARNAM
Created on 22nd June 2025
The problem SWARNAM.AI solves
LLM-to-Agent Transformation Bottleneck
Current LLMs are conversational but not operational. Developers struggle to build truly agentic AI systems that can perceive, reason, and act in real-world environments. Existing frameworks lack real-time capabilities, have poor tool orchestration, or can't maintain context across complex multi-step operations.
What People Can Use This Framework For
🤖 Build Production-Ready AI Agents
Customer service agents that can access CRM, process refunds, and update tickets while talking to customers
Personal assistants that manage calendars, book travel, and coordinate across multiple platforms via voice
Healthcare agents that can access patient records, schedule appointments, and provide real-time medical guidance
E-commerce agents that process orders, check inventory, and handle returns through natural conversation
⚡ Eliminates Development Complexity
Before: Months of building custom WebSocket handlers, audio processing, tool orchestration, and context management
After: Deploy multi-modal agents in days using pre-built voice pipeline, LLM tool selection, and API integration framework
Real-Time Bidirectional Intelligence
Sub-3-second response times with complex multi-API workflows and persistent context
Intelligent tool chaining where agents autonomously decide which APIs to call in sequence
Multi-modal input/output supporting voice, text, and structured data simultaneously
Cross-platform state management maintaining context across different systems and sessions
Universal Agent Development Platform
Language-agnostic supporting any combination of speech recognition and synthesis models
API-flexible integrating with any REST/GraphQL endpoints through standardized tool definitions
Authentication-aware handling OAuth, API keys, and complex auth flows automatically
Deployment-ready with production-grade error handling, logging, and monitoring
🔧 Framework Applications Beyond Finance
Healthcare: Voice-controlled patient management with EHR integration
Education: AI tutors with real-time access to learning management systems
Logistics: Voice-controlled inventory and shipping coordination
Real Estate: Property search and management through conversational interfaces
Challenges we ran into
1. Parallel Processing Architecture Bottleneck
Problem: Initial sequential processing caused 8+ second delays; the audio processing → STT → LLM → tool execution → TTS → audio output chain created unacceptable latency.
Solution: Built dual-stream architecture with concurrent audio buffering and LLM inference, reducing end-to-end response time to sub-3 seconds through parallel pipeline execution.
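The dual-stream idea can be sketched with `asyncio`: one task keeps buffering incoming audio while another drains the buffer through the inference stages, so ingestion never blocks on the LLM. This is an illustrative outline (function and parameter names are hypothetical), not the project's actual pipeline code:

```python
import asyncio

# Sketch of the dual-stream architecture: audio chunks are buffered
# concurrently while downstream stages (STT -> LLM -> TTS) run, instead
# of waiting for each stage to finish before the next chunk is read.

async def buffer_audio(source, queue):
    # Stream 1: keep ingesting caller audio without blocking on inference.
    async for chunk in source:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: the stream has ended

async def run_pipeline(queue, stt, llm, tts, sink):
    # Stream 2: consume buffered audio as soon as it arrives.
    while (chunk := await queue.get()) is not None:
        text = await stt(chunk)
        reply = await llm(text)
        await sink(await tts(reply))

async def handle_call(source, stt, llm, tts, sink):
    queue = asyncio.Queue()
    # Both coroutines run concurrently: ingestion overlaps with inference.
    await asyncio.gather(
        buffer_audio(source, queue),
        run_pipeline(queue, stt, llm, tts, sink),
    )
```

Because the two coroutines only share the queue, slow LLM or TTS calls never stall audio capture, which is where most of the original sequential latency came from.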
2. SarvamAI Audio Format Compatibility
Problem: SarvamAI streaming API only accepts WAV format, but Twilio streams 8kHz µ-law encoded audio, causing complete audio processing failure.
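The format mismatch can be bridged with a small decode-and-rewrap step. A minimal sketch (the function name and structure are illustrative, not the project's actual code) using the `audioop` module, which the `audioop-lts` package supplies on Python 3.13+:

```python
import audioop  # stdlib before 3.13; provided by audioop-lts on 3.13+
import io
import wave

def mulaw_to_wav(mulaw_bytes: bytes, sample_rate: int = 8000) -> bytes:
    """Convert a mono 8 kHz mu-law payload (Twilio's format) to WAV bytes."""
    # mu-law is 8 bits/sample; decode to 16-bit linear PCM (width=2).
    pcm = audioop.ulaw2lin(mulaw_bytes, 2)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # Twilio media streams are mono
        wav.setsampwidth(2)           # 16-bit PCM after decoding
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```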
Solution: Implemented real-time audio conversion layer using audioop-lts for Python 3.13 compatibility, creating seamless µ-law → WAV conversion pipeline.
3. Bhindi Platform API Gateway Limitation
Problem: Bhindi doesn't provide developer API gateway access, blocking external model integration and tool execution outside their platform.
Solution: Built custom Express.js proxy server that abstracts Bhindi's internal APIs, exposing standardized REST endpoints with proper authentication for external developers.
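The proxy itself is an Express.js server; as a language-neutral illustration of its core responsibility (strip hop-by-hop headers, inject the platform credential, and map public REST paths onto the internal API), here is a Python sketch. The upstream URL and header names are placeholders, not Bhindi's real endpoints:

```python
# Headers that describe the client connection and must not be forwarded
# upstream (per RFC 9110's hop-by-hop rules).
HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding", "upgrade",
              "proxy-authorization", "te", "trailer"}

def build_upstream_request(path: str, headers: dict, api_key: str,
                           upstream: str = "https://internal.example/api"):
    """Map a public REST path onto the internal API with proper auth.

    Returns the upstream URL plus the sanitized header set; the actual
    HTTP forwarding (Express.js in the project) happens elsewhere.
    """
    forwarded = {k: v for k, v in headers.items()
                 if k.lower() not in HOP_BY_HOP}
    # Replace whatever auth the caller sent with the platform credential.
    forwarded["Authorization"] = f"Bearer {api_key}"
    return upstream.rstrip("/") + "/" + path.lstrip("/"), forwarded
```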
4. Web2 Payment Gateway Constraints
Problem: Traditional payment gateways like Cashfree/Razorpay have settlement delays, KYC requirements, and limited programmability for agent-driven transactions.
Solution: Architected hybrid payment infrastructure supporting both Web2 APIs for compliance and Web3 protocols for instant, programmable payments.
5. Multi-Chunk TTS Audio Fragmentation
Problem: SarvamAI TTS returns audio in multiple chunks, causing incomplete speech playback and broken user experience during longer responses.
Solution: Developed intelligent WAV file combination algorithm that merges multiple audio chunks into single coherent audio stream before Twilio transmission.
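Naive byte concatenation fails here because every TTS chunk carries its own RIFF header; the chunks have to be unwrapped and their raw frames re-wrapped once. A minimal sketch of that combination step, assuming all chunks share the same sample rate and width:

```python
import io
import wave

def combine_wav_chunks(chunks: list[bytes]) -> bytes:
    """Merge multiple WAV byte strings into one continuous WAV stream."""
    buf = io.BytesIO()
    out = None
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as part:
            if out is None:
                # Open the output once, copying rate/width/channels
                # from the first chunk.
                out = wave.open(buf, "wb")
                out.setparams(part.getparams())
            # Append this chunk's raw frames, skipping its RIFF header.
            out.writeframes(part.readframes(part.getnframes()))
    if out is not None:
        out.close()  # patches the final frame count into the header
    return buf.getvalue()
```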
6. Context Loss Across Tool Chains
Problem: LLM losing conversation context during complex multi-step operations like "pay Daksh" → fetch expenses → calculate debt → create payment link.
Solution: Implemented persistent state management with 24KB context buffer, maintaining conversation flow and user relationships across 16+ tool executions.
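The bounded-buffer idea can be sketched as follows: keep the running conversation (user turns, tool calls, tool results) under a fixed byte budget by evicting the oldest turns first, so long tool chains never lose the recent state they depend on. The class and eviction policy are illustrative assumptions, not the project's actual implementation:

```python
import json

class ContextBuffer:
    """Persistent conversation state with a fixed byte budget (24 KB here,
    matching the figure above); oldest turns are evicted first."""

    def __init__(self, max_bytes: int = 24 * 1024):
        self.max_bytes = max_bytes
        self.turns: list[dict] = []

    def append(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        # Evict oldest turns until the serialized context fits the budget,
        # always keeping at least the most recent turn.
        while self.size() > self.max_bytes and len(self.turns) > 1:
            self.turns.pop(0)

    def size(self) -> int:
        # Budget is measured on the serialized form sent to the LLM.
        return len(json.dumps(self.turns).encode("utf-8"))
```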
7. Token Limit Overflow in Expense Processing
Problem: Large Splitwise expense datasets caused LLM token limit exceeded errors, breaking tool execution and user experience.
Solution: Built smart expense summarization system limiting to 15 most recent transactions with intelligent name matching using word-set algorithms for robust person identification.
Progress made before hackathon
Pre-Existing Knowledge & Experience
Prior SarvamAI Integration Experience: Had previously worked with SarvamAI's speech-to-text and text-to-speech APIs in earlier projects, understanding their model capabilities (saaras:v2.5 for STT, bulbul:v2 for TTS) and API response formats.
General AI/LLM Architecture Knowledge: Brought foundational understanding of LLM tool-calling patterns, temperature-based response generation, and basic audio processing concepts from previous development work.
Tracks Applied (2)
Sarvam AI Track
Sarvam.ai
Best Bhindi Usecase for Developers
Bhindi AI
Technologies used