Krishi Setu
AI-Powered Voice Intelligence for Every Farmer.
Created on 7th March 2026
The problem Krishi Setu solves
Krishi Setu — AI Voice Assistant for Indian Farmers
The Problem It Solves
Over 100 million smallholder farmers in India make critical decisions every day — when to spray, what fertilizer to use, which government scheme to apply for — with little to no expert support. Agronomists are scarce, expensive, and rarely available in rural areas. Existing agri-tech apps assume smartphones and reliable internet, leaving out the majority of farmers who still use feature phones or live in low-connectivity zones.
Bad advice, or no advice at all, costs Indian farmers an estimated ₹50,000+ crore in avoidable crop losses every year.
What Krishi Setu Does
Krishi Setu ("Agriculture Bridge") lets any farmer call a phone number and get instant, personalised farming advice in Hinglish — no smartphone, no app, no internet required on their end.
🌾 Voice Call → Instant AI Advice
A farmer dials the Twilio number from any phone. They speak naturally — Hindi, English, or Hinglish. The AI responds in seconds with crop-specific, location-aware advice spoken back to them.
"Mere gehu ki patti peeli ho rahi hai" → AI diagnoses yellow rust, recommends propiconazole dose and timing
🌦️ Live Weather + Farming Advice
Weather queries fetch real-time data from Open-Meteo. The AI tells farmers whether to irrigate today, delay spraying due to rain, or harvest before a cold snap.
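A minimal sketch of how such a weather lookup could be wired, assuming Open-Meteo's public forecast endpoint and its `daily` precipitation variables. The function names and the rain thresholds are hypothetical, chosen only for illustration; they are not taken from the project.

```typescript
// Hypothetical helpers; only the Open-Meteo endpoint and query parameters are real.
interface DailyForecast {
  time: string[];                          // ISO dates, one per forecast day
  precipitation_sum: number[];             // millimetres per day
  precipitation_probability_max: number[]; // percent per day
}

function buildForecastUrl(latitude: number, longitude: number): string {
  const params = [
    `latitude=${latitude}`,
    `longitude=${longitude}`,
    "daily=precipitation_sum,precipitation_probability_max",
    "timezone=auto",
    "forecast_days=2",
  ];
  return `https://api.open-meteo.com/v1/forecast?${params.join("&")}`;
}

// Assumed thresholds, for illustration only.
const RAIN_MM_LIMIT = 2;
const RAIN_PROBABILITY_LIMIT = 50;

function sprayAdvice(daily: DailyForecast, dayIndex: number): "spray" | "delay" {
  const rain = daily.precipitation_sum[dayIndex];
  const probability = daily.precipitation_probability_max[dayIndex];
  return rain >= RAIN_MM_LIMIT || probability >= RAIN_PROBABILITY_LIMIT ? "delay" : "spray";
}

// Hand-written sample shaped like the `daily` object in the API response.
const sample: DailyForecast = {
  time: ["2026-03-07", "2026-03-08"],
  precipitation_sum: [0, 6.4],
  precipitation_probability_max: [10, 80],
};

console.log(buildForecastUrl(26.85, 80.95)); // approximate coordinates for Lucknow
console.log(sprayAdvice(sample, 1));         // "delay"
```

The decision rule is kept separate from the HTTP call so it can be tested without network access.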
"Kal barish hogi kya Lucknow mein?" → Actual forecast + whether it's safe to spray pesticides
🏛️ Government Scheme Guidance
BM25 retrieval over a curated knowledge base of PM-KISAN, PMFBY, MSP, KCC, PM Kusum, e-NAM, and NABARD schemes. Farmers get plain-language answers about their entitlements.
"PM Kisan ka paisa kab aayega?" → Installment schedule and how to check status
📸 Crop Image Analysis (For Smartphone Users)
Farmers who have smartphones can visit the web interface, upload a photo of a diseased crop, and receive a structured diagnosis with severity rating, treatment plan, and prevention tips — read aloud via AI voice.
The image analysis uses an agentic 3-stage pipeline (LLM Analyzer → Voice Fetcher → Player running concurrently) so the voice response starts playing within ~2 seconds instead of waiting 8–10 seconds for the full response.
👨‍🌾 Personalised to Every Farmer
Each farmer is profiled in the system — their crops, soil type, location, language preference. The AI greets them by name and tailors every answer to their specific situation.
Who Can Use This
| User | How They Benefit |
|---|---|
| Smallholder farmer (feature phone) | Gets expert-quality crop advice via a plain phone call in their own language |
| Farmer with smartphone | Uploads a photo of a sick crop and hears an AI diagnosis in seconds |
| Farmer cooperative / FPO | Registers all member farmers and routes support calls through one number |
| Agri NGO / Krishi Vigyan Kendra | Deploys it as a 24/7 first-response advisory before connecting farmers to a human expert |
| State Agriculture Dept | Distributes scheme information at scale without call-centre overhead |
What Makes It Different
- Works on any phone — no smartphone or internet required for the farmer
- Hinglish AI — not just translated; genuinely mixed-language responses that mirror how farmers actually talk
- Zero-latency RAG — custom in-memory BM25 retriever, no vector DB, no embedding API calls, <1 ms retrieval
- Sentence-streaming voice — AI starts speaking after the first sentence, not after the full response is ready
- Free weather data — Open-Meteo, no API key, real-time forecasts for 60+ Indian cities
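To make the in-memory retrieval point concrete, here is a small BM25 sketch. It is not the project's retriever: the class name, the `k1` and `b` defaults, and the placeholder documents are assumptions, and the scoring uses the common Okapi BM25 formula with a smoothed IDF.

```typescript
interface Doc { id: string; text: string }
interface Hit { id: string; score: number }

// Lowercase and split on anything that is not a Latin letter or digit.
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

class Bm25Index {
  private readonly entries: { id: string; tf: Map<string, number>; length: number }[] = [];
  private readonly docFreq = new Map<string, number>();
  private readonly avgLength: number;
  private readonly k1: number;
  private readonly b: number;

  constructor(docs: Doc[], k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
    let totalLength = 0;
    for (const doc of docs) {
      const tokens = tokenize(doc.text);
      const tf = new Map<string, number>();
      for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1);
      // Each distinct term counts once per document towards document frequency.
      tf.forEach((_count, term) => this.docFreq.set(term, (this.docFreq.get(term) ?? 0) + 1));
      this.entries.push({ id: doc.id, tf, length: tokens.length });
      totalLength += tokens.length;
    }
    this.avgLength = docs.length > 0 ? totalLength / docs.length : 0;
  }

  search(query: string, topK = 3): Hit[] {
    const n = this.entries.length;
    const terms = tokenize(query);
    const hits: Hit[] = [];
    for (const entry of this.entries) {
      let score = 0;
      for (const term of terms) {
        const tf = entry.tf.get(term);
        if (!tf) continue;
        const df = this.docFreq.get(term) ?? 0;
        const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
        const norm = tf + this.k1 * (1 - this.b + (this.b * entry.length) / this.avgLength);
        score += (idf * tf * (this.k1 + 1)) / norm;
      }
      if (score > 0) hits.push({ id: entry.id, score });
    }
    return hits.sort((x, y) => y.score - x.score).slice(0, topK);
  }
}

// Placeholder documents standing in for the curated scheme knowledge base.
const index = new Bm25Index([
  { id: "pm-kisan", text: "PM-KISAN installment schedule and how to check payment status" },
  { id: "pmfby", text: "PMFBY crop insurance premium and claim process" },
  { id: "kcc", text: "Kisan Credit Card loan limit and application steps" },
]);
console.log(index.search("PM Kisan installment status"));
```

Because the index is built once at startup and queries only touch in-process maps, no network round trip is involved in retrieval.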
Challenges we ran into
1. Node.js Stream Incompatibility Crashed TTS Mid-Hackathon
The bug: After we switched from buffering the full TTS audio to streaming it directly, the server threw:
OpenAI TTS stream failed: The "readableStream" argument must be an instance of ReadableStream. Received an instance of PassThrough
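The guard described in the fix below can be sketched roughly as follows. This is an illustration rather than the project's actual helper: the signature and the error thrown for unsupported bodies are assumptions.

```typescript
import { PassThrough, Readable } from "node:stream";
import { ReadableStream as WebReadableStream } from "node:stream/web";

// Normalise whatever the SDK returns into a Node.js Readable.
function toNodeStream(body: unknown): Readable {
  // A Node stream (PassThrough is a Readable) is returned untouched.
  if (body instanceof Readable) return body;
  // Only a genuine Web ReadableStream is handed to Readable.fromWeb().
  if (body instanceof WebReadableStream) return Readable.fromWeb(body);
  // Assumed behaviour for anything else: fail loudly instead of guessing.
  throw new TypeError("Unsupported TTS response body type");
}

// Usage: a Node stream passes through as the same object.
const nodeBody = new PassThrough();
console.log(toNodeStream(nodeBody) === nodeBody); // true
```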
What happened: We called `Readable.fromWeb(response.body)`, assuming the OpenAI SDK would return a Web `ReadableStream`. But the installed version of the SDK returns a Node.js `PassThrough` stream, which is a `Readable`, while `fromWeb()` only accepts the Web API variant. The error only surfaced at runtime, not at build time, so it silently fell through to the fallback model and then crashed entirely.
Fix: We wrote a `toNodeStream()` helper that checks `instanceof Readable` first. If the SDK gives back a Node stream, the helper returns it as-is. It only calls `Readable.fromWeb()` if the body is actually a Web `ReadableStream`. One function handles both SDK behaviours without breaking the pipeline.
2. 8-Second Wait Before Any Voice Was Heard
The bug: The original crop analyzer had two sequential waits — full LLM response (~4 s), then full TTS generation (~4 s) — before the user heard a single word. During testing this felt completely broken even when it was working correctly.
What happened: The architecture was a simple request → wait → response. The `/api/analyze` route waited for the entire LLM response, then the client called `/api/analyze/tts` with the full text, waited for the entire MP3 to download, then played it. Every step was sequential and no work overlapped.
Fix: Rearchitected into a three-stage concurrent pipeline on judge feedback:
- Stage 1 (Analyzer): LLM streams tokens over SSE as they arrive
- Stage 2 (Voice Fetcher): Client detects sentence boundaries in the incoming token stream and fires a TTS request per sentence immediately — running in parallel with the LLM still generating
- Stage 3 (Player): `drainQueue()` plays each audio blob the moment it resolves, while TTS for the next sentence is already fetching
Time-to-first-word dropped from ~8 s to ~2–3 s. Subsequent sentences play with near-zero gaps.
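The three stages above can be sketched as follows. This is an assumption-laden illustration, not the project's code: `SentenceBuffer` and `speakWhileGenerating` are hypothetical names, `synthesize` and `play` are injected stand-ins for the TTS request and the audio player, and the boundary rule (terminal punctuation followed by whitespace) is a simplification that would still split on abbreviations such as "Dr.".

```typescript
// Stage 2 helper: accumulates streamed tokens and emits completed sentences.
class SentenceBuffer {
  private pending = "";

  // Returns every sentence completed by this token and keeps the unfinished tail.
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    const boundary = /[.!?।]+(?=\s)/g; // punctuation must be followed by whitespace
    let consumed = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(consumed, end).trim();
      if (sentence) sentences.push(sentence);
      consumed = end;
    }
    this.pending = this.pending.slice(consumed);
    return sentences;
  }

  // Call once the token stream ends to release the last sentence.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}

async function speakWhileGenerating<Audio>(
  tokens: AsyncIterable<string>,
  synthesize: (sentence: string) => Promise<Audio>,
  play: (audio: Audio) => Promise<void>,
): Promise<void> {
  const buffer = new SentenceBuffer();
  let playback: Promise<void> = Promise.resolve();

  const enqueue = (sentence: string): void => {
    const audio = synthesize(sentence); // TTS request starts immediately
    audio.catch(() => undefined);       // the failure still surfaces through `playback`
    // Stage 3: play in order, each clip as soon as the previous one has finished.
    playback = playback.then(async () => play(await audio));
  };

  // Stage 1: consume LLM tokens as they arrive.
  for await (const token of tokens) {
    for (const sentence of buffer.push(token)) enqueue(sentence);
  }
  for (const sentence of buffer.flush()) enqueue(sentence);
  await playback;
}
```

Chaining playback on a single promise keeps sentences in order while leaving every TTS request free to run in parallel with token generation.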
3. Hinglish TTS Sounded Robotic and Wrong
The bug: OpenAI TTS with default settings mispronounced every Hindi word. "Fasal" became "fah-sal" with an American accent. "Kisan" sounded like "kee-san". Farmers in testing did not understand their own language being spoken back at them.
What happened: TTS models default to American English phonetics. Hindi words embedded in otherwise-English text got mangled. Standard voice options don't have Indian accent models.
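As a rough illustration of the approach described in the fix below, a request body for OpenAI's speech endpoint (`POST /v1/audio/speech`) could be assembled like this. The instruction wording is a paraphrase of the description in this writeup, not the team's actual prompt, and the `alloy` voice and the helper name are assumptions.

```typescript
interface SpeechRequest {
  model: string;
  voice: string;
  input: string;
  instructions: string;
  response_format: "mp3";
}

// Paraphrased steering prompt; the real prompt is not shown in this writeup.
const HINGLISH_INSTRUCTIONS = [
  "Speak in Hinglish with authentic Indian pronunciation.",
  'Pronounce Hindi words such as "fasal", "kisan", "khad" and "mitti" the way a native Hindi speaker would.',
  "Persona: a warm local agronomist.",
].join(" ");

// Hypothetical helper that builds the JSON body for the speech request.
function buildSpeechRequest(text: string): SpeechRequest {
  return {
    model: "gpt-4o-mini-tts",
    voice: "alloy", // assumed; the voice actually used is not stated
    input: text,
    instructions: HINGLISH_INSTRUCTIONS,
    response_format: "mp3",
  };
}

console.log(JSON.stringify(buildSpeechRequest("Aapki fasal ke liye khad ka sahi samay yeh hai."), null, 2));
```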
Fix: Used the `gpt-4o-mini-tts` model, which supports a freeform `instructions` field. We wrote a detailed prompt instructing the model to speak in Hinglish with authentic Indian pronunciation, referencing specific words like "fasal", "kisan", "khad", "mitti" as pronunciation anchors, and describing the persona as "a warm local agronomist". The difference was immediately noticeable in testing.
4. BM25 Matching Failed on Hindi Crop Names
The bug: A farmer asking "meri gehun ki fasal mein kya khaad dein" (what fertilizer for wheat) got zero retrieval hits. The knowledge base had extensive wheat content, but BM25 scored it near zero.
What happened: Our BM25 tokenizer split on whitespace and lowercased — but "gehun" (Hindi for wheat) had no overlap with "wheat" in the knowledge base. The retriever was built for English and had no bilingual coverage.
Fix: Two-part solution. First, we added Hindi synonyms directly into the knowledge base documents alongside the English terms ("wheat/gehun", "rice/chawal", "maize/makka", etc.) so BM25 could match either form. Second, the LLM system prompt instructs the model to reformulate the farmer's query in English before the RAG lookup, effectively acting as a translation layer before retrieval.
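The first part of the fix can be illustrated with a small sketch. The synonym pairs come from the examples above; the function names and the sample document text are hypothetical, and a plain term-overlap check stands in for full BM25 scoring. The LLM query reformulation step is not shown.

```typescript
// Synonym pairs taken from the examples above; a real list would be longer.
const HINDI_SYNONYMS = new Map<string, string[]>([
  ["wheat", ["gehun"]],
  ["rice", ["chawal"]],
  ["maize", ["makka"]],
]);

const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Rewrites "wheat" as "wheat/gehun" so either form becomes an indexable token.
function annotateWithSynonyms(text: string): string {
  return text.replace(/[A-Za-z]+/g, (word) => {
    const synonyms = HINDI_SYNONYMS.get(word.toLowerCase());
    return synonyms ? [word, ...synonyms].join("/") : word;
  });
}

// Query terms that also occur in the document; an empty result means BM25 scores zero.
function sharedTerms(query: string, docText: string): string[] {
  const docTerms = new Set(tokenize(docText));
  return tokenize(query).filter((term) => docTerms.has(term));
}

const query = "meri gehun ki fasal mein kya khaad dein";
const original = "Wheat fertilizer schedule by growth stage"; // sample text, not from the real knowledge base
const annotated = annotateWithSynonyms(original);

console.log(annotated);                     // "Wheat/gehun fertilizer schedule by growth stage"
console.log(sharedTerms(query, original));  // []
console.log(sharedTerms(query, annotated)); // ["gehun"]
```

Annotating documents at indexing time keeps the query path unchanged, which is why it adds no latency to retrieval.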
5. Twilio WebSocket Sending Responses Before STT Was Complete
The bug: On fast speakers, the AI would sometimes respond to a half-sentence. A farmer saying "Mere gehu ki..." would get interrupted mid-thought with an AI response about wheat before they had finished their question.
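A guard of the kind described in the fix below might look like this. It is a sketch under assumptions: the message fields follow Twilio's ConversationRelay `prompt` message (`voicePrompt`, `last`), the class name is hypothetical, and the final message is assumed to carry the transcript that should go to the model.

```typescript
// Subset of the ConversationRelay "prompt" message used here.
interface PromptMessage {
  type: "prompt";
  voicePrompt: string;
  last: boolean;
}

class PromptGate {
  private latestPartial = "";

  // Text suitable for a live display; never sent to the LLM.
  get displayText(): string {
    return this.latestPartial;
  }

  // Returns the transcript only when Twilio marks it as final, otherwise null.
  accept(message: PromptMessage): string | null {
    if (!message.last) {
      this.latestPartial = message.voicePrompt;
      return null;
    }
    this.latestPartial = "";
    return message.voicePrompt;
  }
}

// Usage: only the second message would trigger the RAG + LLM pipeline.
const gate = new PromptGate();
console.log(gate.accept({ type: "prompt", voicePrompt: "Mere gehu ki", last: false }));
console.log(gate.accept({ type: "prompt", voicePrompt: "Mere gehu ki patti peeli ho rahi hai", last: true }));
```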
What happened: Twilio ConversationRelay sends `prompt` events as the speech-to-text becomes confident, but the `last: true` flag (indicating the final transcript) could be delayed. Our session handler was dispatching to the LLM on the first `prompt` event regardless.
Fix: Added a check for the `last: true` flag on the Twilio message before triggering the RAG + LLM pipeline. Only the final confirmed transcript goes to the model. Partial transcripts update a buffer for display purposes only and never trigger a response.
6. SQLite Blocking the Event Loop on Concurrent Calls
The bug: During a load test with three simultaneous calls, all three sessions would occasionally hang for 200–400 ms at t
