Krishi Setu

AI-Powered Voice Intelligence for Every Farmer.

Built at Hack-N-Win 3.0

Created on 7th March 2026

The problem Krishi Setu solves

Krishi Setu — AI Voice Assistant for Indian Farmers

Over 100 million smallholder farmers in India make critical decisions every day — when to spray, what fertilizer to use, which government scheme to apply for — with little to no expert support. Agronomists are scarce, expensive, and rarely available in rural areas. Existing agri-tech apps assume smartphones and reliable internet, leaving out the majority of farmers who still use feature phones or live in low-connectivity zones.

Bad advice, or no advice at all, costs Indian farmers an estimated ₹50,000+ crore in avoidable crop losses every year.

What Krishi Setu Does

Krishi Setu ("Agriculture Bridge") lets any farmer call a phone number and get instant, personalised farming advice in Hinglish — no smartphone, no app, no internet required on their end.

🌾 Voice Call → Instant AI Advice

A farmer dials the Twilio number from any phone. They speak naturally — Hindi, English, or Hinglish. The AI responds in seconds with crop-specific, location-aware advice spoken back to them.

"Mere gehu ki patti peeli ho rahi hai" ("My wheat leaves are turning yellow") → AI diagnoses yellow rust and recommends a propiconazole dose and timing

🌦️ Live Weather + Farming Advice

Weather queries fetch real-time data from Open-Meteo. The AI tells farmers whether to irrigate today, delay spraying due to rain, or harvest before a cold snap.

"Kal barish hogi kya Lucknow mein?" ("Will it rain in Lucknow tomorrow?") → Actual forecast, plus whether it is safe to spray pesticides
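
A minimal sketch of how such a weather lookup could be wired up, assuming the public Open-Meteo forecast endpoint and its daily precipitation variables. The helper names, the approximate Lucknow coordinates in the usage note, and the 50% rain threshold are illustrative assumptions, not details taken from the project.

```javascript
// Sketch only: helper names and the 50% threshold are assumptions.
const OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast";

// Build a two-day daily forecast request for a location (no API key needed).
function buildForecastUrl(latitude, longitude) {
  const url = new URL(OPEN_METEO_URL);
  url.searchParams.set("latitude", String(latitude));
  url.searchParams.set("longitude", String(longitude));
  url.searchParams.set("daily", "precipitation_probability_max,precipitation_sum");
  url.searchParams.set("forecast_days", "2");
  url.searchParams.set("timezone", "auto");
  return url;
}

// Turn the forecast into a simple spray / don't-spray hint.
// Index 0 is today, index 1 is tomorrow in the daily arrays.
function sprayAdvice(forecast, rainThreshold = 50) {
  const tomorrow = forecast.daily.precipitation_probability_max[1];
  return tomorrow >= rainThreshold
    ? { safeToSpray: false, reason: `Rain chance tomorrow is ${tomorrow}%` }
    : { safeToSpray: true, reason: `Rain chance tomorrow is only ${tomorrow}%` };
}
```

On Node 18 or later, something like fetch(buildForecastUrl(26.85, 80.95)).then((r) => r.json()).then(sprayAdvice) would produce a hint that the LLM could phrase in Hinglish.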

🏛️ Government Scheme Guidance

BM25 retrieval over a curated knowledge base of PM-KISAN, PMFBY, MSP, KCC, PM Kusum, e-NAM, and NABARD schemes. Farmers get plain-language answers about their entitlements.

"PM Kisan ka paisa kab aayega?" ("When will the PM Kisan money arrive?") → Installment schedule and how to check status

📸 Crop Image Analysis (For Smartphone Users)

Farmers who have smartphones can visit the web interface, upload a photo of a diseased crop, and receive a structured diagnosis with severity rating, treatment plan, and prevention tips — read aloud via AI voice.

The image analysis uses an agentic 3-stage pipeline (LLM Analyzer → Voice Fetcher → Player running concurrently) so the voice response starts playing within ~2 seconds instead of waiting 8–10 seconds for the full response.

👨‍🌾 Personalised to Every Farmer

Each farmer is profiled in the system — their crops, soil type, location, language preference. The AI greets them by name and tailors every answer to their specific situation.

Who Can Use This

User	How They Benefit
Smallholder farmer (feature phone)	Gets expert-quality crop advice via a plain phone call in their own language
Farmer with smartphone	Uploads a photo of a sick crop and hears an AI diagnosis in seconds
Farmer cooperative / FPO	Can register all member farmers and route support calls through one number
Agri NGO / Krishi Vigyan Kendra	Deploy as a 24/7 first-response advisory before connecting to a human expert
State Agriculture Dept	Distribute scheme information at scale without call centre overhead

What Makes It Different

Works on any phone — no smartphone or internet required for the farmer
Hinglish AI — not just translated; genuinely mixed-language responses that mirror how farmers actually talk
Zero-latency RAG — custom in-memory BM25 retriever, no vector DB, no embedding API calls, <1 ms retrieval
Sentence-streaming voice — AI starts speaking after the first sentence, not after the full response is ready
Free weather data — Open-Meteo, no API key, real-time forecasts for 60+ Indian cities

Challenges we ran into

1. Node.js Stream Incompatibility Crashed TTS Mid-Hackathon

The bug: After we switched from buffering the full TTS audio to streaming it directly, the server threw:

OpenAI TTS stream failed: The "readableStream" argument must be an instance of ReadableStream. Received an instance of PassThrough

What happened: We called Readable.fromWeb(response.body), assuming the OpenAI SDK would return a Web ReadableStream. But the installed version of the SDK returns a Node.js PassThrough stream. That is a Readable, but fromWeb() only accepts the Web API variant. The error only surfaced at runtime, not at build time, so the call silently fell through to the fallback model and then crashed entirely.

Fix: We wrote a toNodeStream() helper that checks instanceof Readable first. If the SDK gives back a Node stream, return it as-is. Only call Readable.fromWeb() if it's actually a Web ReadableStream. One function handles both SDK behaviours without breaking the pipeline.
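
A helper along these lines would match that description. This is a sketch rather than the project's actual code; the error thrown for unknown body types is an added assumption.

```javascript
const { Readable } = require("node:stream");

// Normalise a TTS response body to a Node.js Readable stream.
function toNodeStream(body) {
  // Node streams (PassThrough included) are already usable: return as-is.
  if (body instanceof Readable) return body;
  // Web streams need converting. ReadableStream is a global in Node 18+.
  if (typeof ReadableStream !== "undefined" && body instanceof ReadableStream) {
    return Readable.fromWeb(body);
  }
  // Assumption: fail loudly on anything else instead of guessing.
  throw new TypeError("Unsupported TTS response body type");
}
```

Checking instanceof Readable first matters because PassThrough inherits from Readable, so the Node case never reaches fromWeb().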

2. 8-Second Wait Before Any Voice Was Heard

The bug: The original crop analyzer had two sequential waits — full LLM response (~4 s), then full TTS generation (~4 s) — before the user heard a single word. During testing this felt completely broken even when it was working correctly.

What happened: The architecture was a simple request → wait → response. The /api/analyze route waited for the entire LLM response, then the client called /api/analyze/tts with the full text, waited for the entire MP3 to download, then played it. Every step was sequential and no work overlapped.

Fix: Rearchitected into a three-stage concurrent pipeline on judge feedback:

Stage 1 (Analyzer): LLM streams tokens over SSE as they arrive
Stage 2 (Voice Fetcher): Client detects sentence boundaries in the incoming token stream and fires a TTS request per sentence immediately — running in parallel with the LLM still generating
Stage 3 (Player): drainQueue() plays each audio blob the moment it resolves, while TTS for the next sentence is already fetching

Time-to-first-word dropped from ~8 s to ~2–3 s. Subsequent sentences play with near-zero gaps.
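
The client-side half of that pipeline (Stages 2 and 3) can be sketched roughly as follows. Only the drainQueue() name comes from the writeup; createSentenceSplitter, createPlayer, fetchAudio and play are hypothetical names, and the sentence-boundary rule is a deliberately simple assumption.

```javascript
// Stage 2: cut the incoming token stream into sentences as they complete.
function createSentenceSplitter(onSentence) {
  let buffer = "";
  return {
    push(token) {
      buffer += token;
      let match;
      // A sentence ends at . ! or ? followed by whitespace ("2.5 ml" is not split).
      while ((match = buffer.match(/^([\s\S]*?[.!?])\s+/))) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // Call once the LLM stream ends to emit any trailing text.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      if (rest) onSentence(rest);
    },
  };
}

// Stage 3: start every TTS fetch immediately, but play strictly in order.
function createPlayer(fetchAudio, play) {
  const queue = []; // pending audio promises, in sentence order
  let draining = null;

  function drainQueue() {
    if (!draining) {
      draining = (async () => {
        try {
          while (queue.length > 0) {
            const audio = await queue.shift(); // later fetches keep running meanwhile
            await play(audio);
          }
        } finally {
          draining = null;
        }
      })();
    }
    return draining;
  }

  return {
    enqueue(sentence) {
      queue.push(fetchAudio(sentence)); // fire the TTS request right away
      return drainQueue();
    },
  };
}
```

In use, each SSE token would go to splitter.push(), with the splitter's callback calling player.enqueue(sentence). Because the queue stores promises, the fetch for sentence two overlaps with playback of sentence one, while playback order stays fixed.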

3. Hinglish TTS Sounded Robotic and Wrong

The bug: OpenAI TTS with default settings mispronounced every Hindi word. "Fasal" became "fah-sal" with an American accent. "Kisan" sounded like "kee-san". Farmers in testing did not understand their own language being spoken back at them.

What happened: TTS models default to American English phonetics. Hindi words embedded in otherwise-English text got mangled. Standard voice options don't have Indian accent models.

Fix: Used the gpt-4o-mini-tts model, which supports a freeform instructions field. We wrote a detailed prompt instructing the model to speak in Hinglish with authentic Indian pronunciation, referencing specific words like "fasal", "kisan", "khad", "mitti" as pronunciation anchors, and describing the persona as "a warm local agronomist". The difference was immediately noticeable in testing.
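
For illustration, the request could be assembled like this. The instruction wording and the choice of the "alloy" voice are assumptions, since the writeup quotes neither. Only the model name, the instructions field, the anchor words and the persona come from the text above.

```javascript
// Hindi words used as pronunciation anchors in the prompt (from the writeup).
const PRONUNCIATION_ANCHORS = ["fasal", "kisan", "khad", "mitti"];

// Build the payload for a speech request. buildTtsRequest is a hypothetical name.
function buildTtsRequest(text) {
  return {
    model: "gpt-4o-mini-tts",
    voice: "alloy", // assumption: the writeup does not say which voice was used
    input: text,
    // Illustrative wording only; the team's real prompt was more detailed.
    instructions: [
      "Speak in Hinglish as a warm local agronomist.",
      "Use authentic Indian pronunciation for Hindi words such as " +
        PRONUNCIATION_ANCHORS.join(", ") + ".",
    ].join(" "),
  };
}
```

With the official OpenAI Node SDK, this object would be passed to openai.audio.speech.create(buildTtsRequest(sentence)).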

4. BM25 Matching Failed on Hindi Crop Names

The bug: A farmer asking "meri gehun ki fasal mein kya khaad dein" (what fertilizer for wheat) got zero retrieval hits. The knowledge base had extensive wheat content, but BM25 scored it near zero.

What happened: Our BM25 tokenizer split on whitespace and lowercased — but "gehun" (Hindi for wheat) had no overlap with "wheat" in the knowledge base. The retriever was built for English and had no bilingual coverage.

Fix: Two-part solution. First, we added Hindi synonyms directly into the knowledge base documents alongside the English terms ("wheat/gehun", "rice/chawal", "maize/makka", etc.) so BM25 could match either form. Second, the LLM system prompt instructs the model to reformulate the farmer's query in English before the RAG lookup, effectively acting as a translation layer before retrieval.
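
The effect of the first part can be reproduced with a small in-memory BM25 scorer. This is a generic sketch of BM25 with common default parameters (k1 = 1.5, b = 0.75), not the project's retriever. The sample documents and function names are made up for the demonstration, and the tokenizer here splits on any non-alphanumeric character so that "wheat/gehun" yields two tokens.

```javascript
// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Precompute token lists, average document length and document frequencies.
function buildIndex(docs) {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((n, t) => n + t.length, 0) / tokenized.length;
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }
  return { tokenized, avgLen, df };
}

// Return one BM25 score per document for the query.
function bm25Scores(index, query, k1 = 1.5, b = 0.75) {
  const N = index.tokenized.length;
  const queryTerms = tokenize(query);
  return index.tokenized.map((tokens) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue; // no lexical overlap, no contribution
      const n = index.df.get(term);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const norm = tf + k1 * (1 - b + (b * tokens.length) / index.avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}

const query = "meri gehun ki fasal mein kya khaad dein";
const englishOnly = buildIndex([
  "Wheat needs urea fertilizer at tillering",
  "Rice paddy needs standing water",
]);
const bilingual = buildIndex([
  "Wheat/gehun needs urea fertilizer/khaad at tillering",
  "Rice/chawal paddy needs standing water",
]);
console.log(bm25Scores(englishOnly, query)); // every score is 0: no shared tokens
console.log(bm25Scores(bilingual, query)); // only the wheat document scores above 0
```

BM25 is purely lexical, so without a shared token the score is exactly zero no matter how relevant the document is. That is why adding the synonyms to the documents fixes the miss.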

5. Twilio WebSocket Sending Responses Before STT Was Complete

The bug: On fast speakers, the AI would sometimes respond to a half-sentence. A farmer saying "Mere gehu ki..." would get interrupted mid-thought with an AI response about wheat before they had finished their question.

What happened: Twilio Conversation Relay sends prompt events as the speech-to-text becomes confident, but the last: true flag (indicating the final transcript) could be delayed. Our session handler was dispatching to the LLM on the first prompt event regardless.

Fix: Added a check for the last: true flag on the Twilio message before triggering the RAG + LLM pipeline. Only the final confirmed transcript goes to the model. Partial transcripts update a buffer for display purposes only and never trigger a response.
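
A sketch of that gate, assuming each WebSocket message is a JSON string whose prompt events carry the transcript in a voicePrompt field. The field name is our understanding of the Conversation Relay message format and should be checked against Twilio's documentation. handleRelayMessage, the session shape and the onFinalTranscript callback are hypothetical.

```javascript
// Returns true only when the RAG + LLM pipeline was triggered.
function handleRelayMessage(session, rawMessage, onFinalTranscript) {
  const message = JSON.parse(rawMessage);
  if (message.type !== "prompt") return false; // setup, interrupt, etc. handled elsewhere

  if (!message.last) {
    // Partial transcript: keep it for display, never answer it.
    session.partialTranscript = message.voicePrompt;
    return false;
  }

  // Final transcript: clear the display buffer and hand off to the model.
  session.partialTranscript = "";
  onFinalTranscript(message.voicePrompt);
  return true;
}
```

Wired into a WebSocket server, this would be called from the message handler, for example ws.on("message", (data) => handleRelayMessage(session, data.toString(), runPipeline)), where runPipeline stands in for the RAG + LLM call.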

6. SQLite Blocking the Event Loop on Concurrent Calls

The bug: During a load test with three simultaneous calls, all three sessions would occasionally hang for 200–400 ms at t

Technologies used

HTML

Node.js

JavaScript

Artificial Intelligence

Speech Recognition

Twilio

Image Processing

Text-to-Speech

RAG
