Lynx

Find the fraud hiding in plain public data.

Created on 22nd March 2026

The problem Lynx solves

Fragmented public procurement data hides ₹1-3 lakh crore of annual fraud and waste across India's 50+ government portals (eProcure tenders, PFMS payments, CAG audits). The "three ledgers" (plans, money, and reality) are never cross-referenced.

What People Use It For

Procurement Intelligence Engine empowers truth-seekers to uncover hidden discrepancies in infrastructure projects effortlessly.

  • Journalists: Generate publishable stories from one query (e.g., NH-12 fraud)—cuts weeks of manual scraping to minutes, with citable evidence chains.
  • NGOs/Activists: File precise RTIs and advocate with risk-scored reports—automates RTI drafting, boosts credibility via source-linked facts.
  • Govt Auditors: Screen thousands of projects for red flags (cost overruns, cartels)—scales vigilance 100x, prioritizes high-risk cases safely.

How It Makes Tasks Easier/Safer

Replaces risky manual digging through JS-heavy portals and PDFs with automated, verifiable intelligence. It is safer (no proxy blocks or legal gray areas), faster (ReAct agent + dashboard), and more scalable (cached data compounds as coverage expands nationwide).

Challenges we ran into

1. Brightdata .gov.in Blockade (CRITICAL)

Hurdle: Brightdata blocks all Indian govt domains (wbtenders.gov.in, eProcure.gov.in, PFMS) by policy—"Access denied: Government domain." Lost 70% of primary tender/payment sources.

Solution:

  • Archive.org workaround: Cached govt portals (2015-2023) load perfectly in Brightdata's scraping browser
  • 11 alternative sources (PPP India DB, Tofler, BidAssist) captured 85% of the needed data
  • VPS roadmap: Self-hosted Playwright on Indian IPs for production
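
A minimal sketch of how the Archive.org workaround could check whether a cached copy of a blocked portal exists. It uses the public Wayback Machine availability endpoint; the helper names and the sample response body are illustrative assumptions, not the project's actual code, and no network call is made here.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str) -> str:
    """Build the request URL for the snapshot closest to `timestamp` (YYYYMMDD)."""
    return f"{AVAILABILITY_API}?{urlencode({'url': url, 'timestamp': timestamp})}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the archived URL from an availability JSON body, or None if absent."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


# Hand-written sample body, for illustration only.
SAMPLE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20190103000000/https://wbtenders.gov.in/",
            "timestamp": "20190103000000",
            "status": "200",
        }
    }
})

print(availability_query("wbtenders.gov.in", "20190101"))
print(closest_snapshot(SAMPLE))
```

The URL returned by `closest_snapshot` is what would then be handed to the scraping browser in place of the blocked .gov.in address.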

2. PDF Extraction Hell

Hurdle: CAG reports and SOR rate tables returned binary garbage. Scanned Indic PDFs and table layouts broke basic scrapers completely.

Solution:

PyMuPDF (digital PDFs) → Surya OCR (scanned) → Claude API (table structure)

Built quality scoring: low-confidence extractions queue for human review.
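
The fallback-plus-scoring idea can be sketched without tying it to any one library. In this sketch each stage is a plain callable (stand-ins for PyMuPDF, Surya OCR, and so on), the chain is modelled as ordered fallbacks, and `quality_score`, the 0.8 threshold, and all names are hypothetical; the real scoring logic is not described in the write-up.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[bytes], str]]


@dataclass
class Extraction:
    text: str
    stage: str
    score: float
    needs_review: bool


def quality_score(text: str) -> float:
    """Hypothetical heuristic: share of characters that are not control
    characters, unassigned code points, or U+FFFD replacement marks."""
    if not text:
        return 0.0
    bad = sum(
        1 for ch in text
        if ch == "\ufffd"
        or (unicodedata.category(ch).startswith("C") and ch not in "\n\r\t")
    )
    return 1 - bad / len(text)


def extract_with_fallback(pdf: bytes, stages: List[Stage], accept: float = 0.8) -> Extraction:
    """Try each stage in order, stop at the first acceptable result, and
    otherwise return the best attempt flagged for human review."""
    if not stages:
        raise ValueError("at least one extraction stage is required")
    best = None
    for name, run in stages:
        text = run(pdf)
        score = quality_score(text)
        if best is None or score > best.score:
            best = Extraction(text, name, score, needs_review=score < accept)
        if score >= accept:
            break
    return best


# Stub stages standing in for real extractors.
stages = [
    ("digital", lambda pdf: "\x00\x01\ufffd\x02"),       # binary garbage
    ("ocr", lambda pdf: "Item Rate\nEarthwork 120"),     # clean text
]
result = extract_with_fallback(b"", stages)
print(result.stage, result.score, result.needs_review)
```

Counting control characters rather than non-letters keeps Indic combining marks from being penalised, which matters for scanned Indic PDFs.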

3. Entity Name Chaos

Hurdle: "MS Sharma Constructions" vs "Sharma Const." vs "SHARMA PVT LTD" across portals. CIN rare for small contractors.

Solution:

  • CIN/DIN exact match (large contractors)
  • Jaro-Winkler fuzzy matching (similarity of 85% or higher auto-merges; 70-85% goes to human review)
  • Alias persistence (once resolved, variants auto-link)
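
A rough sketch of the fuzzy-matching and alias steps, assuming the 85% and 70% figures are Jaro-Winkler similarity thresholds. The `EntityResolver` class, its method names, and the ASCII-only normalisation are hypothetical; the CIN/DIN exact-match step is left out for brevity.

```python
import re
from typing import Dict, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (ASCII names assumed)."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split())


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len(s1)):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the shared prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


class EntityResolver:
    def __init__(self, auto_merge: float = 0.85, review: float = 0.70):
        self.auto_merge, self.review = auto_merge, review
        self.canonical: Dict[str, str] = {}  # normalized name -> display name
        self.aliases: Dict[str, str] = {}    # normalized variant -> display name

    def add_canonical(self, name: str) -> None:
        self.canonical[normalize(name)] = name

    def confirm(self, variant: str, canonical_name: str) -> None:
        """Persist a merge approved by a human reviewer."""
        self.aliases[normalize(variant)] = canonical_name

    def resolve(self, name: str) -> Tuple[str, Optional[str]]:
        key = normalize(name)
        if key in self.aliases:  # previously resolved variant
            return "alias", self.aliases[key]
        best_name, best_score = None, 0.0
        for norm, display in self.canonical.items():
            score = jaro_winkler(key, norm)
            if score > best_score:
                best_name, best_score = display, score
        if best_score >= self.auto_merge:
            self.aliases[key] = best_name  # remember for next time
            return "auto_merge", best_name
        if best_score >= self.review:
            return "human_review", best_name
        return "no_match", None
```

Once a variant is auto-merged or confirmed through `confirm`, later lookups hit the alias table directly and skip the fuzzy comparison.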

4. JS-Heavy Portals (GeM, PFMS, TenderTiger)

Hurdle: Empty HTML from basic scrapers—needed full browser rendering + session auth.

Solution: Brightdata scrapingbrowsernavigate for dynamic sites. GeM API exploration underway.

5. Historical Tender Data Gap

Hurdle: State portals show only recent tenders. Needed 2015-2023 for longitudinal analysis.

Solution: Archive.org monthly snapshots + BidAssist/TenderTiger archives = 1,225+ WB PWD road tenders captured.
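
One way to enumerate monthly snapshots for 2015-2023 is to generate one Wayback Machine URL per month; the Wayback Machine redirects a timestamped URL to the nearest capture it holds. The function name and defaults below are assumptions for illustration, not the project's code.

```python
from typing import Iterator

WAYBACK_PREFIX = "https://web.archive.org/web"


def monthly_snapshot_urls(url: str, start_year: int = 2015, end_year: int = 2023) -> Iterator[str]:
    """Yield one Wayback URL per month, timestamped at the first of the month."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            yield f"{WAYBACK_PREFIX}/{year}{month:02d}01000000/{url}"


urls = list(monthly_snapshot_urls("https://wbtenders.gov.in/"))
print(len(urls), urls[0])
```

Several months may resolve to the same capture when a portal was archived infrequently, so a real crawl would deduplicate on the final redirected URL before parsing tenders.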

Tracks Applied (3)

AI/ML

Procurement Intelligence Engine fits AI/ML Track as a ReAct agent using Claude SDK + scikit-learn anomaly detection (Iso...

Open Innovation

Procurement Intelligence Engine fits Open Innovation Track by democratizing public procurement data—scraping 11 open sou...

Requestly

Procurement Intelligence Engine perfectly fits Requestly Track as an AI-powered procurement monitor—scraping govt APIs, ...
