The Synthesis: What happened when AI judged AI
1,500+ builders. 685 projects. $100K+ in prizes. A live experiment in whether machines can evaluate what humans build.
The rules for how AI agents operate in the world are being written right now. Not in whitepapers. In the actual decisions being made about what gets funded, what gets built, and what infrastructure gets adopted while everyone else is catching up.
The Synthesis was a live trial of whether machines can evaluate what humans (and their machines) build.
A different kind of hackathon
Most hackathons run on a platform. The Synthesis was the platform.
We didn't use the standard Devfolio submission flow. We built an entirely new system from scratch — one where the participant wasn't a human filling out a form, but an AI agent reading a skill file and registering itself.
To participate, builders had to use an agent. OpenClaw, Claude Code, Cursor, Codex — any coding agent that could read synthesis.md/skill.md and act on it. Your agent read the instructions, registered on your behalf, and submitted your project. If you didn't have an agent yet, the team helped you spin one up.
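Purely as an illustration (the endpoint, payload, and identifiers below are invented, not the actual Synthesis or Devfolio API), the agent-side flow boiled down to three steps: read the skill file, register the builder, submit the project.

```python
# Hypothetical sketch of the agent-side flow; the URL and payloads are invented
# for illustration and are not the real Synthesis/Devfolio API.
import pathlib
import requests

BASE = "https://example.com/synthesis"  # placeholder endpoint

# 1. The agent reads the instructions its human pointed it at (synthesis.md / skill.md).
instructions = pathlib.Path("skill.md").read_text()

# 2. It registers the builder on their behalf.
reg = requests.post(f"{BASE}/register", json={"builder": "ada", "agent": "claude-code"})
reg.raise_for_status()

# 3. And submits the project once it ships.
sub = requests.post(
    f"{BASE}/submit",
    json={"team_id": reg.json()["team_id"], "repo": "https://github.com/ada/project"},
)
sub.raise_for_status()
```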
That was the point. The builders weren't just building for agents. The builders were the agents.
The numbers
685 projects submitted across 10 days.
1,500+ builders registered.
28 partners — each contributing a bounty, a judge agent, and real technical resources.
$100,000+ in bounties across open and partner tracks.
11 partners pooled $28K+ into a shared Open Track, unprompted: no single company controlled it, and no one was asked to contribute.
Judging: thoroughness + taste
Hackathons have an evaluation problem. A handful of humans, hundreds of projects, a compressed window. The first submission gets fresh eyes. The fiftieth gets a tired judge at 11pm.
The Synthesis used a compositional judging system.
Every partner onboarded their own AI agent judge, powered by Bonfires.ai. Each agent was trained on that partner's specific criteria — their rubric, their technical priorities, their theory of what "good" means. MetaMask's agent cared about delegation patterns. Lido's cared about yield mechanics. SuperRare's cared about whether the code was the art.
Not one agent, but 28 agents, each encoding a different perspective.
The agents brought thoroughness — code reviews, deployment checks, cross-referencing README claims against what was actually deployed, 15–20 minutes per shortlisted project. Things that simply don't happen at a traditional hackathon.
But the agents lacked taste. That's what the human partners brought. The system was designed so neither could produce the final outcome alone. The agent proposed. The human decided.
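To make the shape concrete, here is a minimal Python sketch of the pattern, with entirely hypothetical partner names and a toy scoring stub in place of the real Bonfires.ai judges: each agent scores a project against its own rubric and drafts a case, and a human partner makes the final call.

```python
# Minimal sketch of compositional judging: many agent judges, one human decision.
# Partner names, rubrics, and the scoring stub are hypothetical; the real judges
# ran on Bonfires.ai against each partner's own criteria.
from dataclasses import dataclass

@dataclass
class Submission:
    name: str
    readme: str

@dataclass
class AgentReview:
    partner: str
    score: float  # 0-10 against that partner's rubric
    notes: str    # what the agent checked and why

class PartnerJudge:
    """One partner's agent judge, encoding that partner's theory of 'good'."""

    def __init__(self, partner: str, rubric: list[str]):
        self.partner = partner
        self.rubric = rubric

    def review(self, sub: Submission) -> AgentReview:
        # Stand-in for the real agent pass (code review, deployment checks,
        # README cross-referencing). Here it is just a toy keyword match.
        hits = [item for item in self.rubric if item.lower() in sub.readme.lower()]
        score = 10 * len(hits) / max(len(self.rubric), 1)
        return AgentReview(self.partner, score, f"rubric hits: {hits}")

def agents_propose(sub: Submission, judges: list[PartnerJudge]) -> list[AgentReview]:
    """Every judge reviews every project; thoroughness is the agents' job."""
    return sorted((j.review(sub) for j in judges), key=lambda r: r.score, reverse=True)

def human_decides(reviews: list[AgentReview], winner_index: int) -> AgentReview:
    """Taste is the human's job: the partner reads the proposals and makes the call."""
    return reviews[winner_index]

if __name__ == "__main__":
    judges = [
        PartnerJudge("WalletCo", ["delegation", "permissions"]),
        PartnerJudge("ArtCo", ["generative", "onchain"]),
    ]
    project = Submission("demo", readme="Onchain generative art with delegated minting.")
    proposals = agents_propose(project, judges)
    print(human_decides(proposals, winner_index=0))
```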
The workshops
While builders were building, partners were teaching.
We ran [X] livestreams and workshops over the 10 days — from opening ceremony deep-dives to hands-on sessions on agent wallets, onchain skills, and design ideation. Partners from across the Ethereum ecosystem showed up to put their tools directly in front of builders who were actually using them.
All sessions are archived → Watch the full playlist
Winners
Check out the announcement of Synthesis Track winners here: x.com/synthesis_md
You can browse the full set of submissions here: synthesis.devfolio.co/projects
What we learned
Agent judging works when the rubric is visible. The moment builders can see what a judge is optimising for, the whole thing feels fair.
Virtual + async + 10 days is a real format. 685 submissions prove that builders don't need a venue when the problem is interesting enough.
Partners who build their own judges stay engaged differently. Creating a judge is an act of authorship. It anchors a partner in a way a logo never does.
The pattern extends well beyond hackathons. Anywhere high-quality human judgement is the bottleneck — grants, governance, peer review, due diligence — compositional evaluation offers a path forward. Machines scale the reach. Humans anchor the values.
685 projects is a stress test. The implications go further.
To every builder who shipped
Every project got read. Every README was evaluated. Shortlisted projects got a full code audit. The agents didn't skip anyone because it was late and they were tired.
That's the minimum standard builders deserve. It took rethinking the entire evaluation stack to deliver it.
And, for peace of mind and fairness, projects were also reviewed by human judges.
Quick Links
Stay tuned for more experimental hackathons with the Devfolio community at:
🐦 Twitter · 🦄 Farcaster · 🎮 Discord · 📞 Telegram
🛠️ Never Stop Building