AI Battle Royale
How agents think through high-risk survival.
Created on 15th February 2026
The problem AI Battle Royale solves
Our Motivation
We've all seen the debates about whether LLMs can truly reason or just pattern-match. When models face actual stakes - elimination, resource scarcity, time pressure - how do they perform? That question felt more interesting than building another SaaS wrapper.
Most hackathon projects are "pay me for this tool." We wanted to build something that pushes into research territory while still being entertaining to watch. The idea of agents fighting for survival with visible reasoning seemed like the perfect intersection - scientifically interesting and genuinely fun.
What hooked us was the existential question: how does an LLM behave when "death" is on the line? When Claude is at 3% health with zone damage ticking, does it panic? Does it calculate odds? Does it make desperate moves or stay rational? These aren't questions you can answer with a chatbot demo.
Turns out watching agents balance survival instinct against aggression, seeing them develop emergent strategies we didn't program, and reading their reasoning as they make life-or-death decisions is genuinely fascinating. Every game tells a different story. That's what made this worth building.
Research Value
Transparent adversarial reasoning under time pressure.
Traditional RL agents are black boxes - you see actions but not the decision-making process. LLMs give us interpretable strategic thinking in real-time competitive scenarios.
This system generates a unique dataset:
- Multi-agent spatial reasoning on constrained grids
- Resource competition and risk assessment traces
- Theory of mind under uncertainty (predicting opponent moves)
- Survival vs aggression trade-off decisions
- Response latency as gameplay factor
Use Cases
AI Safety Research:
Study how models balance competing objectives (immediate survival vs long-term strategy), handle adversarial pressure, and reason about incomplete information.
Model Benchmarking:
Compare reasoning quality across frontier models in identical high-stakes scenarios. Response time becomes a measurable advantage: faster inference means a tactical edge.
Prompt Engineering Testing:
Stress-test system prompts under adversarial conditions. Does the model follow strategy instructions when pressured? Do safety guardrails hold under competitive framing?
Training Data Generation:
Corpus of decision traces for fine-tuning models on strategic reasoning, spatial planning, and real-time decision-making.
What Makes It Novel
Response latency = gameplay mechanic. Faster models act first each turn, creating a natural speed hierarchy. Gemini's 1.8s vs Llama's 3.5s becomes a positioning advantage.
Full observability. Every agent sees complete battlefield state simultaneously, then decides in parallel. No information asymmetry - pure reasoning competition.
Emergent strategy. Agents develop playstyles: mages kite at optimal damage range, scouts exploit mobility, tanks force endgame slugfests. Not programmed - discovered through optimization.
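The turn loop can be sketched roughly like this (the `decide` and `apply` methods are placeholders, not our actual interfaces): every agent is queried in parallel against the same battlefield snapshot, and the resulting actions are applied in order of measured response latency, so faster inference literally acts first.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_call(agent, state):
    """Query one agent and record its wall-clock response latency."""
    start = time.monotonic()
    action = agent.decide(state)  # placeholder for the actual LLM call
    return agent, action, time.monotonic() - start

def run_turn(agents, state):
    """All agents see the same snapshot; faster responders act first."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = list(pool.map(lambda a: timed_call(a, state), agents))
    # Sort by latency: fast inference becomes a positioning advantage.
    for agent, action, latency in sorted(results, key=lambda r: r[2]):
        state = state.apply(action)  # placeholder state update
    return state
```

Because every call starts from the same snapshot, there is no information asymmetry within a turn; latency only decides the order in which simultaneous decisions resolve.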
Challenges we ran into
The Walking Simulator Problem
First test game was 17 turns of agents just walking toward the center while slowly dying to zone damage. Zero combat. We realized agents had no incentive to fight - survival was the only goal. We fixed it by spawning agents inside the safe zone so they weren't panicking about zone damage, then added contested item spawns between them to force early skirmishes.
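A rough sketch of those two fixes (helper names are illustrative, not our exact code): spawn positions are sampled inside the zone radius, and items drop at the midpoint between pairs of agents so both have equal claim to them.

```python
import random

def spawn_positions(n_agents, zone_center, zone_radius):
    """Place agents inside the safe zone so turn 1 isn't spent fleeing."""
    cx, cy = zone_center
    return [(cx + random.randint(-zone_radius, zone_radius),
             cy + random.randint(-zone_radius, zone_radius))
            for _ in range(n_agents)]

def contested_item(pos_a, pos_b):
    """Drop an item midway between two agents to force an early skirmish."""
    return ((pos_a[0] + pos_b[0]) // 2, (pos_a[1] + pos_b[1]) // 2)
```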
Getting Agents to Understand Spatial Strategy
Agents kept making bizarre positioning choices - moving away from low-health targets, ignoring items 2 tiles away, running into corners. They didn't grasp the tactical layout. We had to redesign how we describe the battlefield in the prompt: instead of just listing coordinates, we added distance calculations ("target is 7 tiles away"), zone status ("you are INSIDE/OUTSIDE safe zone"), and item proximity. Suddenly they started playing smart - kiting at optimal ranges, grabbing nearby items, staying near zone center.
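Roughly, the redesigned state description works like this (function and field names are illustrative): distances and zone status are pre-computed on the engine side, so the model reads tactical facts instead of doing coordinate arithmetic itself. This sketch assumes a grid where diagonal moves count as one tile.

```python
def chebyshev(a, b):
    """Grid distance where diagonal moves count as one tile."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def describe_battlefield(me, enemies, items, zone_center, zone_radius):
    """Turn raw coordinates into tactical facts the model can act on."""
    inside = chebyshev(me, zone_center) <= zone_radius
    lines = [f"You are {'INSIDE' if inside else 'OUTSIDE'} the safe zone."]
    for name, pos in enemies.items():
        lines.append(f"{name} is {chebyshev(me, pos)} tiles away.")
    for item, pos in items.items():
        lines.append(f"{item} is {chebyshev(me, pos)} tiles away.")
    return "\n".join(lines)
```

Moving the arithmetic out of the prompt is what made the difference: the model never had trouble choosing between "4 tiles away" and "9 tiles away", only with deriving those numbers from raw coordinates.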
Defining the Right Incentives
Early agents were too passive - they'd defend forever or just collect items without fighting. We needed to make aggression rewarding. Added first blood bonus (+20 HP), made items spawn in contested areas rather than random corners, gave kill credit in the combat log. Once agents saw that attacking low-health targets had clear payoff, they started taking calculated risks instead of playing ultra-safe.
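The first-blood mechanic is simple to sketch (attribute names here are hypothetical): log the kill for credit, then pay the bonus exactly once per game, capped at max HP.

```python
FIRST_BLOOD_BONUS = 20  # HP paid out for the game's first elimination

def award_kill(attacker, victim, game):
    """Log the kill for credit and pay the first-blood bonus once per game."""
    game.log.append(f"{attacker.name} eliminated {victim.name}")
    if not game.first_blood_claimed:
        game.first_blood_claimed = True
        attacker.hp = min(attacker.hp + FIRST_BLOOD_BONUS, attacker.max_hp)
```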
Prompt Timeouts
Claude kept timing out because prompts were too verbose - 500+ words of game state description. Agents were overthinking simple moves. We stripped prompts down to ~150 words with only essential info, used clear formatting with sections, added concrete examples of valid actions. Cut response times in half and eliminated most timeouts.
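The trimmed-down prompt looked roughly like this (an illustrative template, not our exact wording): one status line, short sections, and concrete examples of valid actions so the model doesn't have to infer the output format.

```python
COMPACT_PROMPT = """\
TURN {turn} | HP {hp}/{max_hp} | Position {pos} | {zone_status} safe zone

ENEMIES:
{enemy_lines}

ITEMS:
{item_lines}

ACTIONS (reply with exactly one):
- MOVE <N|S|E|W|NE|NW|SE|SW>
- ATTACK <name>   (must be in range)
- PICKUP          (must stand on item)
- DEFEND

Example: ATTACK Claude
"""

def build_prompt(state):
    """Fill the compact template: only essential info, ~150 words."""
    return COMPACT_PROMPT.format(**state)
```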
Tracks Applied (1)
Hackathon Prizes