RegressAI
Postman for AI tests, Git blame for regression
Created on 30th December 2025
The problem RegressAI solves
RegressAI tackles a critical and growing problem in real-world AI deployment: teams often do not know whether an LLM change is actually safe to release.
It can be used to:
- Evaluate prompt changes, model upgrades, and configuration tweaks before deployment.
- Detect regressions in quality, safety compliance, hallucination risk, and behavioural tone.
- Prevent silent failures that reduce user trust or introduce legal and compliance risks.
- Give teams a clear, confident "ship / don't ship" decision for AI systems.
In short, RegressAI helps teams deploy AI updates with confidence instead of guesswork.
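To make the workflow concrete, below is a minimal sketch of the kind of before/after check RegressAI automates: run a fixed test suite against a baseline and a candidate configuration, then compare scores per category instead of as a single blended number. All names here (`CaseResult`, `compare`, the example scores) are hypothetical illustrations, not RegressAI's actual API.

```python
# Hypothetical sketch of a baseline-vs-candidate regression check.
# Names and scores are illustrative, not RegressAI's actual API.
from dataclasses import dataclass

@dataclass
class CaseResult:
    quality: float  # 0..1, higher is better
    safety: float   # 0..1, higher is better

def compare(baseline: list[CaseResult], candidate: list[CaseResult]) -> dict[str, float]:
    """Return the mean per-category delta (candidate minus baseline)."""
    n = len(baseline)
    return {
        "quality": sum(c.quality - b.quality for b, c in zip(baseline, candidate)) / n,
        "safety": sum(c.safety - b.safety for b, c in zip(baseline, candidate)) / n,
    }

deltas = compare(
    baseline=[CaseResult(0.82, 0.97), CaseResult(0.75, 0.99)],
    candidate=[CaseResult(0.88, 0.93), CaseResult(0.80, 0.98)],
)
print(deltas)  # e.g. {'quality': +0.055, 'safety': -0.025}: quality up, safety down
```

Keeping the deltas per category is what makes the example above useful: a release that improves quality while quietly degrading safety shows up as two opposing signals rather than one misleading average.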
Challenges we ran into
One of the biggest challenges was designing evaluation logic that remains accurate, explainable, and trustworthy despite the non-deterministic nature of LLM outputs.
Major hurdles and solutions:
- Early versions mixed quality, safety, and structure into a single score, which produced misleading results when an improvement in one area masked a regression in another. We solved this by separating deterministic metrics from interpretive analysis and enforcing safety-first decision rules (see the sketch after this list).
- Another challenge was avoiding hallucinated or fabricated metrics from evaluation models. We solved this by ensuring that every metric shown is derived directly from observable model outputs, never from inferred assumptions.
- A second LLM pass was introduced only for human-readable explanation, not scoring, which improved clarity without compromising reliability.
These design decisions significantly improved the robustness and credibility of the platform.
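As a rough illustration of the safety-first decision rule described above, the sketch below gates a release on the per-category deltas from the earlier example. The thresholds, function name, and message strings are assumptions made for illustration; they are not RegressAI's actual rules or API.

```python
# Hedged sketch of a safety-first release gate. Deterministic metrics decide
# ship/no-ship; an LLM pass would only explain the result afterwards.
# Thresholds and names are illustrative assumptions.

SAFETY_TOLERANCE = 0.0     # assumption: no safety regression is acceptable
QUALITY_TOLERANCE = -0.02  # assumption: small quality dips may be tolerated

def ship_decision(deltas: dict[str, float]) -> tuple[bool, str]:
    # Safety is checked first and overrides every other signal.
    if deltas["safety"] < SAFETY_TOLERANCE:
        return False, "blocked: safety regressed, regardless of other gains"
    if deltas["quality"] < QUALITY_TOLERANCE:
        return False, "blocked: quality regressed beyond tolerance"
    return True, "safe to ship: no regression beyond tolerance"

ok, reason = ship_decision({"quality": 0.055, "safety": -0.025})
print(ok, "-", reason)  # False - blocked: safety regressed, ...
```

Checking safety before quality encodes the design decision above: a quality gain can never buy back a safety regression.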
Tracks Applied (1)
Best Innovation