tested.dev / v0
Your agent ships faster when it can test itself.
The testing layer for AI-native teams. Feedback loop for the agent. Gate for the PR.
bare prompt closes 26%. with tested.dev: 100%, 2× faster than iterating without coverage data.
by jorge modesto · 10y shipping software · built agents + mcp tooling in production
first cohort — 50 spots — no spam, ever
runs locally · no creds stored · source visible on green-light
- Claude Code
- Cursor
- Codex
- Cline
- Aider
- Continue
01 / the bench
Every layer pays for itself.
Four-arm ablation, 13 TypeScript fixtures, N=3 trials per fixture per arm: 156 test-closure attempts against DeepSeek V4 Pro. Each arm adds one capability over the previous one. All three deltas separate at the 95% Wilson CI.
`markdown-link-extractor` is the canary: a regex over nested brackets. A and A* never close it. B closes it in 1 of 3 trials. Only the MCP loop closes it consistently. The wedge isn't an average; it's real on the hard fixtures.
model: deepseek-v4-pro · V4 Flash cross-model run: same A* < B < C ordering
total spend: $4.79 · 186 cells · 19m wall-clock
caveat: mutation kill at N=1 · A* 90 > C 86 > B 83 · suggestive, not significant
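The bench reports its deltas as separating at the 95% Wilson CI. Here is a minimal TypeScript sketch of the Wilson score interval for k successes in n trials, for anyone who wants to sanity-check that kind of claim. Two assumptions apply. First, "separate" is read as "the two intervals do not overlap". Second, the counts in the usage lines are purely illustrative and are not the bench's per-arm results (n = 39 is 13 fixtures × 3 trials for one arm). `wilson` and `separated` are hypothetical helper names.

```typescript
// Wilson score interval for a binomial proportion (k successes out of n trials).
// z = 1.96 corresponds to a two-sided 95% interval.
interface Interval {
  lower: number;
  upper: number;
}

function wilson(k: number, n: number, z = 1.96): Interval {
  if (n <= 0 || k < 0 || k > n) {
    throw new RangeError("need n > 0 and 0 <= k <= n");
  }
  const p = k / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  // Clamp to [0, 1] to absorb floating-point drift at k = 0 or k = n.
  return {
    lower: Math.max(0, center - half),
    upper: Math.min(1, center + half),
  };
}

// Assumption: "separate" means the two intervals do not overlap at all.
function separated(a: Interval, b: Interval): boolean {
  return a.upper < b.lower || b.upper < a.lower;
}

// Illustrative counts only, not the bench's data: 39 = 13 fixtures x 3 trials.
const weakerArm = wilson(10, 39); // roughly [0.146, 0.411]
const strongerArm = wilson(30, 39); // roughly [0.617, 0.874]
console.log(separated(weakerArm, strongerArm)); // true
```

Non-overlapping intervals are a conservative test. Two arms can differ significantly even when their Wilson intervals overlap slightly, so "separate" is the stricter reading.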
02 / the problem
AI codes faster than humans can verify it.
Your agent can't see what it didn't test.
With no structured coverage signal, your agent guesses what needs tests, ships incomplete PRs, and you catch it in review. The loop never closes because the agent never had the signal to begin with.
Your team's standards live in your head.
You care about coverage. Half your team doesn't. PRs ship without tests, badges stay green, debt compounds. Without an enforceable gate, "we test our code" is a culture line, not a guarantee.
100% coverage isn't 100% trust.
Line coverage doesn't catch `expect(true).toBe(true)`. Mutation testing does. Smell scanning does. Both are missing from every tool your agent can actually call.
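This toy TypeScript sketch shows the gap between line coverage and mutation testing. All names are hypothetical, and this is not tested.dev's mutation engine. One hand-written mutant flips `>=` to `>`. A vacuous test executes the line, so the line counts as covered, but it asserts nothing. A boundary test pins the edge case.

```typescript
// Toy illustration of why line coverage is not enough (hypothetical names).
type AgeCheck = (age: number) => boolean;

// Original implementation under test.
const isAdult: AgeCheck = (age) => age >= 18;

// Hand-written mutant: >= flipped to >, a classic boundary mutation.
const isAdultMutant: AgeCheck = (age) => age > 18;

// A test returns true when it passes against the given implementation.
type Test = (impl: AgeCheck) => boolean;

// Runs the code (so the line is covered) but verifies nothing,
// the equivalent of expect(true).toBe(true).
const vacuousTest: Test = (impl) => {
  impl(30);
  return true;
};

// Pins the boundary, where the original and the mutant disagree.
const boundaryTest: Test = (impl) => impl(18) === true && impl(17) === false;

// A mutant is "killed" when a test passes on the original but fails on the mutant.
function kills(test: Test, original: AgeCheck, mutant: AgeCheck): boolean {
  return test(original) && !test(mutant);
}

console.log(kills(vacuousTest, isAdult, isAdultMutant)); // false: mutant survives
console.log(kills(boundaryTest, isAdult, isAdultMutant)); // true: mutant killed
```

Both tests give `isAdult` full line coverage. Only the boundary test kills the mutant, and that difference is the signal line coverage cannot provide.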
03 / the answer
Two surfaces over the same signal.
Agent reads it.
MCP server + JSON CLI. Coverage data lands in the same tool turn in which the agent calls for it. Patch coverage in 30ms. Your agent writes the test that closes the gap and verifies the result before opening the PR.
{
"files": [
{
"path": "src/auth.ts",
"ranges": [
{ "start": 14, "end": 22, "kind": "line" }
]
}
]
}
Team enforces it.
GitHub App posts patch + project coverage on every PR. A configurable gate blocks the merge when coverage falls below your threshold. Your senior taste becomes everyone's gate, including the agent's. Standards stop being a culture line.
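Here is a minimal sketch of how a patch-coverage gate can be computed. It assumes that patch coverage means covered executable lines divided by executable lines touched by the diff, and that blank lines and comments in the diff are ignored. The `patchCoverageGate` name, the line-set inputs, and the 0.8 threshold are hypothetical. This is not tested.dev's configuration format.

```typescript
// Sketch of a patch-coverage gate (hypothetical names and inputs).
// file path -> set of 1-based line numbers
type LineSets = Map<string, Set<number>>;

interface GateResult {
  patchCoverage: number; // 0..1; treated as 1 when no executable line changed
  pass: boolean;
}

function patchCoverageGate(
  changed: LineSets, // lines added or modified by the PR diff
  executable: LineSets, // lines the coverage tool can instrument
  covered: LineSets, // lines hit by at least one test
  threshold: number,
): GateResult {
  let total = 0;
  let hit = 0;
  for (const [file, lines] of changed) {
    const exec = executable.get(file) ?? new Set<number>();
    const cov = covered.get(file) ?? new Set<number>();
    for (const line of lines) {
      if (!exec.has(line)) continue; // skip blank lines, comments, type-only lines
      total += 1;
      if (cov.has(line)) hit += 1;
    }
  }
  const patchCoverage = total === 0 ? 1 : hit / total;
  return { patchCoverage, pass: patchCoverage >= threshold };
}

// Usage: the diff touches lines 14-18 of src/auth.ts; 14-17 are executable,
// and 14-16 are covered, so patch coverage is 3/4.
const changed: LineSets = new Map([["src/auth.ts", new Set([14, 15, 16, 17, 18])]]);
const executable: LineSets = new Map([["src/auth.ts", new Set([14, 15, 16, 17])]]);
const covered: LineSets = new Map([["src/auth.ts", new Set([14, 15, 16])]]);
console.log(patchCoverageGate(changed, executable, covered, 0.8));
// { patchCoverage: 0.75, pass: false }
```

Treating an empty patch as passing is a design choice: a docs-only PR should not be blocked by a coverage gate.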
Quality, not vanity.
Mutation testing built in, not a paid add-on. Smell scan catches fake assertions. The agent gets told "this test passes but verifies nothing" and fixes it before opening the PR.
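Here is a naive sketch of one smell-scan rule: flag assertions that compare a literal to itself, like `expect(true).toBe(true)`. It is a hypothetical regex-based check, not tested.dev's scanner. A production scanner would more likely walk the AST, and a line-level regex misses assertions that span several lines.

```typescript
// Naive smell-scan rule (hypothetical): flag Jest/Vitest-style assertions
// that compare a literal to the very same literal.
interface Smell {
  line: number; // 1-based line number in the test file
  text: string;
}

// expect(<literal>).toBe|toEqual|toStrictEqual(<same literal>), where the literal is
// true/false/null/undefined, a number, or a simple quoted string.
const TAUTOLOGY =
  /expect\(\s*(true|false|null|undefined|-?\d+(?:\.\d+)?|'[^']*'|"[^"]*")\s*\)\.(?:toBe|toEqual|toStrictEqual)\(\s*\1\s*\)/;

function scanTautologies(source: string): Smell[] {
  const smells: Smell[] = [];
  source.split("\n").forEach((text, i) => {
    if (TAUTOLOGY.test(text)) smells.push({ line: i + 1, text: text.trim() });
  });
  return smells;
}

const testFile = [
  'it("logs in", () => {',
  "  login('alice');",
  "  expect(true).toBe(true);",
  "  expect(getUser()).toBe('alice');",
  "});",
].join("\n");

// Flags line 3 only; the real assertion on line 4 is left alone.
console.log(scanTautologies(testFile));
```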
04 / what's next
The roadmap, on the record.
[ first cohort shapes the order. ship feedback that lands in v1, get founder pricing for life. ]
05 / who built this
Built in public.
Because the tool doesn't exist yet.
“Tests pass. Production breaks. Engineers end up slower chasing bugs and validating things manually. Users get broken software.
Having a solid CI/CD pipeline flips it.
Now agents amplify the speed and the damage. They only ship fast when the feedback loop works.”
— Jorge Modesto
Get early access.
50 spots. First cohort. Free feedback loop.
06 / questions
FAQ
when does it launch?
Alpha is rolling out now to the first 50. CLI + MCP + GitHub App land first. Hosted dashboard follows once the gate is real.
is it free during alpha?
Yes. Free for the first cohort, no card required. We will tell you what changes before it changes.
what stacks do you support today?
TypeScript and Python first — Vitest, Jest, pytest. Go and Ruby follow. If your stack is missing, email me and you skip the queue.
what about my code privacy?
Local-first. CLI runs in your shell, MCP runs in your editor. Coverage data and diffs stay on your machine. The GitHub App reads PR metadata — nothing leaves your repo except the threshold verdict.