A Three-Axis Maturity Model for Agent-Optimized Codebases
🏗️ Software architect & builder of things that should work
🤖 Working with AI agents since before it was cool (it was still questionable)
📦 Creator of agent-ready-skill — the toolkit we'll use today
🎯 Belief: AI agents are only as good as your codebase's readiness for them
Leave with a number, a mental model, and a toolkit you can use on Monday.
Mental Model: The Axis Framework + Live Demo
INSTRUCT: Scan your repo + Quickest win
NAVIGATE + VALIDATE: Tooling, tests, CI
Numerical assessment of YOUR codebase across 3 axes (0–100%)
A framework to think about agent-readiness that you can explain to your team
Working scanner, fixer, and report generator you can run on any repo
Not theory. Not hype.
Your repository. Your score. Your improvements.
You've written Python in production
You've used an AI coding agent (Cursor, Claude Code, Copilot, etc.)
An agent has messed up your codebase at least once
You want to fix that last one.
We went from blaming the environment
to blaming the model temperature.
"Just ask it again with a better prompt" is the new
"Works on my machine — just reinstall Windows."
"Add validation to our API endpoint"
The agent wasn't stupid.
It was uninformed.
"What does this project even do? What conventions should I follow?"
"Where do tests go? What's the entry point? Why is this 2000 lines?"
"Did my change break anything? Is there CI? Do tests pass?"
Three failure modes → Three axes to measure
Agents aren't stupid.
They're uninformed.
An agent landing in your codebase is like a senior developer on day one.
Would you expect them to be productive without:
✅ A README explaining what the project does
✅ Clear coding conventions
✅ Knowledge of where things live
✅ Tests to verify their changes
No? Then why expect it from an agent?
Q1: Does the agent understand WHAT we want?
Q2: Can the agent find WHERE things are?
Q3: Can the agent tell if it did RIGHT?
…and a 4th question we'll meet shortly: can the agent run SAFELY? 🛡️
"Does the agent understand WHAT we want?"
AGENTS.md first (cross-vendor), quality over bloat, scoped files. Scores conciseness — bloated instructions hurt.
specs/ with acceptance criteria, ADRs, issue/PR templates, ARCHITECTURE + comprehension signals.
"Can the agent find its way around?"
Repo map, semantic-nav amenability (typed code), dependency clarity, README, file-size sanity. Depth/naming heuristics retired.
Standard Skills, bundled scripts, MCP declaration + nav servers (Serena/Sourcegraph) actually wired up.
"Can the agent tell if it did it RIGHT?"
Test suite, documented + fast commands, coverage — and feedback quality: descriptive assertions + a type checker the agent can read.
CI runs tests + lint, automated formatting, pre-commit, governance (CODEOWNERS + Dependabot/Renovate).
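To make "descriptive assertions" concrete, here is a minimal sketch of a test whose failure message tells the agent what was expected and what it got. The function `apply_discount` and its test are hypothetical examples and are not part of agent-ready-skill.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent, where percent is between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be within 0-100, got {percent}")
    return round(price * (1 - percent / 100), 2)


def test_apply_discount_reduces_price() -> None:
    result = apply_discount(price=80.0, percent=25)
    # The message names the input, the expectation and the actual value,
    # so a failing run explains itself without re-reading the test body.
    assert result == 60.0, f"25% off 80.0 should be 60.0, got {result}"
```

A bare `assert result == 60.0` still fails when the code is wrong, but the message gives the agent less to go on when deciding what to fix.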
I submitted this talk with three axes. Building v2,
one question kept surfacing that none of them answered:
"Can the agent run safely?"
Committed devcontainer, documented execution policy (LINCE / OS-sandbox / hosted)
.gitignore secrets, .env.example, committed lockfiles, Dependabot
Instructions only in trusted files; restrictive agent deny rules (CVE-2025-59536)
Security & Sandbox = 12% · D6 · the only axis with a single dimension — for now.
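Several of the signals above come down to file presence, which can be sketched in a few lines. This is an illustrative check under stated assumptions and not how the toolkit scores the axis: the function name `secure_signals`, the lockfile list, and the `.gitignore` patterns it looks for are all choices made for this example.

```python
from pathlib import Path

# Common Python lockfile names; extend for your own package manager.
LOCKFILES = ("uv.lock", "poetry.lock", "Pipfile.lock", "pdm.lock")


def secure_signals(repo: Path) -> dict[str, bool]:
    """Report which basic safety files are present in the repository root."""
    gitignore = repo / ".gitignore"
    ignored = (
        gitignore.read_text(encoding="utf-8").splitlines()
        if gitignore.is_file()
        else []
    )
    return {
        "devcontainer": (repo / ".devcontainer" / "devcontainer.json").is_file(),
        # Only exact, common patterns are recognised; this is not a gitignore parser.
        "env_ignored": any(
            line.strip() in {".env", ".env*", "*.env"} for line in ignored
        ),
        "env_example": (repo / ".env.example").is_file(),
        "lockfile": any((repo / name).is_file() for name in LOCKFILES),
        "dependabot": (repo / ".github" / "dependabot.yml").is_file(),
    }
```

Presence is only the first half of "evidence-based". A real assessment would also look at content quality, for example whether the documented execution policy actually restricts anything.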
From chaos to autonomy — 5 stages
Each level requires 80% of criteria met at that tier.
You can't skip levels — they're sequential.
Evidence-based: file presence + content quality, not folklore
Portable (any agent) vs Target (only with --agents)
Weighted average, gated at 40/55/70/85/95
The agent itself assesses your code — not a static script
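One possible reading of "weighted average, gated at 40/55/70/85/95" is sketched below. Only the 12% security weight and the five gate values come from the slides. The other three weights are placeholders, and the rule that each gate passed adds one level (with anything under 40 counting as level 0) is an assumption made for this example.

```python
# "secure" = 12% is from the slides; the other weights are hypothetical placeholders.
WEIGHTS = {"instruct": 0.30, "navigate": 0.28, "validate": 0.30, "secure": 0.12}
GATES = (40, 55, 70, 85, 95)


def overall_score(axis_scores: dict[str, float]) -> float:
    """Weighted average of per-axis scores, each on a 0-100 scale."""
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)


def maturity_level(score: float) -> int:
    """Count how many gates the score clears (assumed mapping: 0 to 5)."""
    return sum(score >= gate for gate in GATES)
```

With these placeholder weights, axis scores of 80, 60, 70 and 50 average to roughly 67.8, which clears the 40 and 55 gates but not 70.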
❌ No AGENTS.md, no README
❌ No linter, no type checker
❌ No tests, no CI
❌ No sandbox policy, secrets leak
❌ No .env.example, no lockfile
✅ Tight AGENTS.md (+ MCP wired)
✅ Ruff + mypy strict, typed code
✅ Tests + coverage + CI
✅ devcontainer + exec policy
✅ .env.example + lockfile + Dependabot
Same language (Python). Same complexity.
Different readiness.
Level 1: Foundational
Level 4: Optimized
The difference isn't the agent.
It's the codebase.
"Does the agent understand WHAT we want?"
Scan YOUR repository — Axis 1 only
Use repos/demo-bad/ or demo-good/
Ask a neighbor or raise your hand 🙋
Ask Claude to write your AGENTS.md → bridge CLAUDE.md with a symlink
Let the agent draft it → review → save. Concise beats bloated (v2 penalizes walls of text).
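The symlink bridge can be sketched with the standard library. The helper name `bridge_claude_md` is made up for this example, and it assumes AGENTS.md and CLAUDE.md both live in the repository root.

```python
from pathlib import Path


def bridge_claude_md(repo: Path) -> Path:
    """Make CLAUDE.md a symlink to AGENTS.md so both names share one source."""
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.is_file():
        raise FileNotFoundError(f"{agents} not found; write AGENTS.md first")
    if claude.is_symlink() or claude.exists():
        # Never overwrite an existing CLAUDE.md; merge it by hand instead.
        return claude
    # A relative target keeps the link valid after the repo is cloned elsewhere.
    claude.symlink_to("AGENTS.md")
    return claude
```

On Windows, creating symlinks may need extra privileges, so a short CLAUDE.md that points readers to AGENTS.md may be the simpler option there.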
BEFORE
AFTER
"That's it? One file?"
Yes. Agent readiness isn't about rewriting your codebase.
It's about giving agents the right information.
"Can the agent find its way around?"
Same tool. New flag. Different lens.
No .editorconfig → agent mixes tabs and spaces → CI fails
No type checker → agent can't verify that its changes type-check
No .env.example → agent guesses env vars → broken config
Large files → context window saturates → hallucinations
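The large-file failure mode is easy to check for yourself. Below is a minimal sketch that lists Python files above a line budget. The 500-line default is an arbitrary threshold chosen for this example, not the toolkit's value, and the function name is hypothetical.

```python
from pathlib import Path


def oversized_files(repo: Path, max_lines: int = 500) -> list[tuple[str, int]]:
    """Return (relative path, line count) for each .py file above max_lines."""
    findings = []
    # rglob also walks virtualenvs and vendored code; filter those in real use.
    for path in sorted(repo.rglob("*.py")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            findings.append((path.relative_to(repo).as_posix(), line_count))
    return findings
```

Files that show up here are candidates for splitting before you hand the module to an agent.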
Based on your scan results — choose what helps most
🅰: Document your env vars
🅱: Enable type checking
🅲: Consistent editor settings
Or 🅳: Configure Ruff properly (~3 min, +6 pts)
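For the env-var option, a first draft of `.env.example` can be derived from the code itself. This is a rough sketch, assuming variables are read through `os.getenv`, `os.environ.get` or `os.environ[...]` with literal string names. Both helper names are hypothetical, and the regex will miss anything read dynamically.

```python
import re
from pathlib import Path

# Matches os.getenv("X"), os.environ.get("X") and os.environ["X"] with literal names.
ENV_PATTERN = re.compile(
    r"""os\.(?:getenv\(|environ\.get\(|environ\[)\s*["']([A-Za-z_][A-Za-z0-9_]*)["']"""
)


def env_var_names(repo: Path) -> list[str]:
    """Collect environment variable names referenced in the repo's .py files."""
    names: set[str] = set()
    for path in repo.rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="replace")
        names.update(ENV_PATTERN.findall(source))
    return sorted(names)


def render_env_example(names: list[str]) -> str:
    """Render one empty NAME= line per variable; never copy real values."""
    return "".join(f"{name}=\n" for name in names)
```

Write the rendered text to `.env.example`, then add a short comment per variable by hand so the agent knows what each one is for.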
"Can the agent tell if it did RIGHT?"
Where most repos bleed points
The pattern is everywhere: repos test the code, but the codebase doesn't test itself. No CI, no coverage, no pre-commit. The agent has nothing to lean on.
Tests are NOT for finding bugs.
Tests are for giving AGENTS a safety net.
Tests multiply your effectiveness with agents.
Copy. Adapt. Commit. Done.
7 dimensions · 4 axes · Portable + Target layers · explained findings
--agents
Put this in your engineering blog 📝
Share with your team 👥
Track progress over time 📈
The same agent that scans can also generate fixes — contextualized to YOUR project
Not boilerplate — the agent reads YOUR code and generates contextualized files
Layered report + explained findings & roadmap
schema v2 — the contract for fix/diff
Put in your README!
Self-contained HTML report, works offline
4 axes — INSTRUCT · NAVIGATE · VALIDATE · SECURE — to size up ANY codebase
Your codebase's score — and how much you improved it today
6 Agent Skills — scan, fix, report, diff, init — AGENTS.md-first, any agent
Not theory. Not hype.
Your repo. Your score. Your improvements.
Everything today was brownfield — fixing an existing repo. For a brand-new project, don't accumulate the debt in the first place.
init (greenfield) sets an opinionated baseline · fix (brownfield) remediates by impact. Same rubric, two entry points.
Questions? Let's discuss.
Repo: RisorseArtificiali/agent-ready-skill
License: MIT
Dependencies: rich, pyyaml, jinja2 (that's it!)
Built with ❤️ by Stefano Maestri • PyCon Italia 2026