🤖 PyCon Italia 2026 • Workshop

Measuring
AI-Readiness

A Three-Axis Maturity Model for Agent-Optimized Codebases

📅 May 29, 2026 ⏰ 11:00–13:00 🔧 Hands-on Workshop

Hi, I'm Stefano Maestri

🏗️ Software architect & builder of things that should work

🤖 Working with AI agents since before it was cool (it was still questionable)

📦 Creator of agent-ready-skill — the toolkit we'll use today

🎯 Belief: AI agents are only as good as your codebase's readiness for them

🧠

The Goal Today

Leave with a number, a mental model, and a toolkit you can use on Monday.

What We'll Do Together

🔴

Phase 1

Mental Model
The Axis Framework + Live Demo

15 min
🔵

Phase 2

INSTRUCT
Scan your repo + Quickest win

25 min
🟢

Phase 3

NAVIGATE + VALIDATE
Tooling, tests, CI

50 min
3 Magic Moments
1 Toolkit to reuse
Aha moments
THE PROMISE

In 120 Minutes You'll Have:

📊

A Number

Numerical assessment of YOUR codebase across 3 axes (0–100%)

🧠

A Mental Model

A framework to think about agent-readiness that you can explain to your team

🛠️

A Toolkit

Working scanner, fixer, and report generator you can run on any repo

Not theory. Not hype.
Your repository. Your score. Your improvements.

Raise Your Hand If… 🙋

1️⃣

You've written Python in production

2️⃣

You've used an AI coding agent (Cursor, Claude Code, Copilot, etc.)

3️⃣

An agent has messed up your codebase at least once

🎯

You want to fix that last one.

THE PROBLEM

"It Worked on My Machine"

"It Worked in My Prompt"

We went from blaming the environment
to blaming the model temperature.

"Just ask it again with a better prompt" is the new
"Works on my machine — just reinstall Windows."

A True Story 🫠

The Ask

"Add validation to our API endpoint"

What Happened
  • Added tests… in the wrong directory
  • Followed linting rules… from 2 years ago
  • Couldn't find where request schemas live
  • Used Flask patterns… we're on FastAPI
Root Cause

The agent wasn't stupid.
It was uninformed.

💥

Why Agents Fail

Doesn't Understand

"What does this project even do? What conventions should I follow?"

→ INSTRUCT
🔄

Can't Navigate

"Where do tests go? What's the entry point? Why is this 2000 lines?"

→ NAVIGATE

Can't Validate

"Did my change break anything? Is there CI? Do tests pass?"

→ VALIDATE

Three failure modes → Three axes to measure

KEY INSIGHT

It's Not the Agent's Fault

Agents aren't stupid.
They're uninformed.

An agent landing in your codebase is like a senior developer on day one.

Would you expect them to be productive without:
✅ A README explaining what the project does
✅ Clear coding conventions
✅ Knowledge of where things live
✅ Tests to verify their changes

No? Then why expect it from an agent?

The Three Questions

[Diagram: 📝 INSTRUCT · 🧭 NAVIGATE · ✅ VALIDATE · YOUR CODEBASE]

Q1: Does the agent understand WHAT we want?

Q2: Can the agent find WHERE things are?

Q3: Can the agent tell if it did it RIGHT?

…and a 4th question we'll meet shortly: can the agent run SAFELY? 🛡️

📝

Axis 1: INSTRUCT

"Does the agent understand WHAT we want?"

Weight: 28%
📋

D1 · Agent Instructions & Context

AGENTS.md first (cross-vendor), quality over bloat, scoped files. Scores conciseness — bloated instructions hurt.

+18 pts
🏗️

D7 · Spec-Driven Workflow & Docs

specs/ with acceptance criteria, ADRs, issue/PR templates, ARCHITECTURE + comprehension signals.

+10 pts
🧭

Axis 2: NAVIGATE

"Can the agent find its way around?"

Weight: 30%
🗺️

D2 · Navigability & Code Intelligence

Repo map, semantic-nav amenability (typed code), dependency clarity, README, file-size sanity. Depth/naming heuristics retired.

+18 pts
🔧

D5 · Agent Tooling & Capabilities

Standard Skills, bundled scripts, MCP declaration + nav servers (Serena/Sourcegraph) actually wired up.

+12 pts

Axis 3: VALIDATE

"Can the agent tell if it did it RIGHT?"

Weight: 30%
🧪

D3 · Testing & Feedback

Test suite, documented + fast commands, coverage — and feedback quality: descriptive assertions + a type checker the agent can read.

+16 pts
🔄

D4 · CI/CD, Automation & Governance

CI runs tests + lint, automated formatting, pre-commit, governance (CODEOWNERS + Dependabot/Renovate).

+14 pts
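As an illustration of the "descriptive assertions" mentioned under D3, here is a minimal sketch in plain Python. The `validate_note` function, its rules, and the test name are hypothetical examples, not part of agent-ready-skill. The point is only that the assertion message states what was expected and what was received, so an agent reading the failure has something to act on.

```python
def validate_note(payload: dict) -> list[str]:
    """Return human-readable validation errors (an empty list means valid).

    Hypothetical example rules; a real project would define its own.
    """
    errors = []
    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title must be a non-empty string")
    if len(payload.get("body", "")) > 10_000:
        errors.append("body must be at most 10000 characters")
    return errors


def test_blank_title_is_rejected():
    errors = validate_note({"title": "   ", "body": "hello"})
    # The message tells the reader (human or agent) what was expected
    # and what actually came back.
    assert errors == ["title must be a non-empty string"], (
        f"expected exactly one title error for a blank title, got {errors!r}"
    )


test_blank_title_is_rejected()
```

Under pytest the same test would be collected automatically; the direct call at the end only keeps the sketch runnable on its own.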
PLOT TWIST
🛡️

Axis 4: SECURE

I submitted this talk with three axes. Building v2, one question kept surfacing that none of them answered:
"Can the agent run safely?"

Weight: 12%
📦

Sandbox & Isolation

Committed devcontainer, documented execution policy (LINCE / OS-sandbox / hosted)

🔑

Secret & Supply-Chain Hygiene

.gitignore secrets, .env.example, committed lockfiles, Dependabot

💉

Injection & Permissions

Instructions only in trusted files; restrictive agent deny rules (CVE-2025-59536)

Security & Sandbox = 12% · D6 · the only axis with a single dimension — for now.

Maturity Levels

From chaos to autonomy — 5 stages

L5 Autonomous
≥95%
L4 Optimized
≥85%
L3 Structured
≥70%
L2 Guided
≥55%
L1 Foundational
≥40%

Each level requires 80% of criteria met at that tier.
You can't skip levels — they're sequential.
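The thresholds above can be read as a simple lookup. The sketch below is an assumption-laden simplification: it maps an overall percentage to the highest level whose threshold is met and ignores the per-tier "80% of criteria" gate, so the labels the tool prints may differ. The function name `maturity_level` is hypothetical.

```python
from typing import Optional

# Thresholds as listed on the slide, highest first.
LEVELS = [
    (95, "L5 Autonomous"),
    (85, "L4 Optimized"),
    (70, "L3 Structured"),
    (55, "L2 Guided"),
    (40, "L1 Foundational"),
]


def maturity_level(overall_pct: float) -> Optional[str]:
    """Return the highest level whose threshold is met, or None below L1.

    Simplification: the per-tier criteria gate is not modelled here.
    """
    for threshold, name in LEVELS:
        if overall_pct >= threshold:
            return name
    return None


print(maturity_level(52))  # L1 Foundational
print(maturity_level(96))  # L5 Autonomous
```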

How Scoring Works

$ /agent-ready scan .
╭─────────────────────────────────────╮
│ 📝 INSTRUCT  ████████░░░░  58%      │
│ 🧭 NAVIGATE  ██████████░░  71%      │
│ ✅ VALIDATE  █████░░░░░░░  39%      │
│ 🛡️ SECURE    ███░░░░░░░░░  25%      │
├─────────────────────────────────────┤
│ Layers (v2):                        │
│   Portable         54 / 88          │
│   Target-specific  n/a (no --agents)│
╰─────────────────────────────────────╯
Overall: 52% │ 🟡 Partially Ready

Step 1: Agent scans 7 dimensions

Evidence-based: file presence + content quality, not folklore

Step 2: Every sub-criterion tagged

Portable (any agent) vs Target (only with --agents)

Step 3: 7 dims → 4 axes → maturity level

Weighted average, gated at 40/55/70/85/95

Agent-Powered

The agent itself assesses your code — not a static script
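The roll-up from 7 dimensions to 4 axes to one overall number can be sketched as plain arithmetic. This assumes the per-dimension points on the axis slides (D1 = 18, D7 = 10, and so on) act as the axis weights; the function names are hypothetical, and the real scoring is done by the agent, not by a script like this.

```python
# Maximum points per dimension, as listed on the axis slides.
DIMENSION_POINTS = {
    "D1": 18, "D2": 18, "D3": 16, "D4": 14, "D5": 12, "D6": 12, "D7": 10,
}

# Which dimensions feed which axis.
AXIS_DIMENSIONS = {
    "INSTRUCT": ["D1", "D7"],
    "NAVIGATE": ["D2", "D5"],
    "VALIDATE": ["D3", "D4"],
    "SECURE": ["D6"],
}


def axis_weights() -> dict[str, int]:
    """Sum dimension points per axis (28 / 30 / 30 / 12)."""
    return {
        axis: sum(DIMENSION_POINTS[d] for d in dims)
        for axis, dims in AXIS_DIMENSIONS.items()
    }


def overall_score(axis_pct: dict[str, float]) -> float:
    """Weighted average of per-axis percentages."""
    weights = axis_weights()
    total = sum(weights.values())
    return sum(axis_pct[axis] * w for axis, w in weights.items()) / total


# Axis percentages from the sample scan output above.
scan = {"INSTRUCT": 58, "NAVIGATE": 71, "VALIDATE": 39, "SECURE": 25}
print(axis_weights())       # {'INSTRUCT': 28, 'NAVIGATE': 30, 'VALIDATE': 30, 'SECURE': 12}
print(overall_score(scan))  # 52.24
```

With the sample scan's axis values this gives 52.24, which matches the 52% overall shown in the panel.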

Meet the Test Subjects 🐁

🔴

Repo A: Chaos Monkey

❌ No AGENTS.md, no README

❌ No linter, no type checker

❌ No tests, no CI

❌ No sandbox policy, secrets leak

❌ No .env.example, no lockfile

Expected: ~20%
🟢

Repo B: Agent Heaven

✅ Tight AGENTS.md (+ MCP wired)

✅ Ruff + mypy strict, typed code

✅ Tests + coverage + CI

✅ devcontainer + exec policy

✅ .env.example + lockfile + Dependabot

Expected: ~85%

Same language (Python). Same complexity.
Different readiness.

✨ MAGIC MOMENT #1

Same Tool. Different Scores.

Chaos Monkey

23%

Level 1: Foundational

Agent Heaven

81%

Level 4: Optimized

The difference isn't the agent.
It's the codebase.

Phase 2 — Hands-On

📝 INSTRUCT

"Does the agent understand WHAT we want?"

~25 minutes

Your Turn 🔍

Scan YOUR repository — Axis 1 only

$ /agent-ready scan .
↑ Scanning your repo... (results appear here)

No repo?

Use repos/demo-bad/ or demo-good/

Stuck?

Ask a neighbor or raise hand 🙋

QUICKEST WIN

One File = biggest INSTRUCT jump

Ask Claude to write your AGENTS.md → bridge CLAUDE.md with a symlink

> Read this project and write a concise AGENTS.md
> (< 200 lines): overview, build/test/lint, structure,
> conventions, a short safe-to-run note.

# AGENTS.md — quicknote
## What this project is
…one paragraph
## Build · test · lint
pip install -e ".[dev]" · pytest · ruff
## Conventions
double quotes, type hints, snake_case
## Pitfalls
all persistence via storage.py
## Safe to run
pytest, ruff check, git status

$ ln -s AGENTS.md CLAUDE.md   # bridge, no drift

Let the agent draft it → review → save. Concise beats bloated (v2 penalizes walls of text).
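For anyone who prefers to script the bridge (for example on a machine without `ln`), here is a minimal Python sketch of the same two steps: check that AGENTS.md stays under the 200-line budget, then point CLAUDE.md at it. The helper name `bridge_agents_md` is hypothetical, and symlink creation may need extra privileges on Windows.

```python
from pathlib import Path

MAX_LINES = 200  # the "< 200 lines" budget from the prompt above


def bridge_agents_md(repo: Path) -> int:
    """Check AGENTS.md against the line budget, then symlink CLAUDE.md to it.

    Returns the AGENTS.md line count. Raises FileNotFoundError if AGENTS.md
    is missing and ValueError if it is over budget.
    """
    agents = repo / "AGENTS.md"
    line_count = len(agents.read_text(encoding="utf-8").splitlines())
    if line_count >= MAX_LINES:
        raise ValueError(
            f"AGENTS.md has {line_count} lines; budget is < {MAX_LINES}"
        )

    claude = repo / "CLAUDE.md"
    # Leave any existing CLAUDE.md (file or symlink) untouched.
    if not claude.exists() and not claude.is_symlink():
        # Relative target, same effect as `ln -s AGENTS.md CLAUDE.md`.
        claude.symlink_to("AGENTS.md")
    return line_count
```

Calling `bridge_agents_md(Path("."))` from the repository root would perform the check and create the link if none exists yet.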

✨ MAGIC MOMENT #2

Before → After

BEFORE

35%

AFTER

63%
+28 points!

"That's it? One file?"
Yes. Agent readiness isn't about rewriting your codebase.
It's about giving agents the right information.

Phase 3 — Hands-On

🧭 NAVIGATE

"Can the agent find its way around?"

~20 minutes

Scan Your Repo — Axis 2 🔍

Same tool. New flag. Different lens.

$ /agent-ready scan .
╭──────────────────────────────────────╮
│ 🧭 AXIS 2: NAVIGATE                  │
│ ════════════════════════════════     │
│                                      │
│ Score: 52/100 ██████████░░░░ 52%     │
│                                      │
│ ✅ .editorconfig        found    +2  │
│ ✅ pyproject.toml       found    +2  │
│ ✅ ruff configured      [tool]   +4  │
│ ❌ pyrightconfig.json   missing  -4  │
│ ❌ .env.example         missing  -2  │
│ ❌ Dockerfile           missing  -2  │
│ ⚠️ Large files (>300ln) 3 found  -2  │
╰──────────────────────────────────────╯

Why each check matters

No .editorconfig → agent mixes tabs and spaces → CI fails

No type checker → agent can't verify that its changes type-check

No .env.example → agent guesses env vars → broken config

Large files → context window saturates → hallucinations
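To see which files would trip the large-file check, a rough sketch like the one below can help. It assumes the 300-line threshold shown in the scan panel and only looks at `*.py` files by default; `find_large_files` is a hypothetical helper, not the scanner's own logic.

```python
from pathlib import Path


def find_large_files(
    root: Path, max_lines: int = 300, pattern: str = "*.py"
) -> list[tuple[str, int]]:
    """Return (relative path, line count) for files longer than max_lines."""
    results = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the count going on oddly encoded files.
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            results.append((path.relative_to(root).as_posix(), line_count))
    return results
```

Running `find_large_files(Path("."))` from the repository root lists candidates for splitting. It also walks virtualenvs and vendored code, so a real check would likely exclude those directories.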

Pick ONE Fix

Based on your scan results — choose what helps most

🅰️

.env.example

Document your env vars

30 sec +4 pts
🅱️

pyrightconfig.json

Enable type checking

2 min +6 pts
🅲

.editorconfig

Consistent editor settings

1 min +2 pts

Or 🅳: Configure Ruff properly (~3 min, +6 pts)
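For option 🅰️, one common way to get a first draft of `.env.example` is to copy the variable names from an existing `.env` and blank out the values. The sketch below does only that; `make_env_example` is a hypothetical helper, and the output should still be reviewed by hand, since comments in a `.env` file can contain secrets too.

```python
def make_env_example(env_text: str) -> str:
    """Keep variable names and comments from .env content, blank out values."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments as documentation
            continue
        key, sep, _value = stripped.partition("=")
        if sep:  # skip lines that are not KEY=VALUE assignments
            out.append(f"{key.strip()}=")
    return "\n".join(out) + "\n"


sample = "# database\nDATABASE_URL=postgres://user:secret@host/db\n\nDEBUG=1\n"
print(make_env_example(sample), end="")
```

For the sample input, the result keeps `# database`, `DATABASE_URL=`, the blank line, and `DEBUG=`, with no values.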

Phase 3 (continued) — Hands-On

✅ VALIDATE

"Can the agent tell if it did it RIGHT?"

~30 minutes

Scan Your Repo — Axis 3 🔍

Where most repos bleed points

$ /agent-ready scan .
╭──────────────────────────────────────╮
│ ✅ AXIS 3: VALIDATE                  │
│ ════════════════════════════════     │
│                                      │
│ Score: 28/100 ████████░░░░░░░ 28%    │
│                                      │
│ ✅ pytest configured  [tool.pytest] +6│
│ ✅ test files found   12 files      +5│
│ ✅ poetry.lock        present       +2│
│ ❌ CI workflow        missing       -4│
│ ❌ coverage configured no           -4│
│ ❌ SECURITY.md        missing       -2│
│ ❌ pre-commit         missing       -2│
╰──────────────────────────────────────╯

The pattern is everywhere: repos test the code, but the codebase doesn't test itself. No CI, no coverage, no pre-commit. The agent has nothing to lean on.

CONTROVERSIAL TAKE

The Testing Paradox

Tests are NOT for finding bugs.

Tests are for giving AGENTS a safety net.

Without Tests
  1. Agent writes code
  2. You manually verify everything
  3. Bottleneck. Fatigue. Mistakes.
With Tests
  1. Agent writes code
  2. Agent runs tests → Pass ✓
  3. You review PR confidently

Tests multiply your effectiveness with agents.
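The "agent runs tests → pass" step comes down to running a command and reading its exit code and output. Here is a minimal sketch using the standard library; `run_check` is a hypothetical helper, and which command to pass (pytest, ruff, a type checker) depends on the project.

```python
import subprocess
import sys


def run_check(cmd: list[str], timeout: float = 300) -> tuple[bool, str]:
    """Run a check command; return (passed, combined stdout and stderr).

    A zero exit code counts as a pass, which is the convention pytest
    and most linters follow.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr


# Example usage (assumes pytest is installed in the current environment):
#   passed, output = run_check([sys.executable, "-m", "pytest", "-q"])
#   print("PASS" if passed else "FAIL")
```

The captured output matters as much as the boolean: it is what an agent reads to decide what to fix next.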

The CI File 🔄

Copy. Adapt. Commit. Done.

.github/workflows/ci.yml
# Copy this template to your repo
name: CI
on: [push, pull_request]
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install ruff pytest
      - run: ruff check .
      - run: ruff format --check
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v --tb=short

+6 pts if present · Adapt to your stack

✨✨✨ MAGIC MOMENT #3 — THE PEAK ✨✨✨

Full Scan — All Four Axes

7 dimensions · 4 axes · Portable + Target layers · explained findings

$ /agent-ready scan .
╭─────────────────────────────────────────────╮
│ AGENT READINESS ASSESSMENT                  │
│ Repo: quicknote · Agents: claude            │
│ ═══════════════════════════════════════     │
│                                             │
│ Overall: 84% │ 🏆 Optimized                 │
│                                             │
│ 📝 INSTRUCT ██████████████░░ 88%            │
│ 🧭 NAVIGATE █████████████░░░ 82%            │
│ ✅ VALIDATE ██████████████░░ 86%            │
│ 🛡️ SECURE   ████████████░░░░ 79%            │
│                                             │
│ Layers: Portable 76/88                      │
│         Target    8/12 (claude)             │
│                                             │
│ Top gap: nav MCP server not wired (manual)  │
╰─────────────────────────────────────────────╯

All four axes, two layers
  • Portable counts for any agent · Target only with --agents
  • SECURE now scored alongside INSTRUCT / NAVIGATE / VALIDATE
  • Every gap below 100 ships a why / consequence / how-to-fix / effort
VISUALIZE YOUR PROGRESS

Your Agent Readiness Radar

📝 Instruct 88%
🧭 Navigate 82%
✅ Validate 86%
🛡️ Secure 79%

Put this in your engineering blog 📝
Share with your team 👥
Track progress over time 📈

The Agent Fixes What The Agent Needs 🤖

The same agent that scans can also generate fixes — contextualized to YOUR project

$ /agent-ready fix
🔍 Loading previous scores...
📊 Gaps identified (sorted by impact):
  1. AGENTS.md missing           [instruct]  +18pt
  2. No CI (tests+lint)          [validate]  +14pt
  3. No exec/sandbox policy      [secure]    +12pt
  4. No .env.example / lockfile  [secure]    +5pt

## Files to Generate
✨ AGENTS.md (+ CLAUDE.md symlink bridge)
✨ docs/agent-execution.md — sandbox policy
✨ .github/workflows/ci.yml — test + lint
✨ .env.example + .gitignore secret patterns

Proceed? (y/n) > y
✅ Created: AGENTS.md (tailored to your project)
🔗 Created: CLAUDE.md → AGENTS.md (bridge, no drift)
✅ Created: docs/agent-execution.md, ci.yml, .env.example

Not boilerplate — the agent reads YOUR code and generates contextualized files

TAKEAWAY TIME

Generate Your Report Package 📦

$ /agent-ready report
Generating comprehensive report...
$ /agent-ready diff
Comparing current state vs initial scan...
✅ Wrote .agent-ready/ artifacts
$ ls .agent-ready/
agent-ready-report.md
agent-ready-scores.json
agent-ready-scores.prev.json
badge.svg

📄 agent-ready-report.md

Layered report + explained findings & roadmap

🔢 agent-ready-scores.json

schema v2 — the contract for fix/diff

🏷️ badge.svg

Put in your README!

📊 --format html

Self-contained HTML report, works offline
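Conceptually, comparing the current scores file with the previous one is a per-axis subtraction. The sketch below illustrates only that idea. The flat `{"axis": percent}` layout is a hypothetical stand-in, not the real schema v2 of agent-ready-scores.json, and `score_deltas` is not part of the toolkit.

```python
import json


def score_deltas(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Per-axis change between two score snapshots (axes present in both)."""
    return {axis: curr[axis] - prev[axis] for axis in curr if axis in prev}


# Hypothetical flat layout; the real files may be structured differently.
prev = json.loads('{"instruct": 35, "navigate": 52, "validate": 28}')
curr = json.loads('{"instruct": 63, "navigate": 52, "validate": 28}')
print(score_deltas(prev, curr))  # {'instruct': 28, 'navigate': 0, 'validate': 0}
```

The sample numbers are illustrative, borrowed from earlier slides: INSTRUCT moving from 35 to 63 is the +28 from the AGENTS.md quick win.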

What You Take Home 🎁

🧠

Mental Model

4 axes — INSTRUCT · NAVIGATE · VALIDATE · SECURE — to size up ANY codebase

🔢

A Number

Your codebase's score — and how much you improved it today

🛠️

A Toolkit

6 Agent Skills — scan, fix, report, diff, init — AGENTS.md-first, any agent

Not theory. Not hype.
Your repo. Your score. Your improvements.

NEW PROJECT?

Start Agent-Ready on Day One 🌱

Everything today was brownfield — fixing an existing repo. For a brand-new project, don't accumulate the debt in the first place.

$ /agent-ready init . --agents claude
🌱 Scaffolding a portable-first baseline…
✨ AGENTS.md (< 200 lines)
🔗 CLAUDE.md → AGENTS.md
✨ .env.example + .gitignore secret coverage
✨ docs/agent-execution.md (sandbox policy)
✨ .github/workflows/ci.yml + .pre-commit-config.yaml
✨ specs/TEMPLATE.md
Baseline score: 41/100 🟡 Partially Ready

init (greenfield) sets an opinionated baseline · fix (brownfield) remediates by impact. Same rubric, two entry points.

🙏

Thank You!

Questions? Let's discuss.

⭐ Star the Repo 🐛 Found a Bug?

Repo: RisorseArtificiali/agent-ready-skill
License: MIT
Dependencies: rich, pyyaml, jinja2 (that's it!)

Built with ❤️ by Stefano Maestri • PyCon Italia 2026