🤖 PyCon Italia 2026 • Workshop

Measuring
AI-Readiness

A Three-Axis Maturity Model for Agent-Optimized Codebases

📅 May 29, 2026 ⏰ 11:00–13:00 🔧 Hands-on Workshop

Hi, I'm Stefano Maestri

🏗️ Software architect & builder of things that should work

🤖 Working with AI agents since before it was cool (it was still questionable)

📦 Creator of agent-ready-skill — the toolkit we'll use today

🎯 Belief: AI agents are only as good as your codebase's readiness for them

🧠

The Goal Today

Leave with a number, a mental model, and a toolkit you can use on Monday.

What We'll Do Together

🔴

Phase 1

Mental Model
The Axis Framework + Live Demo

15 min

🔵

Phase 2

INSTRUCT
Scan your repo + Quickest win

25 min

🟢

Phase 3

NAVIGATE + VALIDATE
Tooling, tests, CI

50 min

3

Magic Moments

1

Toolkit to reuse

∞

Aha moments

THE PROMISE

In 120 Minutes You'll Have:

📊

A Number

Numerical assessment of YOUR codebase across 3 axes (0–100%)

🧠

A Mental Model

A framework to think about agent-readiness that you can explain to your team

🛠️

A Toolkit

Working scanner, fixer, and report generator you can run on any repo

Not theory. Not hype.
Your repository. Your score. Your improvements.

Raise Your Hand If… 🙋

1️⃣

You've written Python in production

2️⃣

You've used an AI coding agent (Cursor, Claude Code, Copilot, etc.)

3️⃣

An agent has messed up your codebase at least once

🎯

You want to fix that last one.

THE PROBLEM

"It Worked on My Machine"
→
"It Worked in My Prompt"

We went from blaming the environment
to blaming the model temperature.

"Just ask it again with a better prompt" is the new
"Works on my machine — just reinstall Windows."

A True Story 🫠

The Ask

"Add validation to our API endpoint"

What Happened

Added tests… in the wrong directory
Followed linting rules… from 2 years ago
Couldn't find where request schemas live
Used Flask patterns… we're on FastAPI

Root Cause

The agent wasn't stupid.
It was uninformed.

💥

Why Agents Fail

❓

Doesn't Understand

"What does this project even do? What conventions should I follow?"

→ INSTRUCT

🔄

Can't Navigate

"Where do tests go? What's the entry point? Why is this 2000 lines?"

→ NAVIGATE

❌

Can't Validate

"Did my change break anything? Is there CI? Do tests pass?"

→ VALIDATE

Three failure modes → Three axes to measure

KEY INSIGHT

It's Not the Agent's Fault

Agents aren't stupid.
They're uninformed.

An agent landing in your codebase is like a senior developer on day one.

Would you expect them to be productive without:
✅ A README explaining what the project does
✅ Clear coding conventions
✅ Knowledge of where things live
✅ Tests to verify their changes

No? Then why expect it from an agent?

The Three Questions

Q1: Does the agent understand WHAT we want?

Q2: Can the agent find WHERE things are?

Q3: Can the agent tell if it did it RIGHT?

…and a 4th question we'll meet shortly: can the agent run SAFELY? 🛡️

📝

Axis 1: INSTRUCT

"Does the agent understand WHAT we want?"

Weight: 28%

📋

D1 · Agent Instructions & Context

AGENTS.md first (cross-vendor), quality over bloat, scoped files. Scores conciseness — bloated instructions hurt.

+18 pts

🏗️

D7 · Spec-Driven Workflow & Docs

specs/ with acceptance criteria, ADRs, issue/PR templates, ARCHITECTURE + comprehension signals.

+10 pts

🧭

Axis 2: NAVIGATE

"Can the agent find its way around?"

Weight: 30%

🗺️

D2 · Navigability & Code Intelligence

Repo map, semantic-nav amenability (typed code), dependency clarity, README, file-size sanity. Depth/naming heuristics retired.

+18 pts

🔧

D5 · Agent Tooling & Capabilities

Standard Skills, bundled scripts, MCP declaration + nav servers (Serena/Sourcegraph) actually wired up.

+12 pts

✅

Axis 3: VALIDATE

"Can the agent tell if it did it RIGHT?"

Weight: 30%

🧪

D3 · Testing & Feedback

Test suite, documented + fast commands, coverage — and feedback quality: descriptive assertions + a type checker the agent can read.

+16 pts

🔄

D4 · CI/CD, Automation & Governance

CI runs tests + lint, automated formatting, pre-commit, governance (CODEOWNERS + Dependabot/Renovate).

+14 pts

PLOT TWIST

🛡️

Axis 4: SECURE

I submitted this talk with three axes. Building v2, one question kept surfacing that none of them answered:
"Can the agent run safely?"

Weight: 12%

📦

Sandbox & Isolation

Committed devcontainer, documented execution policy (LINCE / OS-sandbox / hosted)

🔑

Secret & Supply-Chain Hygiene

.gitignore secrets, .env.example, committed lockfiles, Dependabot

💉

Injection & Permissions

Instructions only in trusted files; restrictive agent deny rules (CVE-2025-59536)

Security & Sandbox = 12% · D6 · the only axis with a single dimension — for now.
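The point values on the dimension slides add up to the four axis weights. Here is a minimal Python sketch of that arithmetic, using only the numbers shown above. The `DIMENSIONS` table and the `axis_weights` helper are illustrative names, not part of agent-ready-skill.

```python
from collections import defaultdict

# Dimension -> (axis, maximum points), copied from the slides above.
DIMENSIONS = {
    "D1 Agent Instructions & Context": ("INSTRUCT", 18),
    "D7 Spec-Driven Workflow & Docs": ("INSTRUCT", 10),
    "D2 Navigability & Code Intelligence": ("NAVIGATE", 18),
    "D5 Agent Tooling & Capabilities": ("NAVIGATE", 12),
    "D3 Testing & Feedback": ("VALIDATE", 16),
    "D4 CI/CD, Automation & Governance": ("VALIDATE", 14),
    "D6 Security & Sandbox": ("SECURE", 12),
}


def axis_weights(dimensions):
    """Sum the maximum points of each dimension into its axis."""
    totals = defaultdict(int)
    for axis, points in dimensions.values():
        totals[axis] += points
    return dict(totals)


weights = axis_weights(DIMENSIONS)
print(weights)
# {'INSTRUCT': 28, 'NAVIGATE': 30, 'VALIDATE': 30, 'SECURE': 12}
print(sum(weights.values()))  # 100
```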

Maturity Levels

From chaos to autonomy — 5 stages

L5 Autonomous

≥95%

L4 Optimized

≥85%

L3 Structured

≥70%

L2 Guided

≥55%

L1 Foundational

≥40%

Each level requires 80% of criteria met at that tier.
You can't skip levels — they're sequential.
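One way to read the two rules above is sketched below in Python. It assumes a level is reached only when the overall score clears that tier's threshold and at least 80% of that tier's criteria are met, and that the climb stops at the first tier that fails. How the toolkit actually combines the two gates is not shown in these slides, so `maturity_level`, its arguments, and the "Below L1" label are hypothetical.

```python
# (level name, minimum overall score), lowest tier first, from the slide above.
LEVELS = [
    ("L1 Foundational", 40),
    ("L2 Guided", 55),
    ("L3 Structured", 70),
    ("L4 Optimized", 85),
    ("L5 Autonomous", 95),
]
CRITERIA_GATE = 0.80  # share of a tier's criteria that must be met


def maturity_level(overall, criteria_met):
    """Return the highest level reached, climbing one tier at a time.

    overall: overall score in percent (0-100).
    criteria_met: fraction of criteria met per tier, lowest tier first.
    """
    reached = "Below L1"  # hypothetical label for scores under the first gate
    for (name, threshold), met in zip(LEVELS, criteria_met):
        if overall < threshold or met < CRITERIA_GATE:
            break  # sequential levels: a failed tier blocks every higher one
        reached = name
    return reached


print(maturity_level(72, [1.0, 0.9, 0.85, 0.5, 0.2]))  # L3 Structured
print(maturity_level(72, [1.0, 0.6, 0.85, 0.5, 0.2]))  # L1 Foundational
```

In the second call the score is unchanged, but the L2 tier misses the 80% gate, so the result stays at L1 even though the L3 criteria are met.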

How Scoring Works

$ /agent-ready scan .
╭─────────────────────────────────────╮
│ 📝 INSTRUCT  ████████░░░░  58%      │
│ 🧭 NAVIGATE  ██████████░░  71%      │
│ ✅ VALIDATE  █████░░░░░░░  39%      │
│ 🛡️ SECURE    ███░░░░░░░░░  25%      │
├─────────────────────────────────────┤
│ Layers (v2):                        │
│   Portable         54 / 88          │
│   Target-specific  n/a (no --agents)│
╰─────────────────────────────────────╯
Overall: 52% │ 🟡 Partially Ready

Step 1: Agent scans 7 dimensions

Evidence-based: file presence + content quality, not folklore

Step 2: Every sub-criterion tagged

Portable (any agent) vs Target (only with --agents)

Step 3: 7 dims → 4 axes → maturity level

Weighted average, gated at 40/55/70/85/95

Agent-Powered

The agent itself assesses your code — not a static script
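The weighted average in Step 3 can be checked by hand. The sketch below applies the axis weights (28 / 30 / 30 / 12) to the four axis percentages from the sample scan and lands on the same 52% overall. The function name `overall_score` is illustrative. The real tool may round or aggregate differently.

```python
# Axis weights in percent, from the axis slides.
WEIGHTS = {"INSTRUCT": 28, "NAVIGATE": 30, "VALIDATE": 30, "SECURE": 12}


def overall_score(axis_scores):
    """Weighted average of per-axis percentages (0-100)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS)
    return weighted / total_weight


# Axis percentages from the sample scan output.
sample = {"INSTRUCT": 58, "NAVIGATE": 71, "VALIDATE": 39, "SECURE": 25}
print(round(overall_score(sample)))  # 52
```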

Meet the Test Subjects 🐁

🔴

Repo A: Chaos Monkey

❌ No AGENTS.md, no README

❌ No linter, no type checker

❌ No tests, no CI

❌ No sandbox policy, secrets leak

❌ No .env.example, no lockfile

Expected: ~20%

🟢

Repo B: Agent Heaven

✅ Tight AGENTS.md (+ MCP wired)

✅ Ruff + mypy strict, typed code

✅ Tests + coverage + CI

✅ devcontainer + exec policy

✅ .env.example + lockfile + Dependabot

Expected: ~85%

Same language (Python). Same complexity.
Different readiness.

✨ MAGIC MOMENT #1

Same Tool. Different Scores.

Chaos Monkey

23%

Level 1: Foundational

Agent Heaven

81%

Level 4: Optimized

The difference isn't the agent.
It's the codebase.

Phase 2 — Hands-On

📝 INSTRUCT

"Does the agent understand WHAT we want?"

~25 minutes

Your Turn 🔍

Scan YOUR repository — Axis 1 only

$ /agent-ready scan .
↑ Scanning your repo... (results appear here)

No repo?

Use repos/demo-bad/ or demo-good/

Stuck?

Ask a neighbor or raise hand 🙋

QUICKEST WIN

One File = biggest INSTRUCT jump

Ask Claude to write your AGENTS.md → bridge CLAUDE.md with a symlink

> Read this project and write a concise AGENTS.md
> (< 200 lines): overview, build/test/lint, structure,
> conventions, a short safe-to-run note.

# AGENTS.md - quicknote
## What this project is
…one paragraph
## Build · test · lint
pip install -e ".[dev]" · pytest · ruff
## Conventions
double quotes, type hints, snake_case
## Pitfalls
all persistence via storage.py
## Safe to run
pytest, ruff check, git status

$ ln -s AGENTS.md CLAUDE.md   # bridge, no drift

Let the agent draft it → review → save. Concise beats bloated (v2 penalizes walls of text).
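After the agent drafts the file, a quick local check can confirm the two properties this slide cares about: AGENTS.md stays under 200 lines, and CLAUDE.md is a symlink to it. The sketch below uses only the standard library. `check_agents_md` is a hypothetical helper, not the scanner's own logic.

```python
from pathlib import Path

MAX_LINES = 200  # the prompt above asks for fewer than 200 lines


def check_agents_md(repo):
    """Report whether AGENTS.md is concise and CLAUDE.md bridges to it."""
    repo = Path(repo)
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.is_file():
        return {"exists": False, "concise": False, "bridged": False}
    line_count = len(agents.read_text(encoding="utf-8").splitlines())
    # A bridge is a symlink whose target resolves to the same file.
    bridged = claude.is_symlink() and claude.resolve() == agents.resolve()
    return {
        "exists": True,
        "concise": line_count < MAX_LINES,
        "bridged": bridged,
    }


if __name__ == "__main__":
    print(check_agents_md("."))
```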

✨ MAGIC MOMENT #2

Before → After

BEFORE

35%

→

AFTER

63%

+28 points!

"That's it? One file?"
Yes. Agent readiness isn't about rewriting your codebase.
It's about giving agents the right information.

Phase 3 — Hands-On

🧭 NAVIGATE

"Can the agent find its way around?"

~20 minutes

Scan Your Repo — Axis 2 🔍

Same tool. New flag. Different lens.

$ /agent-ready scan .
╭──────────────────────────────────────╮
│ 🧭 AXIS 2: NAVIGATE                  │
│ ════════════════════════════════     │
│                                      │
│ Score: 52/100  ██████████░░░░  52%   │
│                                      │
│ ✅ .editorconfig       found     +2  │
│ ✅ pyproject.toml      found     +2  │
│ ✅ ruff configured     [tool]    +4  │
│ ❌ pyrightconfig.json  missing   -4  │
│ ❌ .env.example        missing   -2  │
│ ❌ Dockerfile          missing   -2  │
│ ⚠️ Large files (>300ln) 3 found  -2  │
╰──────────────────────────────────────╯

Why each check matters

No .editorconfig → agent mixes tabs and spaces → CI fails

No type checker → agent can't verify that its changes type-check

No .env.example → agent guesses env vars → broken config

Large files → context window saturates → hallucinations
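The large-file check is the easiest one to reproduce yourself. The sketch below lists Python files over the 300-line threshold from the scan output. It is an approximation for illustration only: `large_python_files` is a hypothetical name, and a real scan would probably skip virtualenvs and vendored code.

```python
from pathlib import Path

LINE_LIMIT = 300  # threshold shown in the scan output above


def large_python_files(repo, limit=LINE_LIMIT):
    """Return (relative path, line count) for .py files over the limit."""
    repo = Path(repo)
    oversized = []
    for path in sorted(repo.rglob("*.py")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count > limit:
            oversized.append((path.relative_to(repo).as_posix(), count))
    return oversized


if __name__ == "__main__":
    for name, count in large_python_files("."):
        print(f"{count:>6}  {name}")
```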

Pick ONE Fix

Based on your scan results — choose what helps most

🅰️

.env.example

Document your env vars

30 sec +4 pts

🅱️

pyrightconfig.json

Enable type checking

2 min +6 pts

🅲

.editorconfig

Consistent editor settings

1 min +2 pts

Or 🅳: Configure Ruff properly (~3 min, +6 pts)
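For option 🅰️, a first draft of .env.example can come from the variable names the code already reads. The sketch below is a rough regex pass over source text, not what the toolkit does. It only catches literal names passed to `os.getenv`, `os.environ[...]`, and `os.environ.get`, and it leaves every value empty so that no secret ends up in the file.

```python
import re

# Matches os.getenv("NAME"), os.environ["NAME"], and os.environ.get("NAME").
ENV_PATTERN = re.compile(
    r"""os\.(?:getenv\(|environ\[|environ\.get\()\s*["']([A-Za-z_][A-Za-z0-9_]*)["']"""
)


def env_example(source_code):
    """Build .env.example text: one NAME= line per variable found, sorted."""
    names = sorted(set(ENV_PATTERN.findall(source_code)))
    return "".join(f"{name}=\n" for name in names)


# Illustrative input; the variable names are made up for this example.
code = '''
import os
DB_URL = os.environ["DATABASE_URL"]
DEBUG = os.getenv("DEBUG", "0")
TOKEN = os.environ.get('API_TOKEN')
'''
print(env_example(code), end="")
# API_TOKEN=
# DATABASE_URL=
# DEBUG=
```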

Phase 4 — Hands-On

✅ VALIDATE

"Can the agent tell if it did it RIGHT?"

~30 minutes

Scan Your Repo — Axis 3 🔍

Where most repos bleed points

$ /agent-ready scan .
╭────────────────────────────────────────╮
│ ✅ AXIS 3: VALIDATE                    │
│ ════════════════════════════════       │
│                                        │
│ Score: 28/100  ████████░░░░░░░  28%    │
│                                        │
│ ✅ pytest configured  [tool.pytest] +6 │
│ ✅ test files found   12 files      +5 │
│ ✅ poetry.lock        present       +2 │
│ ❌ CI workflow        missing       -4 │
│ ❌ coverage configured no           -4 │
│ ❌ SECURITY.md        missing       -2 │
│ ❌ pre-commit         missing       -2 │
╰────────────────────────────────────────╯

The pattern is everywhere: repos test the code, but the codebase doesn't test itself. No CI, no coverage, no pre-commit. The agent has nothing to lean on.
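Several of the VALIDATE findings above are plain file-presence checks. The sketch below shows that idea in a few lines of Python. The `CHECKS` table is a hypothetical subset chosen to mirror the sample output. The real scanner also weighs content quality, which this does not attempt.

```python
from pathlib import Path

# Hypothetical check list, loosely mirroring the sample output above.
CHECKS = {
    "CI workflow": [".github/workflows/*.yml", ".github/workflows/*.yaml"],
    "pre-commit": [".pre-commit-config.yaml"],
    "SECURITY.md": ["SECURITY.md"],
    "lockfile": ["poetry.lock", "uv.lock", "Pipfile.lock"],
}


def presence_report(repo):
    """Map each check name to True when any of its glob patterns matches."""
    repo = Path(repo)
    return {
        name: any(any(repo.glob(pattern)) for pattern in patterns)
        for name, patterns in CHECKS.items()
    }


if __name__ == "__main__":
    for name, found in presence_report(".").items():
        print(f"{'✅' if found else '❌'} {name}")
```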

CONTROVERSIAL TAKE

The Testing Paradox

Tests are NOT for finding bugs.

Tests are for giving AGENTS a safety net.

Without Tests

Agent writes code
You manually verify everything
Bottleneck. Fatigue. Mistakes.

With Tests

Agent writes code
Agent runs tests → Pass ✓
You review PR confidently

Tests multiply your effectiveness with agents.
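The safety net works best when a failing test tells the agent what went wrong. That is the "descriptive assertions" signal from D3. The sketch below shows the style with a made-up `validate_title` function. Everything in it is illustrative and unrelated to any real project code.

```python
def validate_title(title):
    """Hypothetical validator: a title must be 1-80 non-blank characters."""
    cleaned = title.strip()
    if not cleaned:
        raise ValueError("title must not be blank")
    if len(cleaned) > 80:
        raise ValueError(f"title is {len(cleaned)} characters; the limit is 80")
    return cleaned


def test_title_is_trimmed():
    result = validate_title("  groceries  ")
    # The message says what was expected, so a failure is readable on its own.
    assert result == "groceries", (
        f"expected surrounding whitespace to be stripped, got {result!r}"
    )


def test_long_title_is_rejected():
    try:
        validate_title("x" * 81)
    except ValueError as error:
        assert "limit is 80" in str(error), f"unhelpful error message: {error}"
    else:
        raise AssertionError("an 81-character title should raise ValueError")
```

Saved in a test file, both functions are collected by pytest as written. A bare `assert result == "groceries"` would also fail correctly, but the explicit message gives the agent the intent as well as the diff.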

The CI File 🔄

Copy. Adapt. Commit. Done.

.github/workflows/ci.yml

# Copy this template to your repo
name: CI
on: [push, pull_request]
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install ruff pytest
      - run: ruff check .
      - run: ruff format --check
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -v --tb=short

+6 pts if present Adapt to your stack

✨✨✨ MAGIC MOMENT #3 — THE PEAK ✨✨✨

Full Scan — All Four Axes

7 dimensions · 4 axes · Portable + Target layers · explained findings

$ /agent-ready scan .
╭─────────────────────────────────────────────╮
│ AGENT READINESS ASSESSMENT                  │
│ Repo: quicknote · Agents: claude            │
│ ═══════════════════════════════════════     │
│                                             │
│ Overall: 84% │ 🏆 Optimized                 │
│                                             │
│ 📝 INSTRUCT  ██████████████░░  88%          │
│ 🧭 NAVIGATE  █████████████░░░  82%          │
│ ✅ VALIDATE  ██████████████░░  86%          │
│ 🛡️ SECURE    ████████████░░░░  79%          │
│                                             │
│ Layers: Portable 76/88                      │
│         Target    8/12 (claude)             │
│                                             │
│ Top gap: nav MCP server not wired (manual)  │
╰─────────────────────────────────────────────╯

All four axes, two layers

Portable counts for any agent · Target only with --agents
SECURE now scored alongside INSTRUCT / NAVIGATE / VALIDATE
Every gap below 100 ships with a why / consequence / how-to-fix / effort

VISUALIZE YOUR PROGRESS

Your Agent Readiness Radar

📝 Instruct 88%

🧭 Navigate 82%

✅ Validate 86%

🛡️ Secure 79%

Put this in your engineering blog 📝
Share with your team 👥
Track progress over time 📈

The Agent Fixes What The Agent Needs 🤖

The same agent that scans can also generate fixes — contextualized to YOUR project

$ /agent-ready fix
🔍 Loading previous scores...
📊 Gaps identified (sorted by impact):
  1. AGENTS.md missing            [instruct]  +18pt
  2. No CI (tests+lint)           [validate]  +14pt
  3. No exec/sandbox policy       [secure]    +12pt
  4. No .env.example / lockfile   [secure]     +5pt

## Files to Generate
✨ AGENTS.md (+ CLAUDE.md symlink bridge)
✨ docs/agent-execution.md - sandbox policy
✨ .github/workflows/ci.yml - test + lint
✨ .env.example + .gitignore secret patterns

Proceed? (y/n) > y
✅ Created: AGENTS.md (tailored to your project)
🔗 Created: CLAUDE.md → AGENTS.md (bridge, no drift)
✅ Created: docs/agent-execution.md, ci.yml, .env.example

Not boilerplate — the agent reads YOUR code and generates contextualized files
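"Sorted by impact" in the fix output is an ordering by potential points. The sketch below reproduces it with the four gaps from the sample output. The tuple layout and the `by_impact` helper are assumptions made for illustration.

```python
# (description, axis, points) taken from the sample fix output above,
# deliberately listed out of order.
gaps = [
    ("No .env.example / lockfile", "secure", 5),
    ("AGENTS.md missing", "instruct", 18),
    ("No exec/sandbox policy", "secure", 12),
    ("No CI (tests+lint)", "validate", 14),
]


def by_impact(items):
    """Order gaps from the largest potential gain to the smallest."""
    return sorted(items, key=lambda gap: gap[2], reverse=True)


for rank, (description, axis, points) in enumerate(by_impact(gaps), start=1):
    print(f"{rank}. {description:<28} [{axis}] +{points}pt")
```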

TAKEAWAY TIME

Generate Your Report Package 📦

$ /agent-ready report
Generating comprehensive report...
$ /agent-ready diff
Comparing current state vs initial scan...
✅ Wrote .agent-ready/ artifacts
$ ls .agent-ready/
agent-ready-report.md
agent-ready-scores.json
agent-ready-scores.prev.json
badge.svg

📄 agent-ready-report.md

Layered report + explained findings & roadmap

🔢 agent-ready-scores.json

schema v2 — the contract for fix/diff

🏷️ badge.svg

Put in your README!

📊 --format html

Self-contained HTML report, works offline
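With the current and previous score files side by side, a diff is a per-axis subtraction. The sketch below shows the idea on two inline JSON payloads. The `{"axes": {...}}` layout is a made-up stand-in, since the actual schema v2 fields are not shown in these slides. The INSTRUCT numbers reuse the 35% → 63% example from Magic Moment #2.

```python
import json

# Hypothetical payloads: the real schema v2 fields are not shown in the slides.
previous = json.loads('{"axes": {"INSTRUCT": 35, "NAVIGATE": 52, "VALIDATE": 28}}')
current = json.loads('{"axes": {"INSTRUCT": 63, "NAVIGATE": 52, "VALIDATE": 28}}')


def axis_deltas(before, after):
    """Return the per-axis change in percentage points."""
    return {
        axis: after["axes"][axis] - before["axes"].get(axis, 0)
        for axis in after["axes"]
    }


print(axis_deltas(previous, current))
# {'INSTRUCT': 28, 'NAVIGATE': 0, 'VALIDATE': 0}
```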

What You Take Home 🎁

🧠

Mental Model

4 axes — INSTRUCT · NAVIGATE · VALIDATE · SECURE — to size up ANY codebase

🔢

A Number

Your codebase's score — and how much you improved it today

🛠️

A Toolkit

6 Agent Skills — scan, fix, report, diff, init — AGENTS.md-first, any agent

Not theory. Not hype.
Your repo. Your score. Your improvements.

NEW PROJECT?

Start Agent-Ready on Day One 🌱

Everything today was brownfield — fixing an existing repo. For a brand-new project, don't accumulate the debt in the first place.

$ /agent-ready init . --agents claude
🌱 Scaffolding a portable-first baseline…
✨ AGENTS.md (< 200 lines)
🔗 CLAUDE.md → AGENTS.md
✨ .env.example + .gitignore secret coverage
✨ docs/agent-execution.md (sandbox policy)
✨ .github/workflows/ci.yml + .pre-commit-config.yaml
✨ specs/TEMPLATE.md
Baseline score: 41/100 🟡 Partially Ready

init (greenfield) sets an opinionated baseline · fix (brownfield) remediates by impact. Same rubric, two entry points.

🙏

Thank You!

Questions? Let's discuss.

⭐ Star the Repo 🐛 Found a Bug?

Repo: RisorseArtificiali/agent-ready-skill
License: MIT
Dependencies: rich, pyyaml, jinja2 (that's it!)

Built with ❤️ by Stefano Maestri • PyCon Italia 2026
