07 Jun 2026

Agent engineering: Pydantic AI

This is the fifth take on the same news reader. The Claude Agent SDK version was a single Python file with tools and sub-agents. The Pi SDK version was a single TypeScript file with an event-based session. This time I used Pydantic AI, the agent framework from the Pydantic team.

Pydantic AI’s pitch is that since every AI library already depends on Pydantic for validation, you might as well build agents directly on it. The result feels like FastAPI for agents: decorators, type annotations, and automatic schema generation.

Structured output as the core idea

In previous versions, the agent called save_news_item as a side effect during execution. Neither Claude Code nor the Agent SDK lets you define an output schema for sub-agents, so side-effect tools were the only way to get structured data out. Here, the agent returns a validated Pydantic model, and Python handles persistence afterward:

from pydantic import BaseModel, Field
from pydantic_ai import Agent

class NewsItem(BaseModel):
    title: str = Field(description="Article headline")
    url: str = Field(description="Link to the article")
    source: str = Field(description="Site name, e.g. 'Hacker News' or 'Lobsters'")
    tags: list[str] = Field(description="Topic tags")
    summary: str = Field(description="One-sentence summary")
    discussion_url: str | None = Field(default=None, description="Discussion page URL")

class ScraperResult(BaseModel):
    items: list[NewsItem]
    report: str = Field(description="Brief bullet-list report of what was scraped")

When you set output_type=ScraperResult on an agent, Pydantic AI validates the model’s response against this schema. If validation fails, it sends the errors back to the model and asks it to try again, up to the agent’s retry limit.
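The validate-and-retry idea can be sketched without the framework. This is a conceptual illustration only, not Pydantic AI’s implementation: fake_model is a hypothetical stand-in for an LLM call, run_with_retries and max_retries are names invented for this sketch, and the models are trimmed-down versions of the ones above.

```python
import json
from collections.abc import Callable

from pydantic import BaseModel, ValidationError


class NewsItem(BaseModel):
    title: str
    url: str
    summary: str


class ScraperResult(BaseModel):
    items: list[NewsItem]
    report: str


def run_with_retries(
    model: Callable[[str], str], prompt: str, max_retries: int = 3
) -> tuple[ScraperResult, int]:
    """Call the model until its JSON output validates; return (result, attempts)."""
    for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
        raw = model(prompt)
        try:
            return ScraperResult.model_validate_json(raw), attempt
        except ValidationError as exc:
            # Feed the validation errors back so the next attempt can fix them.
            prompt = f"{prompt}\n\nYour last answer was invalid:\n{exc}"
    raise RuntimeError(f"No valid output after {max_retries} retries")


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: invalid on the first call, valid afterwards."""
    calls = 0

    def fake_model(prompt: str) -> str:
        nonlocal calls
        calls += 1
        if calls == 1:
            # Missing url, summary, and report, so validation fails.
            return json.dumps({"items": [{"title": "No URL or summary"}]})
        return json.dumps({
            "items": [{
                "title": "Example",
                "url": "https://lobste.rs/s/x",
                "summary": "An example.",
            }],
            "report": "- 1 item from Lobsters",
        })

    return fake_model


result, attempts = run_with_retries(make_fake_model(), "Scrape Lobsters")
print(attempts, result.items[0].title)  # prints: 2 Example
```

The caller only ever sees a ScraperResult or an exception. Malformed output never reaches the persistence code.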

Tools and sub-agents

The sub-agent that scrapes a single site has one tool:

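from textwrap import dedent
from urllib.parse import urlparse

import httpx
from markdownify import MarkdownConverter  # assumed source of MarkdownConverter

# Assumed definitions, not shown in the post: the two allowed hosts and a
# shared async HTTP client.
ALLOWED_DOMAINS = {"news.ycombinator.com", "lobste.rs"}
http_client = httpx.AsyncClient()

site_scraper = Agent(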
    "anthropic:claude-haiku-4-5",
    output_type=list[NewsItem],
    instructions=dedent("""\
        You scrape a single news site. Fetch the front page, identify items
        relevant to Python, developer tools, AI/ML, and software architecture.
        Return a list of NewsItem objects. Skip items about business/funding,
        social media drama, or unrelated topics.
    """),
)

@site_scraper.tool_plain
async def web_fetch(url: str) -> str:
    """Fetch a web page and return its content as markdown."""
    hostname = urlparse(url).hostname
    if hostname not in ALLOWED_DOMAINS:
        raise ValueError(f"Domain not allowed: {hostname}")
    resp = await http_client.get(url, follow_redirects=True)
    resp.raise_for_status()
    return MarkdownConverter(strip=["img", "script", "style"]).convert(resp.text)

An Agent is a stateless configuration object. Each .run() call starts a fresh conversation, so the same site_scraper handles both sites without any shared state between runs.

The coordinator agent doesn’t know which sites exist upfront. It has a scrape_site tool that delegates to the sub-agent:

from pydantic_ai import RunContext

coordinator = Agent(
    "anthropic:claude-haiku-4-5",
    output_type=ScraperResult,
    retries=3,
    instructions=dedent("""\
        You coordinate news scraping. For each site the user asks about, call
        scrape_site with the URL. Deduplicate results across sites if needed,
        then return a ScraperResult with the combined items and a brief report.

        Available sites: Hacker News (https://news.ycombinator.com),
        Lobsters (https://lobste.rs).
    """),
)

@coordinator.tool
async def scrape_site(ctx: RunContext[None], url: str) -> list[NewsItem]:
    """Scrape a news site and return relevant items."""
    result = await site_scraper.run(f"Fetch and process: {url}", usage=ctx.usage)
    return result.output

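Passing usage=ctx.usage counts the sub-agent’s tokens and requests toward the coordinator’s run, so the usage totals cover the whole delegation.

The main function just runs the coordinator: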

result = await coordinator.run(
    "Scrape Hacker News and Lobsters for relevant tech news.",
)

If you change the prompt to “Scrape only Lobsters”, the coordinator calls scrape_site once. The agent decides the delegation, not the orchestration code.
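Because the result is plain validated data, persistence after the run needs no agent involvement. A minimal sketch, assuming the items have already been converted to dicts (for example with item.model_dump()) and are stored as JSON Lines. save_items and the file layout are hypothetical, and the URL check is an extra guard on top of the coordinator’s own deduplication.

```python
import json
from pathlib import Path


def save_items(items: list[dict], path: Path) -> int:
    """Append items to a JSON Lines file, skipping URLs already stored.

    Returns the number of items written.
    """
    seen: set[str] = set()
    if path.exists():
        with path.open(encoding="utf-8") as f:
            seen = {json.loads(line)["url"] for line in f if line.strip()}

    written = 0
    with path.open("a", encoding="utf-8") as f:
        for item in items:
            if item["url"] in seen:
                continue  # already stored, or duplicated within this batch
            seen.add(item["url"])
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
            written += 1
    return written
```

In the main function this would be called with something like [item.model_dump() for item in result.output.items] once coordinator.run() has returned.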

Tool restriction

In the Claude Agent SDK, you pass an allowed_tools list with domain-scoped patterns like "WebFetch(domain:lobste.rs)". In Pi, you pass a tools allowlist. Both are subtractive: the agent starts with access to everything, and you narrow it down.

Pydantic AI works the other way. An agent only has the tools you register on it. The sub-agent has web_fetch and nothing else. There’s no built-in tool set to restrict because there’s nothing to subtract from.
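The additive model is easy to picture with a toy registry. This is a conceptual sketch, not Pydantic AI’s internals: ToyAgent and its describe and call methods are hypothetical names, and the web_fetch body is a placeholder.

```python
import inspect
from collections.abc import Callable
from typing import Any


class ToyAgent:
    """Hypothetical stand-in for an agent: it can call only what was registered."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Registration is the only way a function becomes callable by the agent.
        self.tools[func.__name__] = func
        return func

    def describe(self) -> dict[str, dict[str, str]]:
        # Derive a crude parameter listing from the type annotations.
        return {
            name: {
                p.name: getattr(p.annotation, "__name__", str(p.annotation))
                for p in inspect.signature(func).parameters.values()
            }
            for name, func in self.tools.items()
        }

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self.tools:
            raise LookupError(f"Unknown tool: {name}")
        return self.tools[name](**kwargs)


scraper = ToyAgent()


@scraper.tool
def web_fetch(url: str) -> str:
    return f"<contents of {url}>"  # placeholder body


print(scraper.describe())  # prints: {'web_fetch': {'url': 'str'}}
```

Asking this toy agent for any other tool name raises LookupError. There is no default tool set behind the registry to fall back on.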

The domain allowlist inside web_fetch is the remaining safety layer. Even if page content contains an injection attempt, the sub-agent can only fetch from two domains and must return list[NewsItem].
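The hostname check is worth testing against the usual URL tricks. A minimal sketch, assuming the allowlist holds exactly the two hosts from the post. is_allowed is a hypothetical helper, and the scheme check is an addition that the original web_fetch does not make.

```python
from urllib.parse import urlparse

# Assumed from the two sites named in the post.
ALLOWED_DOMAINS = {"news.ycombinator.com", "lobste.rs"}


def is_allowed(url: str) -> bool:
    """Return True only for http(s) URLs whose hostname exactly matches the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_DOMAINS


checks = {
    "https://lobste.rs/newest": True,
    "https://news.ycombinator.com/item?id=1": True,
    "https://lobste.rs.evil.example/": False,  # lookalike suffix
    "https://lobste.rs@evil.example/": False,  # allowed name in userinfo, not the host
    "https://evil.example/?next=https://lobste.rs": False,  # allowed URL only in the query
    "file:///etc/passwd": False,  # wrong scheme, no hostname
}
for url, expected in checks.items():
    assert is_allowed(url) is expected, url
```

Exact matching on the parsed hostname, not a substring or suffix test on the raw URL, is what makes these cases come out right. One gap it does not close: web_fetch passes follow_redirects=True, so an allowed page could redirect to another host. Re-checking the final response URL would be one way to cover that.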

Comparing the SDKs

| | Claude Agent SDK | Pi SDK | Pydantic AI |
|---|---|---|---|
| Language | Python | TypeScript | Python |
| Entry point | `query()` async generator | `createAgentSession()` | `agent.run()` |
| Tool definition | `@tool` + Pydantic schema | `defineTool()` + TypeBox | `@agent.tool` decorator |
| Structured output | Manual parsing | Not built-in | `output_type=MyModel` with auto-retry |
| Sub-agents | `AgentDefinition` objects | Agent-as-tool¹ | Agent-as-tool |
| System prompt | `prompt` parameter | `systemPromptOverride` callback | `instructions` string |
| Tool restriction | `allowed_tools` list | `tools` allowlist | Only registered tools exist |

The Claude Agent SDK has built-in sub-agent primitives. Pi and Pydantic AI both use agent-as-tool: you nest an agent call inside a tool function.

The full project is on GitHub.

¹ Pi’s SDK doesn’t have a dedicated sub-agent primitive, but a tool can create a nested session with createAgentSession(). I didn’t use it in the Pi SDK version to keep things simple.

Roman Imankulov
