Python Agent Control Plane – Clean Architecture Blueprint

Mark Kendall
Feb 8

## Purpose

Design a durable, enterprise-grade agent platform that avoids past orchestration failures (BizTalk-style opacity) while enabling safe, scalable AI-driven workflows.

---

## Core Principles (Non-Negotiable)

1. **Agents are products**

   - Versioned

   - Tested

   - Observable

   - Owned

2. **LLMs reason, code executes**

   - No direct side effects without deterministic code paths

3. **State is explicit**

   - Every run has IDs, steps, inputs, outputs

   - Replayable and inspectable

4. **Everything is diffable**

   - Git is the source of truth

   - UI is a view, not authority

5. **Failure is designed**

   - Timeouts, retries, fallbacks, escalation paths

---

## High-Level Architecture

### Entry Layer

**Agent Gateway (FastAPI)**

```python
from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Python Agent Control Plane")


# 1. DATA MODELS (Registry/Knowledge Management)
class WorkRequest(BaseModel):
    task_id: str
    intent: str
    payload: dict


# 2. POLICY ENGINE (mocking the Policy & Permissions block)
def check_permissions(request: WorkRequest) -> bool:
    # In a real app, this queries the 'Policy & Permissions' block.
    if "admin" in request.intent.lower():
        raise HTTPException(status_code=403, detail="Unauthorized Agent Action")
    return True


# 3. ORCHESTRATOR (the logic brain)
class Orchestrator:
    def route_to_agent(self, request: WorkRequest) -> dict:
        # Logic to pick an agent from the 'Registry' goes here.
        return {"status": "Executing", "agent": "Python_Agent_01", "task": request.intent}


orchestrator = Orchestrator()


# 4. AGENT GATEWAY (the entry point)
@app.post("/gateway/dispatch")
async def dispatch_work(request: WorkRequest, authorized: bool = Depends(check_permissions)):
    """
    This endpoint represents the 'Agent Gateway' in your diagram.
    It validates, checks policy, and hands off to the Orchestrator.
    """
    result = orchestrator.route_to_agent(request)
    return result
```

- Accepts work requests

- Issues `run_id`

- AuthN/AuthZ

- Rate limits and quotas
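Two of the gateway duties listed above, issuing a `run_id` and enforcing a quota, can be sketched without a web framework. This is a minimal sketch under assumptions: `issue_run_id`, `QuotaTracker`, and `accept` are hypothetical names, the `run-<hex>` ID format is invented for illustration, and the quota is a simple in-memory fixed-window counter rather than a production rate limiter. AuthN/AuthZ is left out.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


def issue_run_id() -> str:
    """Return a unique identifier for a new run (hypothetical format)."""
    return f"run-{uuid.uuid4().hex}"


@dataclass
class QuotaTracker:
    """Fixed-window request counter per caller (illustrative only)."""
    max_requests: int
    window_seconds: float
    _windows: dict = field(default_factory=dict)  # caller -> (window_start, count)

    def allow(self, caller: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._windows.get(caller, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.max_requests:
            self._windows[caller] = (start, count)
            return False
        self._windows[caller] = (start, count + 1)
        return True


def accept(caller: str, intent: str, quota: QuotaTracker,
           now: Optional[float] = None) -> dict:
    """Admit a work request and assign a run_id, or reject it on quota."""
    if not quota.allow(caller, now):
        return {"accepted": False, "reason": "quota_exceeded"}
    return {"accepted": True, "run_id": issue_run_id(), "intent": intent}
```

The `now` parameter exists only so the window logic can be exercised deterministically; a real gateway would more likely keep counters in a shared store so that every gateway instance sees the same quota.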

### Control Plane

**Run Registry**

- Stores run metadata

- Tracks lifecycle state

- Links audit and observability data
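One way to picture the Run Registry is as a record per run plus a table of legal lifecycle moves. The sketch below is an in-memory illustration only: `RunState`, `RunRecord`, `RunRegistry`, the four states, and the `trace_id` link are assumptions made for the example, not names from this blueprint.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RunState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed lifecycle transitions; anything else is rejected.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: set(),
}


@dataclass
class RunRecord:
    run_id: str
    initiator: str                      # who started the run (audit)
    state: RunState = RunState.PENDING
    trace_id: str = ""                  # link to observability data
    history: list = field(default_factory=list)  # (timestamp, old, new)


class RunRegistry:
    def __init__(self):
        self._runs = {}

    def create(self, run_id, initiator, trace_id=""):
        if run_id in self._runs:
            raise ValueError(f"duplicate run_id: {run_id}")
        record = RunRecord(run_id, initiator, trace_id=trace_id)
        self._runs[run_id] = record
        return record

    def transition(self, run_id, new_state):
        record = self._runs[run_id]
        if new_state not in ALLOWED[record.state]:
            raise ValueError(
                f"illegal transition {record.state.value} -> {new_state.value}"
            )
        stamp = datetime.now(timezone.utc).isoformat()
        record.history.append((stamp, record.state, new_state))
        record.state = new_state
        return record

    def get(self, run_id):
        return self._runs[run_id]
```

Keeping the transition history on the record is what makes a run inspectable after the fact; a durable version would presumably write the same rows to a database.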

**Policy & Permissions**

- Tool allowlists

- Environment boundaries

- Cost and token budgets

- Data classification rules
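The four policy categories above can be expressed as plain data plus one deterministic check. This is a sketch under assumptions: `Policy`, `ToolRequest`, and `evaluate` are hypothetical names, and the numeric data classification scale (0 for public up to 3 for restricted) is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_tools: frozenset
    allowed_environments: frozenset
    max_tokens: int
    max_cost_usd: float
    max_data_class: int  # hypothetical scale: 0 = public ... 3 = restricted


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    environment: str
    tokens_used: int
    cost_usd: float
    data_class: int


def evaluate(policy: Policy, req: ToolRequest) -> list:
    """Return a list of violations; an empty list means the call is allowed."""
    violations = []
    if req.tool not in policy.allowed_tools:
        violations.append(f"tool not allowlisted: {req.tool}")
    if req.environment not in policy.allowed_environments:
        violations.append(f"environment not permitted: {req.environment}")
    if req.tokens_used > policy.max_tokens:
        violations.append("token budget exceeded")
    if req.cost_usd > policy.max_cost_usd:
        violations.append("cost budget exceeded")
    if req.data_class > policy.max_data_class:
        violations.append("data classification too high")
    return violations
```

Returning every violation, rather than stopping at the first, gives the audit trail a complete reason for each refusal.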

---

## Orchestration Layer

**State Machine–Driven Orchestrator**

- Explicit step types:

  - Plan

  - ToolCall

  - Validate

  - Decide

  - HumanApproval

  - Retry / Fallback

  - Complete / Fail

- Deterministic transitions

- No hidden state

Recommended:

- LangGraph or custom FSM

- Celery / Arq / Kafka workers
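For the custom FSM option, the step types above can be written as an explicit transition table. This is a minimal sketch, not a LangGraph example: the outcome labels (`"ok"`, `"error"`, `"needs_approval"`, and so on), the specific edges, and the `run` driver are assumptions chosen for illustration, and Retry / Fallback is collapsed into a single `RETRY` step.

```python
from enum import Enum


class Step(str, Enum):
    PLAN = "Plan"
    TOOL_CALL = "ToolCall"
    VALIDATE = "Validate"
    DECIDE = "Decide"
    HUMAN_APPROVAL = "HumanApproval"
    RETRY = "Retry"
    COMPLETE = "Complete"
    FAIL = "Fail"


TERMINAL = {Step.COMPLETE, Step.FAIL}

# (current step, outcome) -> next step. Outcome labels are hypothetical.
TRANSITIONS = {
    (Step.PLAN, "ok"): Step.TOOL_CALL,
    (Step.PLAN, "error"): Step.FAIL,
    (Step.TOOL_CALL, "ok"): Step.VALIDATE,
    (Step.TOOL_CALL, "error"): Step.RETRY,
    (Step.RETRY, "ok"): Step.TOOL_CALL,
    (Step.RETRY, "exhausted"): Step.FAIL,
    (Step.VALIDATE, "ok"): Step.DECIDE,
    (Step.VALIDATE, "error"): Step.RETRY,
    (Step.DECIDE, "done"): Step.COMPLETE,
    (Step.DECIDE, "needs_approval"): Step.HUMAN_APPROVAL,
    (Step.HUMAN_APPROVAL, "approved"): Step.COMPLETE,
    (Step.HUMAN_APPROVAL, "rejected"): Step.FAIL,
}


def run(handlers, max_steps=50):
    """Drive the machine from Plan to a terminal step.

    handlers maps Step -> callable(history) returning an outcome label.
    Returns (final_step, history), where history lists (step, outcome).
    """
    step, history = Step.PLAN, []
    while step not in TERMINAL:
        if len(history) >= max_steps:
            return Step.FAIL, history  # step budget exhausted
        outcome = handlers[step](history)
        history.append((step.value, outcome))
        key = (step, outcome)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {step.value} / {outcome}")
        step = TRANSITIONS[key]
    return step, history
```

Because the table is ordinary data in source control, a change to the workflow shows up as a diff, and the returned history is enough to replay or inspect a run.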

---

## Agent Execution

**Python Agents**

- Stateless by default

- Receive:

  - Context

  - Constraints

  - Tool contracts

- Return structured outputs only

Agents never:

- Self-assign permissions

- Bypass policy

- Persist hidden memory

---

## Tooling Layer

**Tool Adapters**

- Wrap all external systems

- Enforce:

  - Schemas

  - Idempotency

  - Timeouts

  - Retries with backoff

  - Circuit breakers

Tools must be independently testable without LLMs.
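Three of the adapter duties above (idempotency, retries with backoff, and a circuit breaker) fit in one small wrapper that can be tested with a plain function and no LLM. This is a sketch under assumptions: `ToolAdapter`, `CircuitOpenError`, and all parameter names are hypothetical, the breaker never resets on a timer, and timeouts and schema checks are left to the wrapped call.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are refused."""


class ToolAdapter:
    def __init__(self, call, max_attempts=3, base_delay=0.1,
                 failure_threshold=5, sleep=time.sleep):
        self._call = call                  # the wrapped external call
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._failure_threshold = failure_threshold
        self._sleep = sleep                # injectable so tests do not wait
        self._consecutive_failures = 0
        self._results = {}                 # idempotency_key -> stored result

    def invoke(self, idempotency_key, payload):
        # Idempotency: a repeated key returns the stored result, no second call.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # Circuit breaker: refuse at once after too many consecutive failures.
        if self._consecutive_failures >= self._failure_threshold:
            raise CircuitOpenError("circuit open; refusing call")
        last_error = None
        for attempt in range(self._max_attempts):
            try:
                result = self._call(payload)
            except Exception as exc:
                last_error = exc
                self._consecutive_failures += 1
                if self._consecutive_failures >= self._failure_threshold:
                    break  # breaker tripped mid-retry: stop trying
                if attempt < self._max_attempts - 1:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    self._sleep(self._base_delay * (2 ** attempt))
            else:
                self._consecutive_failures = 0
                self._results[idempotency_key] = result
                return result
        raise last_error
```

Injecting `sleep` keeps the retry schedule observable in tests, which is one way to satisfy the requirement that tools be testable on their own.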

---

## Memory & Knowledge

### Short-Term Memory

- Per-run state

- Stored in Postgres or Redis

### Long-Term Knowledge

- Explicit facts only

- Vector search is optional, not default

- No hallucinated persistence

---

## Observability & Governance

**Required Signals**

- Traces: every LLM + tool call

- Logs: structured JSON with `run_id`

- Metrics:

  - latency

  - failure rate

  - retries

  - cost

**Audit**

- Who initiated

- What changed

- When and why

OpenTelemetry from day one.
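The structured-logging requirement can be met with the standard library alone. The sketch below shows one way to emit JSON lines that always carry `run_id`; `JsonFormatter`, `build_logger`, and `run_logger` are hypothetical helper names, and the sketch deliberately leaves out OpenTelemetry itself, so traces and metrics would still need to be wired separately.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the run_id attached."""

    def format(self, record):
        payload = {
            "ts": record.created,                       # epoch seconds
            "level": record.levelname,
            "message": record.getMessage(),
            "run_id": getattr(record, "run_id", None),  # set by run_logger
        }
        return json.dumps(payload)


def build_logger(stream, name="agent"):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def run_logger(logger, run_id):
    """Bind run_id so every line logged for this run carries it."""
    return logging.LoggerAdapter(logger, {"run_id": run_id})


if __name__ == "__main__":
    buffer = io.StringIO()
    log = run_logger(build_logger(buffer), "run-123")
    log.info("tool call finished")
    print(buffer.getvalue().strip())
```

Binding the ID once per run, instead of passing it to every log call, makes it hard to emit a line that cannot be tied back to its run.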

---

## Agent Contract

### Input Schema

- task

- context (structured)

- constraints (time, cost, scope)

- output_spec (JSON schema)

### Output Schema

- result

- evidence

- actions_taken

- warnings
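The input and output schemas above map naturally onto typed records. This is a sketch under assumptions: the field types, the `Constraints` breakdown, `echo_agent`, and `missing_required` are all hypothetical, and the output check only looks at a schema's top-level `required` keys as a stand-in for real JSON Schema validation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraints:
    max_seconds: float
    max_cost_usd: float
    scope: tuple  # e.g. names of systems the agent may touch


@dataclass(frozen=True)
class AgentInput:
    task: str
    context: dict
    constraints: Constraints
    output_spec: dict  # JSON Schema describing `result`


@dataclass
class AgentOutput:
    result: dict
    evidence: list = field(default_factory=list)
    actions_taken: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def missing_required(output: AgentOutput, spec: dict) -> list:
    """Stand-in check: report top-level `required` keys absent from result."""
    return [key for key in spec.get("required", []) if key not in output.result]


def echo_agent(agent_input: AgentInput) -> AgentOutput:
    """Hypothetical stateless agent: all it knows arrives in the input."""
    return AgentOutput(
        result={"summary": f"handled: {agent_input.task}"},
        evidence=[f"context keys: {sorted(agent_input.context)}"],
        actions_taken=[],
    )
```

An orchestrator could call `missing_required` in its Validate step and route to Retry when the list is non-empty, which keeps the check in deterministic code rather than in the model.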

---

## Anti-BizTalk Guardrails

- No visual-only logic

- No implicit state

- No “platform magic”

- Manual fallback path always exists

- Engineers can debug with logs + code alone

---

## Recommended Initial Build Order

1. Agent Gateway + Run Registry

2. Core Orchestrator FSM

3. Three Production Tool Adapters

4. Full Observability

5. Offline Evaluation Harness

---

## Final Reminder

Agents don’t remove engineering.

They **concentrate it**.

Judgment, ownership, and clarity remain the system’s load-bearing walls.
