
Python Agent Control Plane – Clean Architecture Blueprint
- Mark Kendall
- Feb 8
## Purpose
Design a durable, enterprise-grade agent platform that avoids past orchestration failures (BizTalk-style opacity) while enabling safe, scalable AI-driven workflows.
---
## Core Principles (Non-Negotiable)
1. **Agents are products**
   - Versioned
   - Tested
   - Observable
   - Owned
2. **LLMs reason, code executes**
   - No direct side effects without deterministic code paths
3. **State is explicit**
   - Every run has IDs, steps, inputs, outputs
   - Replayable and inspectable
4. **Everything is diffable**
   - Git is the source of truth
   - UI is a view, not authority
5. **Failure is designed**
   - Timeouts, retries, fallbacks, escalation paths
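
As an illustration of principle 2, here is a minimal sketch in which the model only proposes an action as JSON and ordinary code decides whether to run it. The `create_ticket` handler, the `HANDLERS` table, and the proposal shape (`action` plus `args`) are assumptions made for this example, not part of the blueprint.

```python
import json
from typing import Callable, Dict


# Hypothetical deterministic handler: plain code that can be unit tested.
def create_ticket(title: str) -> dict:
    return {"ticket": title, "status": "created"}


# Only actions listed here can ever produce a side effect.
HANDLERS: Dict[str, Callable[..., dict]] = {"create_ticket": create_ticket}


def execute_proposal(llm_output: str) -> dict:
    """Validate a model's proposed action, then run it through known code."""
    try:
        proposal = json.loads(llm_output)
    except ValueError as exc:  # malformed JSON from the model
        return {"ok": False, "error": f"invalid JSON: {exc}"}
    if not isinstance(proposal, dict):
        return {"ok": False, "error": "proposal must be a JSON object"}

    action = proposal.get("action")
    if action not in HANDLERS:  # unknown actions never run
        return {"ok": False, "error": f"action not allowed: {action}"}

    try:
        result = HANDLERS[action](**proposal.get("args", {}))
    except TypeError as exc:  # e.g. wrong or missing arguments
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": result}
```

The model's text never reaches an external system directly; it can only select from handlers that engineers wrote and reviewed.
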
---
## High-Level Architecture
### Entry Layer
**Agent Gateway (FastAPI)**
```python
from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Python Agent Control Plane")


# 1. DATA MODELS (Registry/Knowledge Management)
class WorkRequest(BaseModel):
    task_id: str
    intent: str
    payload: dict


# 2. POLICY ENGINE (Mocking the Policy & Permissions block)
def check_permissions(request: WorkRequest):
    # In a real app, this queries the 'Policy & Permissions' block
    if "admin" in request.intent.lower():
        raise HTTPException(status_code=403, detail="Unauthorized Agent Action")
    return True


# 3. ORCHESTRATOR (The logic brain)
class Orchestrator:
    def route_to_agent(self, request: WorkRequest):
        # Logic to pick an agent from the 'Registry'
        return {"status": "Executing", "agent": "Python_Agent_01", "task": request.intent}


orchestrator = Orchestrator()


# 4. AGENT GATEWAY (The Entry Point)
@app.post("/gateway/dispatch")
async def dispatch_work(request: WorkRequest, authorized: bool = Depends(check_permissions)):
    """
    This endpoint represents the 'Agent Gateway' in your diagram.
    It validates, checks policy, and hands off to the Orchestrator.
    """
    result = orchestrator.route_to_agent(request)
    return result
```
- Accepts work requests
- Issues `run_id`
- AuthN/AuthZ
- Rate limits and quotas
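
Two of the responsibilities above, issuing a `run_id` and rate limiting, can be sketched with the standard library alone. The token bucket below is one common way to limit rates, chosen here for illustration. The names `TokenBucket` and `new_run_id` and the `run_` prefix are assumptions, not something the blueprint prescribes.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec`."""
    capacity: float
    refill_per_sec: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        # `now` is injectable so the limiter can be tested without sleeping.
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def new_run_id() -> str:
    """Issue an opaque, unique identifier for one run (hypothetical format)."""
    return f"run_{uuid.uuid4().hex}"
```

In a gateway, one bucket per caller would typically be checked before a `run_id` is issued, with a rejected request mapped to HTTP 429.
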
### Control Plane
**Run Registry**
- Stores run metadata
- Tracks lifecycle state
- Links audit and observability data
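
A minimal in-memory sketch of a Run Registry follows, assuming four lifecycle states and a `trace_id` field as the link to observability data. The state names, the `ALLOWED` table, and the class names are illustrative assumptions; a production registry would sit on a database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List, Optional, Tuple


class RunState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Legal lifecycle moves; terminal states allow none.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: set(),
}


@dataclass
class RunRecord:
    run_id: str
    initiated_by: str
    trace_id: Optional[str] = None  # link to tracing data
    state: RunState = RunState.PENDING
    history: List[Tuple[str, str]] = field(default_factory=list)  # (state, UTC time)


class RunRegistry:
    def __init__(self) -> None:
        self._runs: Dict[str, RunRecord] = {}

    def create(self, run_id: str, initiated_by: str,
               trace_id: Optional[str] = None) -> RunRecord:
        if run_id in self._runs:
            raise ValueError(f"duplicate run_id: {run_id}")
        record = RunRecord(run_id, initiated_by, trace_id)
        self._runs[run_id] = record
        return record

    def transition(self, run_id: str, new_state: RunState) -> RunRecord:
        record = self._runs[run_id]
        if new_state not in ALLOWED[record.state]:
            raise ValueError(f"illegal transition {record.state.value} -> {new_state.value}")
        record.state = new_state
        record.history.append((new_state.value, datetime.now(timezone.utc).isoformat()))
        return record

    def get(self, run_id: str) -> RunRecord:
        return self._runs[run_id]
```
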
**Policy & Permissions**
- Tool allowlists
- Environment boundaries
- Cost and token budgets
- Data classification rules
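
The four policy dimensions above can be expressed as plain data plus one pure check. This is a sketch under assumptions: the field names, the numeric classification scale, and the `violations` function are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Policy:
    allowed_tools: FrozenSet[str]
    allowed_environments: FrozenSet[str]
    max_tokens: int
    max_cost_usd: float
    max_classification: int  # hypothetical scale: 0 = public, 3 = restricted


@dataclass(frozen=True)
class ToolCall:
    tool: str
    environment: str
    tokens_so_far: int
    cost_so_far_usd: float
    data_classification: int


def violations(policy: Policy, call: ToolCall) -> List[str]:
    """Return every rule the call breaks; an empty list means allowed."""
    found: List[str] = []
    if call.tool not in policy.allowed_tools:
        found.append(f"tool not allowlisted: {call.tool}")
    if call.environment not in policy.allowed_environments:
        found.append(f"environment not permitted: {call.environment}")
    if call.tokens_so_far > policy.max_tokens:
        found.append("token budget exceeded")
    if call.cost_so_far_usd > policy.max_cost_usd:
        found.append("cost budget exceeded")
    if call.data_classification > policy.max_classification:
        found.append("data classification too high")
    return found
```

Returning all violations rather than the first one keeps audit records complete, and because the policy is a frozen dataclass it can be stored in Git and diffed like any other change.
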
---
## Orchestration Layer
**State Machine–Driven Orchestrator**
- Explicit step types:
  - Plan
  - ToolCall
  - Validate
  - Decide
  - HumanApproval
  - Retry / Fallback
  - Complete / Fail
- Deterministic transitions
- No hidden state
Recommended:
- LangGraph or custom FSM
- Celery / Arq / Kafka workers
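
For the custom FSM option, a transition table is one straightforward way to make every move explicit and deterministic. The step names come from the list above. The event names (`"planned"`, `"ok"`, `"exhausted"`, and so on) and the specific edges are assumptions chosen for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Step(Enum):
    PLAN = "Plan"
    TOOL_CALL = "ToolCall"
    VALIDATE = "Validate"
    DECIDE = "Decide"
    HUMAN_APPROVAL = "HumanApproval"
    RETRY = "Retry"
    FALLBACK = "Fallback"
    COMPLETE = "Complete"
    FAIL = "Fail"


# (current step, event) -> next step. Anything not listed is illegal.
TRANSITIONS: Dict[Tuple[Step, str], Step] = {
    (Step.PLAN, "planned"): Step.TOOL_CALL,
    (Step.TOOL_CALL, "ok"): Step.VALIDATE,
    (Step.TOOL_CALL, "error"): Step.RETRY,
    (Step.RETRY, "retry"): Step.TOOL_CALL,
    (Step.RETRY, "exhausted"): Step.FALLBACK,
    (Step.FALLBACK, "ok"): Step.VALIDATE,
    (Step.FALLBACK, "error"): Step.FAIL,
    (Step.VALIDATE, "valid"): Step.DECIDE,
    (Step.VALIDATE, "invalid"): Step.FAIL,
    (Step.DECIDE, "more_work"): Step.PLAN,
    (Step.DECIDE, "needs_approval"): Step.HUMAN_APPROVAL,
    (Step.DECIDE, "done"): Step.COMPLETE,
    (Step.HUMAN_APPROVAL, "approved"): Step.COMPLETE,
    (Step.HUMAN_APPROVAL, "rejected"): Step.FAIL,
}


def advance(current: Step, event: str) -> Step:
    """Return the next step, or raise if the move is not in the table."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current.value} on {event!r}") from None


def replay(events: Iterable[str], start: Step = Step.PLAN) -> List[Step]:
    """Rebuild the full path of a run from its recorded events."""
    path = [start]
    for event in events:
        path.append(advance(path[-1], event))
    return path
```

Because the table is the only source of transitions, a stored event list is enough to replay and inspect any run.
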
---
## Agent Execution
**Python Agents**
- Stateless by default
- Receive:
  - Context
  - Constraints
  - Tool contracts
- Return structured outputs only
Agents never:
- Self-assign permissions
- Bypass policy
- Persist hidden memory
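
A stateless agent can be as simple as a function whose only inputs are its arguments. In the sketch below, `summarize_agent`, the `fetch_doc` tool, and the `max_tool_calls` constraint are hypothetical names used to show the shape, and the LLM call is left out to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping


@dataclass(frozen=True)
class Constraints:
    max_tool_calls: int


def summarize_agent(
    context: Mapping[str, Any],
    constraints: Constraints,
    tools: Mapping[str, Callable[[str], str]],
) -> Dict[str, Any]:
    """Everything the agent needs arrives as arguments; nothing is stored."""
    actions: List[str] = []
    warnings: List[str] = []
    fetched: List[str] = []

    for doc_id in context.get("doc_ids", []):
        if len(actions) >= constraints.max_tool_calls:
            warnings.append("tool call budget reached")
            break
        # Tools are injected by the platform; the agent cannot add its own.
        fetched.append(tools["fetch_doc"](doc_id))
        actions.append(f"fetch_doc:{doc_id}")

    # Structured output only: the same four fields on every run.
    return {
        "result": " ".join(fetched),
        "evidence": list(context.get("doc_ids", []))[: len(fetched)],
        "actions_taken": actions,
        "warnings": warnings,
    }
```

Since the function keeps no module-level or instance state, calling it twice with the same inputs and the same tool behavior yields the same output.
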
---
## Tooling Layer
**Tool Adapters**
- Wrap all external systems
- Enforce:
  - Schemas
  - Idempotency
  - Timeouts
  - Retries with backoff
  - Circuit breakers
Tools must be independently testable without LLMs.
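
Here is a simplified adapter sketch covering three of the enforcement points: idempotency keys, retries with exponential backoff, and a basic circuit breaker. Schemas and timeouts are omitted for brevity, and the breaker has no half-open recovery. `ToolAdapter` and its parameters are assumptions for this example, not a library API.

```python
import time
from typing import Any, Callable, Dict


class CircuitOpenError(RuntimeError):
    """Raised when the adapter refuses to call a tool that keeps failing."""


class ToolAdapter:
    def __init__(
        self,
        call: Callable[[dict], Any],
        max_attempts: int = 3,
        base_delay: float = 0.1,
        failure_threshold: int = 5,
        sleep: Callable[[float], None] = time.sleep,  # injectable for tests
    ) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._call = call
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._failure_threshold = failure_threshold
        self._sleep = sleep
        self._failures = 0  # consecutive failures
        self._results: Dict[str, Any] = {}  # idempotency key -> result

    def invoke(self, idempotency_key: str, payload: dict) -> Any:
        # Idempotency: a repeated key returns the stored result, no second call.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # Circuit breaker: stop calling after too many consecutive failures.
        if self._failures >= self._failure_threshold:
            raise CircuitOpenError("too many consecutive failures")

        for attempt in range(self._max_attempts):
            try:
                result = self._call(payload)
            except Exception:
                self._failures += 1
                last_attempt = attempt == self._max_attempts - 1
                if last_attempt or self._failures >= self._failure_threshold:
                    raise
                # Exponential backoff: base, 2x base, 4x base, ...
                self._sleep(self._base_delay * (2 ** attempt))
            else:
                self._failures = 0
                self._results[idempotency_key] = result
                return result
```

The wrapped callable is an ordinary function, so the adapter can be exercised with a fake that fails on demand and no LLM in the loop.
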
---
## Memory & Knowledge
### Short-Term Memory
- Per-run state
- Stored in Postgres or Redis
### Long-Term Knowledge
- Explicit facts only
- Vector search is optional, not default
- No hallucinated persistence
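
One way to read "explicit facts only" is that nothing enters long-term knowledge without provenance. The sketch below assumes a `Fact` record with a mandatory `source` and `run_id`; both the record shape and the `KnowledgeStore` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    source: str  # where the fact came from, e.g. a tool result reference
    run_id: str  # which run wrote it


class KnowledgeStore:
    """In-memory stand-in for a long-term store; lookups are by exact key."""

    def __init__(self) -> None:
        self._facts: Dict[str, Fact] = {}

    def put(self, fact: Fact) -> None:
        # Refuse anything without provenance: model output alone is not a fact.
        if not fact.source.strip() or not fact.run_id.strip():
            raise ValueError("facts require a source and a run_id")
        self._facts[fact.key] = fact

    def get(self, key: str) -> Optional[Fact]:
        return self._facts.get(key)
```
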
---
## Observability & Governance
**Required Signals**
- Traces: every LLM + tool call
- Logs: structured JSON with `run_id`
- Metrics:
  - latency
  - failure rate
  - retries
  - cost
**Audit**
- Who initiated
- What changed
- When and why
OpenTelemetry from day one.
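
The logging requirement can be met with the standard library alone. This sketch covers only the structured JSON logs carrying `run_id`; it does not show OpenTelemetry tracing. `JsonFormatter`, `build_logger`, and the extra fields `step` and `latency_ms` are assumptions for illustration.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "run_id": getattr(record, "run_id", None),
            "step": getattr(record, "step", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(entry)


def build_logger(stream: TextIO = sys.stdout, name: str = "agent") -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so repeated setup does not duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("tool call finished", extra={"run_id": run_id, "step": "ToolCall", "latency_ms": 42})` then produces a line that can be filtered by `run_id` in any log backend.
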
---
## Agent Contract
### Input Schema
- task
- context (structured)
- constraints (time, cost, scope)
- output_spec (JSON schema)
### Output Schema
- result
- evidence
- actions_taken
- warnings
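
The contract above maps naturally onto typed records. The sketch below uses dataclasses and assumes concrete field names for the constraints (`max_seconds`, `max_cost_usd`, `scope`). The `missing_required` helper checks only the `required` key of a JSON Schema style `output_spec`; it is not a full schema validator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Constraints:
    max_seconds: float
    max_cost_usd: float
    scope: List[str]


@dataclass(frozen=True)
class AgentInput:
    task: str
    context: Dict[str, Any]
    constraints: Constraints
    output_spec: Dict[str, Any]  # JSON Schema describing `result`


@dataclass
class AgentOutput:
    result: Any
    evidence: List[str] = field(default_factory=list)
    actions_taken: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def missing_required(output: AgentOutput, output_spec: Dict[str, Any]) -> List[str]:
    """List properties the spec marks as required but the result lacks."""
    required = list(output_spec.get("required", []))
    if not isinstance(output.result, dict):
        return required
    return [name for name in required if name not in output.result]
```
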
---
## Anti-BizTalk Guardrails
- No visual-only logic
- No implicit state
- No “platform magic”
- Manual fallback path always exists
- Engineers can debug with logs + code alone
---
## Recommended Initial Build Order
1. Agent Gateway + Run Registry
2. Core Orchestrator FSM
3. Three Production Tool Adapters
4. Full Observability
5. Offline Evaluation Harness
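
For the last step, an offline evaluation harness can start as a loop over recorded cases with exact-match comparison. This is a minimal sketch: `EvalCase`, `run_eval`, and the exact-match scoring are assumptions, and real harnesses often need looser comparisons.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    name: str
    agent_input: Dict[str, Any]
    expected: Any


def run_eval(agent: Callable[[Dict[str, Any]], Any], cases: List[EvalCase]) -> Dict[str, Any]:
    """Run every case offline and report which ones failed and why."""
    failures: List[Dict[str, Any]] = []
    for case in cases:
        try:
            actual = agent(case.agent_input)
        except Exception as exc:  # a crash counts as a failed case
            failures.append({"case": case.name, "error": repr(exc)})
            continue
        if actual != case.expected:
            failures.append({"case": case.name, "expected": case.expected, "actual": actual})
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }
```

Running the same cases against each agent version gives a diffable record of behavior changes before anything reaches production.
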
---
## Final Reminder
Agents don’t remove engineering.
They **concentrate it**.
Judgment, ownership, and clarity remain the system’s load-bearing walls.