Where do the agents live?
- Mark Kendall
- Sep 14
Think “workers on the mesh,” not a new control plane.
• Lambda agents for bursty, short tasks (validation, enrichment, routing hints, DLQ triage).
• Containerized agents (Spring Boot / Python) on EKS/ECS for long-running tools (schema discovery, bulk remediation, test-data synthesis, complex orchestrations).
• Sidecar/companion agents next to critical microservices to advise/guard (policy, PII scrubs, cost/SLA hints).
• Human-in-the-loop stations (a tiny UI or Slack bot) that agents escalate to when confidence is low.
Why have them?
Because integration teams juggle variability: schemas evolve, rules drift, edge cases pop up, DLQs fill, and manual triage burns cycles. Agents:
• Reduce toil (auto-map fields, propose fixes, empty DLQs).
• Adapt faster (learn patterns from observability + docs).
• Improve quality (pre-flight checks before publishing to target topics).
• Keep humans in control (confidence thresholds & approvals).
What do they do (concrete roles)
Keep your canonical topics and add focused agents around them. The example topic names below follow your naming style.
1. Inlet Quality Agent
Listens on eip.nb.locations.in (or any *.in).
• Validates against canonical schema, checks required business rules.
• If fixable: auto-patch common issues (trim, type cast), annotate reason, forward to *.validated.
• If risky: send to *.review and ping approvers.
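A minimal sketch of the validate / auto-patch / route decision described above, assuming a tiny hypothetical canonical schema (`id`, `name`, `floor_count`). The names `CANONICAL_SCHEMA`, `Verdict`, and `check_record` are illustrative, not part of any existing service; publishing to SNS is left out so the logic stays testable on its own.

```python
from dataclasses import dataclass, field

# Hypothetical canonical schema: field name -> required Python type.
CANONICAL_SCHEMA = {"id": str, "name": str, "floor_count": int}


@dataclass
class Verdict:
    topic: str                                   # where the record goes next
    record: dict                                 # possibly patched copy of the input
    reasons: list = field(default_factory=list)  # annotations for the audit trail


def check_record(record: dict, base_topic: str = "eip.nb.locations") -> Verdict:
    """Validate a record, auto-patch safe issues, and pick the next topic."""
    patched, reasons, risky = dict(record), [], False
    for name, expected in CANONICAL_SCHEMA.items():
        value = patched.get(name)
        if value is None:
            reasons.append(f"missing required field: {name}")
            risky = True  # the agent cannot invent data, so a human decides
            continue
        if isinstance(value, str) and value != value.strip():
            value = value.strip()
            reasons.append(f"trimmed whitespace: {name}")
        if not isinstance(value, expected):
            try:
                value = expected(value)
                reasons.append(f"cast {name} to {expected.__name__}")
            except (TypeError, ValueError):
                reasons.append(f"cannot cast {name} to {expected.__name__}")
                risky = True
        patched[name] = value
    suffix = "review" if risky else "validated"
    return Verdict(f"{base_topic}.{suffix}", patched, reasons)
```

In a Lambda, the handler would call `check_record` on each incoming message and publish `Verdict.record` to `Verdict.topic`, carrying `reasons` along as message attributes or headers.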
2. Schema Mapper Agent
Trigger: eip.nb.locations.validated.
• Pulls live describes (e.g., Salesforce /describe, ServiceNow table metadata).
• Proposes a transform spec (JSONata/JOLT/MapStruct hints) from Canonical → Target.
• Emits spec to eip.sf.locations.transform.proposed; upon approval, stores versioned transform and posts to eip.sf.locations.ready.
3. Enrichment Agent
Trigger: *.validated or *.ready.
• Fills lookups (e.g., region, cost center, geo normalization), runs dedup checks, adds lineage tags.
• Writes to *.enriched with provenance.
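As a rough illustration of "enriched with provenance", here is a sketch that fills one lookup and records which fields the agent added. The lookup table `REGION_BY_SITE`, the `site_code` field, and the `_provenance` key are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical lookup table; a real agent would call a reference-data service.
REGION_BY_SITE = {"NYC": "us-east", "LON": "emea"}


def enrich(record: dict, source: str = "region-lookup-v1") -> dict:
    """Return an enriched copy of the record plus a provenance block."""
    out = dict(record)
    added = {}
    region = REGION_BY_SITE.get(record.get("site_code"))
    if region is not None and "region" not in out:
        added["region"] = region  # never overwrite a value the source provided
    out.update(added)
    out["_provenance"] = {
        "source": source,                # which lookup produced the values
        "fields": sorted(added),         # which fields this agent added
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return out
```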
4. Routing & Orchestration Agent
Trigger: any *.ready or *.enriched.
• Decides fan-out: Salesforce + ServiceNow + Data Lake?
• Publishes to eip.sf.in, eip.sn.in, eip.lake.in, etc., adding correlation IDs and SLO hints.
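The fan-out decision can be sketched as a plain function that turns one record into one envelope per target topic. The routing table `ROUTES`, the record types, and the header names (`correlationId`, `sloMs`) are assumptions for illustration; the actual publish call is omitted.

```python
import uuid

# Hypothetical routing table: record type -> egress topics.
ROUTES = {
    "location": ["eip.sf.in", "eip.sn.in", "eip.lake.in"],
    "device": ["eip.sn.in", "eip.lake.in"],
}


def fan_out(record, record_type, correlation_id=None, slo_ms=5000):
    """Build one message envelope per target topic for this record type."""
    # Reuse an upstream correlation ID when present so the trail stays unbroken.
    correlation_id = correlation_id or str(uuid.uuid4())
    headers = {"correlationId": correlation_id, "sloMs": slo_ms}
    return [
        {"topic": topic, "headers": dict(headers), "body": record}
        for topic in ROUTES.get(record_type, [])
    ]
```

Unknown record types produce an empty list here; whether that should instead go to a review topic is a policy choice.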
5. SLA/Backpressure Agent
Listens across topics; watches throughput, lag, error ratios.
• If lag rises: raises Lambda concurrency or scales consumers; can shed non-critical flows by deferring them to *.defer.
• Posts advisories to Ops Slack.
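One way to picture the agent's decision loop, as a sketch only: given per-topic stats, return a list of planned actions and let a separate, policy-checked executor carry them out. `TopicStats`, the thresholds, and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    topic: str
    lag: int            # messages waiting
    error_ratio: float  # failed / processed over the sampling window
    critical: bool      # critical flows are scaled, never deferred


def plan_actions(stats, lag_limit=1000, error_limit=0.05):
    """Return (action, topic) pairs; executing them is a separate step."""
    actions = []
    for s in stats:
        if s.error_ratio > error_limit:
            actions.append(("advise", s.topic))  # post an advisory to Ops
        if s.lag > lag_limit:
            # Scale consumers for critical flows, shed load for the rest.
            actions.append(("scale_up" if s.critical else "defer", s.topic))
    return actions
```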
6. Policy/Privacy Guard Agent
Sits before any egress topic.
• Redacts PII not permitted for that target; enforces “allowed actions” policies.
• Blocks & escalates when rules conflict.
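A minimal sketch of per-target redaction, assuming a hypothetical policy table that lists which PII fields each egress topic may receive. The field names and `PERMITTED_PII` entries are invented for the example; a target with no policy is blocked rather than passed through.

```python
# Hypothetical policy data: which fields count as PII, and which of them
# each egress topic is permitted to receive.
PII_FIELDS = {"contact_email", "contact_phone"}
PERMITTED_PII = {
    "eip.sf.in": {"contact_email"},
    "eip.lake.in": set(),
}


def guard(record: dict, target_topic: str):
    """Return (clean_record, notes); clean_record is None when blocked."""
    permitted = PERMITTED_PII.get(target_topic)
    if permitted is None:
        # No policy means no egress: block and let a human decide.
        return None, [f"blocked: no policy for {target_topic}"]
    redacted = sorted(k for k in record if k in PII_FIELDS and k not in permitted)
    clean = {k: v for k, v in record.items() if k not in redacted}
    return clean, [f"redacted: {k}" for k in redacted]
```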
7. DLQ Triage Agent
Trigger: eip.*.dlq.
• Clusters failures by root cause, proposes fixes (e.g., add mapping for buildingSubtype).
• Can auto-replay a sampled subset; if success rate > threshold, replays the batch; otherwise opens a ticket with a ready-made PR/transform snippet.
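The cluster-then-sample logic might look like the sketch below. It assumes each DLQ message already carries an `error` string to group on, and that `replay` is a caller-supplied function returning `True` on success; both are assumptions, as are the default sample size and threshold.

```python
import random
from collections import defaultdict


def cluster_by_cause(failures):
    """Group DLQ messages by their error string (a stand-in for root cause)."""
    clusters = defaultdict(list)
    for failure in failures:
        clusters[failure["error"]].append(failure)
    return dict(clusters)


def triage_cluster(cluster, replay, sample_size=3, threshold=0.8, seed=0):
    """Replay a small sample; recommend a batch replay only if it mostly works."""
    # A seeded generator keeps the sample reproducible for audits.
    rng = random.Random(seed)
    sample = rng.sample(cluster, min(sample_size, len(cluster)))
    successes = sum(1 for message in sample if replay(message))
    rate = successes / len(sample) if sample else 0.0
    return "replay_batch" if rate > threshold else "open_ticket"
```

The function only returns a recommendation; replaying the full batch or opening the ticket stays behind the same policy checks as any other action.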
8. Explainer/Audit Agent
Triggered by correlation ID.
• Produces a step-by-step “why did this record end up here?” trail from logs, headers, and decisions.
• Great for audits and debugging.
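If every agent logs a small decision event, the trail is just a filter and a sort. This sketch assumes a hypothetical event shape with `correlationId`, `ts`, `agent`, `decision`, and `topic` keys; a real agent would query the log store rather than take a list.

```python
def explain(events, correlation_id):
    """Return a numbered, time-ordered trail for one correlation ID."""
    trail = sorted(
        (e for e in events if e["correlationId"] == correlation_id),
        key=lambda e: e["ts"],
    )
    return [
        f'{step}. {e["agent"]}: {e["decision"]} -> {e["topic"]}'
        for step, e in enumerate(trail, 1)
    ]
```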
9. Test-Data & Replay Agent
• Generates anonymized fixtures from prod shapes.
• Replays against new transforms in a shadow topic (*.shadow) and scores diff vs current prod behavior.
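Scoring a shadow replay can be as simple as counting how many fixtures come out differently. In this sketch `current` and `candidate` are any two transform functions passed in by the caller; the report shape is an assumption.

```python
def shadow_diff(fixtures, current, candidate):
    """Compare two transforms over the same fixtures and summarize the drift."""
    changed = [f for f in fixtures if current(f) != candidate(f)]
    total = len(fixtures)
    return {
        "total": total,
        "changed": len(changed),
        "ratio": len(changed) / total if total else 0.0,
        "examples": changed[:3],  # a few inputs for a human to eyeball
    }
```

A CI gate could then refuse promotion when `ratio` exceeds an agreed limit, or require sign-off on the listed examples.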
How they interact with your existing SNS + Canonical Services
[Source Adapter] --(raw)--> eip.nb.locations.in
        |
        v
[Inlet Quality Agent] --(ok)--> eip.nb.locations.validated
        |        \--(needs-review)--> eip.nb.locations.review
        v
[Schema Mapper Agent] --(proposed map)--> eip.sf.locations.transform.proposed
        | (approved)                                   |
        v                                              v
(versioned transform store)                     [Human-in-loop]
        |
        v
[Enrichment Agent] ---> eip.nb.locations.enriched
        |
        v
[Routing Agent] ---> eip.sf.in, eip.sn.in, eip.lake.in
        |
        +--> [Policy Guard Agent] (final checks) --> targets
        |
        +--> [SLA/Backpressure Agent] (observes/acts across topics)

[DLQ Triage Agent] <--- eip.*.dlq
[Explainer Agent]  <--- query by correlationId/*
[Test-Data Agent]  <--- shadow topics & CI/CD gates
Guardrails (so “agentic” doesn’t go rogue)
• Tool-use only: agents call approved tools (transform registry, mapping store, CI, ticketing, deploy) behind service accounts; no arbitrary internet calls.
• Policies first: every “act” requires a policy allowlist; higher-risk acts need human approval.
• Confidence thresholds: any auto-fix with confidence below X% is always escalated, never applied automatically.
• Versioned everything: transforms, prompts, rules, and decisions carry versions + correlation IDs.
• Canary & shadow: new transforms run in shadow for N messages before promotion.
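The allowlist and confidence rules above combine into one small gate that every agent action passes through. This is a sketch under assumptions: the action names, risk labels in `ALLOWLIST`, and the 0.9 default stand in for whatever your policy store and chosen X% actually say.

```python
# Hypothetical policy allowlist: action name -> risk level.
ALLOWLIST = {
    "auto_patch": "low",
    "defer_flow": "low",
    "replay_batch": "high",
    "promote_transform": "high",
}


def decide(action: str, confidence: float, min_confidence: float = 0.9) -> str:
    """Return 'execute', 'escalate', or 'deny' for a proposed agent action."""
    risk = ALLOWLIST.get(action)
    if risk is None:
        return "deny"      # not on the allowlist: never run it
    if risk == "high" or confidence < min_confidence:
        return "escalate"  # human approval required
    return "execute"
```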
Minimal reference deployment on AWS (fits your hybrid)
• Backbone: SNS (you have it) + optional SQS for per-agent work queues; keep Kafka/MSK where you already have it.
• Compute:
    • Lambdas for lightweight Quality/Policy/Explainer work.
    • EKS/ECS services for Schema Mapper, DLQ Triage, Test-Data (often stateful or longer-running).
• Shared services:
    • Transform registry (S3 + DynamoDB + signed URLs).
    • Observability (CloudWatch + OpenSearch/Grafana; a standard correlationId).
    • Approval UI (tiny React app or Slack workflow).
    • Secrets/IAM boundaries (KMS, least-privilege roles).
• CI/CD:
    • Every transform/agent change is code: PR → shadow replay → gate → promote. The DLQ agent can open PRs with suggested patches.
What does “agentic” buy your Canonical Adapters?
• Adapters stay dumb & fast. Agents shoulder variability: mapping drift, data quality, routing decisions.
• Adaptive ops. Agents notice lag/cost/error spikes and act (scale, defer, route) within policy.
• Self-healing pipelines. DLQ fills? Triage agent groups, fixes, replays safely.
• Faster onboarding. New target? Mapper agent drafts 80% of the mapping from canonical → target, you approve and ship.
• Explainability. Explainer agent gives auditors a readable trail.
Quick start (incremental)
1. Pick one high-pain flow (e.g., Nautobot → Salesforce Locations).
2. Add Inlet Quality Agent (Lambda) + DLQ Triage Agent (container).
3. Introduce shadow topics and Explainer Agent with correlation IDs.
4. Move the Schema Mapper Agent into the loop; require human approval at first.
5. Add Policy Guard before egress.
6. Measure: DLQ rate, time-to-fix, manual touches, lead time.
If you want, I can turn this into a one-page reference diagram (PNG) and a short “agent contracts” spec (event shapes, headers, confidence/approval fields) tailored to your eip.* topics.
