Where do the agents live?

Mark Kendall
Sep 14


Think “workers on the mesh,” not a new control plane.

• Lambda agents for bursty, short tasks (validation, enrichment, routing hints, DLQ triage).

• Containerized agents (Spring Boot / Python) on EKS/ECS for long-running tools (schema discovery, bulk remediation, test-data synthesis, complex orchestrations).

• Sidecar/companion agents next to critical microservices to advise/guard (policy, PII scrubs, cost/SLA hints).

• Human-in-the-loop stations (a tiny UI or Slack bot) that agents escalate to when confidence is low.

Why have them?

Because integration teams juggle variability: schemas evolve, rules drift, edge cases pop up, DLQs fill, and manual triage burns cycles. Agents:

• Reduce toil (auto-map fields, propose fixes, empty DLQs).

• Adapt faster (learn patterns from observability + docs).

• Improve quality (pre-flight checks before publishing to target topics).

• Keep humans in control (confidence thresholds & approvals).

What do they do? (concrete roles)

Build on the existing canonical topics and add focused agents. The example topic names below follow the existing eip.* convention.

1. Inlet Quality Agent

Listens on eip.nb.locations.in (or any *.in).

• Validates against canonical schema, checks required business rules.

• If fixable: auto-patch common issues (trim, type cast), annotate reason, forward to *.validated.

• If risky: send to *.review and ping approvers.
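A minimal sketch of the Inlet Quality Agent's decision step, assuming a location record with hypothetical required fields (`site_id`, `name`, `latitude`, `longitude`). The function name `triage_inlet` and the two patch rules (trim strings, cast numeric strings) are illustrative; a real agent would validate against the canonical schema and publish to SNS instead of returning a topic name.

```python
from dataclasses import dataclass, field

# Hypothetical canonical rules for a location record.
REQUIRED_FIELDS = {"site_id", "name", "latitude", "longitude"}
NUMERIC_FIELDS = {"latitude", "longitude"}


@dataclass
class InletResult:
    topic: str    # next topic to publish to
    record: dict  # possibly patched copy of the input
    notes: list = field(default_factory=list)  # reason for each patch or escalation


def triage_inlet(record: dict, base: str = "eip.nb.locations") -> InletResult:
    patched, notes = dict(record), []

    # Safe fix 1: trim surrounding whitespace on string values.
    for key, value in list(patched.items()):
        if isinstance(value, str) and value != value.strip():
            patched[key] = value.strip()
            notes.append(f"trimmed {key}")

    # Safe fix 2: cast numeric strings; a non-numeric value is "risky", not fixable.
    for key in sorted(NUMERIC_FIELDS & set(patched)):
        value = patched[key]
        if isinstance(value, str):
            try:
                patched[key] = float(value)
                notes.append(f"cast {key} to float")
            except ValueError:
                notes.append(f"{key} is not numeric")
                return InletResult(f"{base}.review", patched, notes)

    # Business rule: required fields must be present and non-empty.
    present = {k for k, v in patched.items() if v not in (None, "")}
    missing = REQUIRED_FIELDS - present
    if missing:
        notes.append(f"missing required fields: {sorted(missing)}")
        return InletResult(f"{base}.review", patched, notes)

    return InletResult(f"{base}.validated", patched, notes)
```

Anything that cannot be fixed deterministically goes to *.review rather than being guessed at, which keeps the agent inside the "humans stay in control" promise.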

2. Schema Mapper Agent

Trigger: eip.nb.locations.validated.

• Pulls live target metadata (e.g., Salesforce /describe, ServiceNow table metadata).

• Proposes a transform spec (JSONata/JOLT/MapStruct hints) from Canonical → Target.

• Emits spec to eip.sf.locations.transform.proposed; upon approval, stores versioned transform and posts to eip.sf.locations.ready.
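One way the Schema Mapper Agent could draft a Canonical → Target proposal, sketched under assumptions: the target describe output has already been reduced to a list of field names, and name similarity (Python's `difflib`) stands in for whatever matching the real agent uses (an LLM, types, picklist values). `propose_mapping` and the 0.75 cutoff are hypothetical; every entry carries a confidence so low scores can be routed to a human.

```python
import difflib
import re


def _normalize(name: str) -> str:
    # "site_id", "SiteId" and "Site_Id__c" all normalize to "siteid".
    name = re.sub(r"__c$", "", name)  # drop the Salesforce custom-field suffix
    return re.sub(r"[^a-z0-9]", "", name.lower())


def propose_mapping(canonical_fields, target_fields, cutoff=0.75):
    """Return a draft Canonical -> Target spec plus the fields it could not map."""
    by_norm = {_normalize(t): t for t in target_fields}
    mapping, unmapped = {}, []
    for source in canonical_fields:
        norm = _normalize(source)
        match = difflib.get_close_matches(norm, list(by_norm), n=1, cutoff=cutoff)
        if match:
            score = difflib.SequenceMatcher(None, norm, match[0]).ratio()
            mapping[source] = {"target": by_norm[match[0]], "confidence": round(score, 2)}
        else:
            unmapped.append(source)
    # "proposed" until a human approves it and it is stored as a versioned transform.
    return {"status": "proposed", "mapping": mapping, "unmapped": unmapped}
```

The output would be serialized onto eip.sf.locations.transform.proposed; the unmapped list is where a reviewer's attention goes first.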

3. Enrichment Agent

Trigger: *.validated or *.ready.

• Fills lookups (e.g., region, cost center, geo normalization), runs dedup checks, and adds lineage tags.

• Writes to *.enriched with provenance.
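A sketch of the Enrichment Agent's core step, assuming in-memory lookup tables (`REGION_BY_COUNTRY`, `COST_CENTER_BY_SITE`, both hypothetical) in place of real reference-data services. Dedup here is a hash over name and country, and every filled field is recorded in a provenance list so the *.enriched message can explain itself.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical lookup tables; a real agent would call reference-data services.
REGION_BY_COUNTRY = {"US": "NA", "CA": "NA", "DE": "EMEA", "JP": "APAC"}
COST_CENTER_BY_SITE = {"S1": "CC-100"}


def dedup_key(record: dict) -> str:
    # Stable fingerprint over the fields that define "the same location".
    name = str(record.get("name") or "").strip().lower()
    country = str(record.get("country") or "").upper()
    return hashlib.sha256(f"{name}|{country}".encode()).hexdigest()


def enrich(record: dict, seen: set, agent_version: str = "enrichment-agent/0.1") -> dict:
    out, provenance = dict(record), []

    region = REGION_BY_COUNTRY.get(str(record.get("country") or "").upper())
    if region:
        out["region"] = region
        provenance.append({"field": "region", "source": "REGION_BY_COUNTRY"})

    cost_center = COST_CENTER_BY_SITE.get(record.get("site_id"))
    if cost_center:
        out["cost_center"] = cost_center
        provenance.append({"field": "cost_center", "source": "COST_CENTER_BY_SITE"})

    # Flag, don't drop: a human or a later agent decides what to do with duplicates.
    key = dedup_key(record)
    out["possible_duplicate"] = key in seen
    seen.add(key)

    out["lineage"] = {
        "enriched_by": agent_version,
        "enriched_at": datetime.now(timezone.utc).isoformat(),
        "provenance": provenance,
    }
    return out
```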

4. Routing & Orchestration Agent

Trigger: any *.ready or *.enriched topic.

• Decides fan-out: Salesforce + ServiceNow + Data Lake?

• Publishes to eip.sf.in, eip.sn.in, eip.lake.in, etc., adding correlation IDs and SLO hints.
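The fan-out decision can be sketched as a table of predicates per target topic. The routing rules and the SLO hint strings below are hypothetical; the point is that the correlation ID is reused when present (minted only when missing) and every outgoing message carries it.

```python
import uuid

# Hypothetical fan-out rules: which target topics want which records.
ROUTES = {
    "eip.sf.in": lambda r: r.get("type") == "location",
    "eip.sn.in": lambda r: r.get("type") == "location" and r.get("has_assets", False),
    "eip.lake.in": lambda r: True,  # the data lake takes everything
}
# Hypothetical SLO hints attached as headers for downstream consumers.
SLO_HINTS = {"eip.sf.in": "p95<60s", "eip.sn.in": "p95<5m", "eip.lake.in": "best-effort"}


def fan_out(record: dict, headers: dict) -> list:
    """Return (topic, headers, record) tuples to publish; nothing is sent here."""
    correlation_id = headers.get("correlationId") or str(uuid.uuid4())
    messages = []
    for topic, wants in ROUTES.items():
        if wants(record):
            out_headers = {**headers, "correlationId": correlation_id, "sloHint": SLO_HINTS[topic]}
            messages.append((topic, out_headers, record))
    return messages
```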

5. SLA/Backpressure Agent

Listens across topics; watches throughput, lag, error ratios.

• If lag rises: nudges Lambda concurrency or scales consumers; can shed non-critical flows by deferring them to *.defer.

• Posts advisories to Ops Slack.
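A sketch of the SLA/Backpressure Agent's decision logic, assuming per-flow stats (consumer lag, error ratio, criticality) have already been collected from metrics. The thresholds and action names are hypothetical, and the function only returns advisories; actually applying them (for example through Lambda reserved concurrency or an ECS service's desired count) would happen behind the policy checks described under Guardrails.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    name: str
    consumer_lag: int    # messages waiting to be consumed
    error_ratio: float   # errors / processed over the observation window
    critical: bool


def backpressure_actions(flows, lag_threshold=10_000, error_threshold=0.05):
    """Return (action, flow) advisories; nothing is scaled or deferred here."""
    actions = []
    lagging = [f for f in flows if f.consumer_lag > lag_threshold]
    for f in lagging:
        actions.append(("scale_consumers", f.name))
    # When a critical flow is behind, defer non-critical flows (to *.defer) to free capacity.
    if any(f.critical for f in lagging):
        actions.extend(("defer", f.name) for f in flows if not f.critical)
    # High error ratios become advisories for the Ops Slack channel.
    actions.extend(("advise_ops", f.name) for f in flows if f.error_ratio > error_threshold)
    return actions
```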

6. Policy/Privacy Guard Agent

Sits before any egress topic.

• Redacts PII not permitted for that target; enforces “allowed actions” policies.

• Blocks & escalates when rules conflict.
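A minimal Policy/Privacy Guard, sketched with a hypothetical per-target allowlist (`ALLOWED_FIELDS`) and a simple email regex for scrubbing free text. Fields a target is not allowed to receive are dropped; a topic with no policy at all is blocked by raising, which is where escalation to a human would hook in.

```python
import re

# Hypothetical per-target allowlists: which fields each egress topic may receive.
ALLOWED_FIELDS = {
    "eip.sf.in": {"site_id", "name", "address", "region"},
    "eip.lake.in": {"site_id", "name", "region", "contact_email"},
}
PII_FIELDS = {"contact_email", "contact_phone"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class PolicyViolation(Exception):
    """Raised when the guard must block and escalate instead of forwarding."""


def guard(topic: str, record: dict) -> dict:
    if topic not in ALLOWED_FIELDS:
        raise PolicyViolation(f"no egress policy for {topic}")  # block & escalate
    allowed = ALLOWED_FIELDS[topic]
    out = {}
    for key, value in record.items():
        if key not in allowed:
            continue  # drop fields (including PII) this target may not receive
        if key not in PII_FIELDS and isinstance(value, str):
            value = EMAIL_RE.sub("[REDACTED]", value)  # scrub stray emails in free text
        out[key] = value
    return out
```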

7. DLQ Triage Agent

Trigger: eip.*.dlq.

• Clusters failures by root cause, proposes fixes (e.g., add mapping for buildingSubtype).

• Can auto-replay a sampled subset; if success rate > threshold, replays the batch; otherwise opens a ticket with a ready-made PR/transform snippet.
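The cluster-then-sample-replay loop can be sketched as below. Clustering by an error "signature" with digits masked is a deliberately simple stand-in for real root-cause grouping, and `replay` is a hypothetical callable that republishes one message and reports success. The sample size, threshold, and seed are placeholders.

```python
import random
import re
from collections import defaultdict


def root_cause(error: str) -> str:
    # Mask volatile numbers (row ids, timeouts) so similar errors share a signature.
    return re.sub(r"\d+", "<n>", error)


def cluster(dlq_messages):
    groups = defaultdict(list)
    for msg in dlq_messages:
        groups[root_cause(msg["error"])].append(msg)
    return dict(groups)


def triage(group, replay, sample_size=5, threshold=0.9, seed=0):
    """Replay a sample; replay the whole batch only if the sample succeeds often enough."""
    if not group:
        return {"action": "noop"}
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(group)), min(sample_size, len(group))))
    sample_ok = sum(1 for i in chosen if replay(group[i]))
    rate = sample_ok / len(chosen)
    if rate > threshold:
        rest_ok = sum(1 for i, m in enumerate(group) if i not in chosen and replay(m))
        return {"action": "replay_batch", "attempted": len(group), "succeeded": sample_ok + rest_ok}
    # Below threshold: hand off to a human with the evidence attached.
    return {"action": "open_ticket", "sample_success_rate": rate}
```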

8. Explainer/Audit Agent

Trigger: a query by correlation ID.

• Produces a step-by-step “why did this record end up here?” trail from logs, headers, and decisions.

• Great for audits and debugging.
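A sketch of the Explainer Agent, assuming every agent emits a structured decision event (agent, decision, reason, timestamp) tagged with the correlation ID. The `Event` shape is hypothetical; in practice these would be queried from CloudWatch Logs or OpenSearch rather than held in a list.

```python
from dataclasses import dataclass


@dataclass
class Event:
    correlation_id: str
    timestamp: str  # ISO-8601 in UTC, so string order matches time order
    agent: str
    decision: str
    reason: str


def explain(events, correlation_id: str) -> str:
    """Render a numbered, time-ordered trail for one record."""
    trail = sorted(
        (e for e in events if e.correlation_id == correlation_id),
        key=lambda e: e.timestamp,
    )
    if not trail:
        return f"No events recorded for {correlation_id}."
    lines = [f"Trail for {correlation_id}:"]
    for step, e in enumerate(trail, start=1):
        lines.append(f"{step}. [{e.timestamp}] {e.agent}: {e.decision} ({e.reason})")
    return "\n".join(lines)
```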

9. Test-Data & Replay Agent

• Generates anonymized fixtures from prod shapes.

• Replays against new transforms in a shadow topic (*.shadow) and scores diff vs current prod behavior.
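A sketch of the two halves of the Test-Data & Replay Agent: pseudonymizing sensitive fields with a keyed hash (HMAC-SHA256, so equal inputs stay equal and joins still behave, while the key prevents simple brute-force reversal), and scoring a candidate transform against current prod behavior. The field list and function names are hypothetical, and "score" here is just the fraction of fixtures with identical output.

```python
import hashlib
import hmac


def anonymize(record: dict, key: bytes, sensitive=("name", "contact_email")) -> dict:
    # Replace sensitive values with stable, keyed pseudonyms.
    out = dict(record)
    for field_name in sensitive:
        if out.get(field_name) is not None:
            digest = hmac.new(key, str(out[field_name]).encode(), hashlib.sha256).hexdigest()[:8]
            out[field_name] = f"{field_name}-{digest}"
    return out


def shadow_score(fixtures, prod_transform, candidate_transform) -> float:
    """Fraction of fixtures where the candidate transform matches current prod output."""
    if not fixtures:
        raise ValueError("no fixtures to replay")
    same = sum(1 for f in fixtures if prod_transform(f) == candidate_transform(f))
    return same / len(fixtures)
```

A promotion gate could then require, say, a score of 1.0 on the shadow run, or a human sign-off on every listed diff.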

How they interact with your existing SNS + Canonical Services

```
[Source Adapter] --(raw)--> eip.nb.locations.in

[Inlet Quality Agent] --(ok)--> eip.nb.locations.validated
      |
      \--(needs-review)--> eip.nb.locations.review

[Schema Mapper Agent] --(proposed map)--> eip.sf.locations.transform.proposed
      | (approved)                              |
      v                                         v
(versioned transform store)               [Human-in-loop]

[Enrichment Agent] ---> eip.nb.locations.enriched

[Routing Agent] ---> eip.sf.in, eip.sn.in, eip.lake.in
      +--> [Policy Guard Agent] (final checks) --> targets
      +--> [SLA/Backpressure Agent] (observes/acts across topics)

[DLQ Triage Agent] <--- eip.*.dlq
[Explainer Agent]  <--- query by correlationId/*
[Test-Data Agent]  <--- shadow topics & CI/CD gates
```

Guardrails (so “agentic” doesn’t go rogue)

• Tool-use only: agents call approved tools (transform registry, mapping store, CI, ticketing, deploy) behind service accounts; no arbitrary internet calls.

• Policies first: every “act” requires a policy allowlist; higher-risk acts need human approval.

• Confidence thresholds: below X% confidence, never auto-fix; always escalate.

• Versioned everything: transforms, prompts, rules, and decisions carry versions + correlation IDs.

• Canary & shadow: new transforms run in shadow for N messages before promotion.
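The "policies first" and "confidence thresholds" guardrails combine into one small gate that every proposed act passes through. The policy table, agent and act names, and the 0.8 cutoff (standing in for the unspecified X%) are all hypothetical; the ordering is the point: not allowlisted means deny, low confidence means escalate, high risk means approval.

```python
from dataclasses import dataclass

# Hypothetical allowlist: (agent, act) -> risk level. Anything absent is denied.
POLICY = {
    ("inlet-quality", "auto_patch"): "low",
    ("dlq-triage", "replay_batch"): "medium",
    ("schema-mapper", "promote_transform"): "high",
}


@dataclass
class Proposal:
    agent: str
    act: str
    confidence: float  # 0.0-1.0, as reported by the agent
    version: str       # version of the prompt/rules that produced it, for audit


def decide(p: Proposal, min_confidence: float = 0.8) -> str:
    risk = POLICY.get((p.agent, p.act))
    if risk is None:
        return "deny"            # not on the allowlist: never act
    if p.confidence < min_confidence:
        return "escalate"        # low confidence: always a human
    if risk == "high":
        return "needs_approval"  # higher-risk acts need human approval
    return "allow"
```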

Minimal reference deployment on AWS (fits your hybrid)

• Backbone: SNS (you have it) + optional SQS for per-agent work queues; keep Kafka/MSK where you already have it.

• Compute: Lambdas for the lightweight Quality, Policy, and Explainer work; EKS/ECS services for Schema Mapper, DLQ Triage, and Test-Data (often stateful or long-running).

• Shared services: a transform registry (S3 + DynamoDB + signed URLs); observability (CloudWatch + OpenSearch/Grafana, with a correlationId standard); an approval UI (a tiny React app or Slack workflow); and secrets/IAM boundaries (KMS, least-privilege roles).

• CI/CD: every transform and agent change is code; PR → shadow replay → gate → promote. The DLQ agent can open PRs with suggested patches.
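On the SNS backbone, the correlation ID and pipeline stage can travel as SNS message attributes, and each agent's SQS subscription can use a filter policy to receive only the stages it cares about. A sketch of building those payloads follows; the helper names are hypothetical, and one assumption is worth flagging: SNS topic names only allow letters, digits, hyphens, and underscores, so the dotted eip.* names would need a hyphenated equivalent (or the stage carried as an attribute, as here).

```python
import json
import uuid
from typing import Optional


def publish_kwargs(topic_arn: str, record: dict, stage: str,
                   correlation_id: Optional[str] = None) -> dict:
    """Keyword arguments for boto3's sns_client.publish(**kwargs)."""
    correlation_id = correlation_id or str(uuid.uuid4())
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(record),
        "MessageAttributes": {
            "correlationId": {"DataType": "String", "StringValue": correlation_id},
            "stage": {"DataType": "String", "StringValue": stage},
        },
    }


def filter_policy(stages) -> str:
    """FilterPolicy JSON so one agent's subscription only receives the given stages."""
    return json.dumps({"stage": sorted(stages)})
```

With a client from `boto3.client("sns")`, publishing would be `sns.publish(**publish_kwargs(...))`, and the filter policy is applied with `set_subscription_attributes(SubscriptionArn=..., AttributeName="FilterPolicy", AttributeValue=filter_policy(...))`.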

What does “agentic” buy your Canonical Adapters?

• Adapters stay dumb & fast. Agents shoulder variability: mapping drift, data quality, routing decisions.

• Adaptive ops. Agents notice lag/cost/error spikes and act (scale, defer, route) within policy.

• Self-healing pipelines. DLQ fills? Triage agent groups, fixes, replays safely.

• Faster onboarding. New target? The Mapper agent drafts roughly 80% of the canonical → target mapping; you approve and ship.

• Explainability. Explainer agent gives auditors a readable trail.

Quick start (incremental)

1. Pick one high-pain flow (e.g., Nautobot → Salesforce Locations).

2. Add Inlet Quality Agent (Lambda) + DLQ Triage Agent (container).

3. Introduce shadow topics and Explainer Agent with correlation IDs.

4. Move the Schema Mapper Agent into the loop; require human approval at first.

5. Add Policy Guard before egress.

6. Measure: DLQ rate, time-to-fix, manual touches, lead time.
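To make step 6 concrete, here is a sketch that computes three of the four measures from per-message outcomes; lead time is left out because it depends on how the pipeline records start and delivery times. The `MessageOutcome` shape is hypothetical, and the same numbers taken before and after introducing the agents give the comparison that matters.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class MessageOutcome:
    dead_lettered: bool
    manual_touches: int                  # human edits/approvals on this message
    failed_at: Optional[datetime] = None
    fixed_at: Optional[datetime] = None


def flow_metrics(outcomes):
    total = len(outcomes)
    fix_minutes = [
        (o.fixed_at - o.failed_at).total_seconds() / 60
        for o in outcomes
        if o.failed_at and o.fixed_at
    ]
    return {
        "dlq_rate": sum(o.dead_lettered for o in outcomes) / total if total else 0.0,
        "median_time_to_fix_min": median(fix_minutes) if fix_minutes else None,
        "manual_touches_per_msg": sum(o.manual_touches for o in outcomes) / total if total else 0.0,
    }
```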

