top of page
Search

reasoning framework your devs can run every time they pick up a new event-driven requirement (source adapter → canonical → target adapter; Kafka in the middle; DLQ ready)

  • Writer: Mark Kendall
    Mark Kendall
  • 2 days ago
  • 5 min read

Love it — you’ve got the infra and the code patterns down. Here’s a lightweight, repeatable reasoning framework your devs can run every time they pick up a new event-driven requirement (source adapter → canonical → target adapter; Kafka in the middle; DLQ ready). It’s designed to be fast to fill, hard to get wrong, and easy to review.


The 12-Box Adapter Canvas (fill this before coding)

1. Business Outcome & KPI


• What changes for the user/system?

• KPI/SLO: latency ___ ms, success rate %, cost ceiling $/month.

• Owner/on-call: ______


2. Trigger & Source


• Event type(s): create/update/delete/batch/command.

• Ingress: API Gateway | Kafka | SQS | EventBridge | Cron.

• Ordering requirement? (per customer/order/etc.) Key: ______


3. Upstream Contract


• Schema/spec link + version.

• Required vs optional fields.

• Max payload size & rate (p95 TPS).

• Idempotency key available? (eventId, externalId)


4. Canonical Model


• Canonical object: e.g., TMF.ProductOrder.v5

• Versioning plan (back/forward compat).

• Field map table (source→canonical, defaults, units).


5. Validation Rules


• Syntactic: JSON/schema validation, enums, formats.

• Semantic: cross-field rules, referential checks, duplicates.

• PII redaction/masking policy.


6. Transform & Enrichment


• Lookups (cache/DB/service).

• Derived fields.

• Timezone/units normalization.


7. Routing & Topics


• Publish topic(s): eip.<domain>.<event>.in

• Partition key & header conventions (tenant, correlationId, schemaVer).

• Fan-out rules (who consumes, why).


8. Idempotency & Exactly-Once Story


• Idempotency store/strategy (key: ______; TTL: ___ days).

• Side-effect guards (read-before-write? conditional upserts?).

• Outbox / transactional publish (if needed).


9. Failure Taxonomy & DLQ


• Retryable vs non-retryable vs poison.

• Retry policy (attempts/backoff/timeout).

• DLQ topic naming: eip.<domain>.<stage>.dlq + reason code schema.

• Replay procedure (who runs it, how).


10. Performance & Cost


• Lambda memory/timeout targets.

• Batch size/prefetch/concurrency caps (per partition key if ordering).

• p95/p99 targets; load-test plan.


11. Security & Compliance


• AuthN/Z (JWT, mTLS, IAM).

• Secrets (provider, rotation, scope).

• Data retention & audit fields.

• Least-privilege IAM topic/table policies.


12. Observability & Ops


• Logs: structured JSON w/ correlationId, eventId.

• Metrics: accepts/validations/failures/latency/cost.

• Tracing: span boundaries (ingress → transform → publish).

• Dashboards/alerts (thresholds + runbooks).

• Rollout/rollback (canary %, feature flags).



Map the Canvas → Implementation (mechanical steps)

1. Contracts First

• Check in upstream schema + canonical schema + mapping table.

• Add version gates: reject unknown future versions or tolerate?

2. Choose Event Source

• Ordered stream per key → Kafka consumer Lambda (partitioned).

• Request/response → API Gateway Lambda.

• Fan-out timers → EventBridge Scheduler.

3. Name & Configure Topics

eip.<domain>.<entity>.<verb>.in → canonical;

<...>.dlq for each processing stage; headers carry schemaVer, correlationId.

4. Idempotency Guard

• Pick key (eventId/externalId+source).

• Implement “seen” check (fast store) + safe upsert on target.

5. Lambda Knobs (per NFR)

• Timeout = p99 path + margin.

• Memory = transform CPU needs; tune cost/latency.

• Batch size & concurrency: preserve ordering per key.

• Retries: align to taxonomy.

6. Observability

• Emit accept, validate_fail, transform_fail, publish_ok, publish_fail.

• One dashboard per adapter; SLO alerts wire to on-call.

7. Security

• Scope secrets to adapter; least-privilege topic ACLs.

• Mask PII in logs; tag data classes.

8. Deploy & Prove

• Contract tests (schema compat), mapping tests, golden payloads, chaos tests (bad enums, missing fields).

• Load test to target TPS with real partition key skews.

• Canary on a small traffic slice; bake; then ramp.



Fast Decision Helpers


Trigger chooser

• Need strict ordering per customer/order? → Kafka w/ partition key.

• Synchronous API to caller? → API Gateway + immediate validate/publish.

• Scheduled jobs/batches? → EventBridge Scheduler.


DLQ policy

• Put full payload + error envelope (stage, reason, stack hash, firstSeen, attemptCount).

• Alert on DLQ creation > N/min; provide replay tool that re-publishes after fix.


Retries

• Validation errors → no retry, direct DLQ.

• Transient downstream/partition throttling → N retries w/ exponential backoff.

• Unknown errors → small N, then DLQ; open an incident if rate spikes.


Ordering vs parallelism

• If ordering matters, set consumer concurrency to number of partitions and keep per-key affinity; never open >1 concurrent worker on same partition.



Definition of Done (adapter)

• Canvas completed & reviewed.

• Schemas + mapping committed & versioned.

• Golden test vectors for happy/sad paths.

• Idempotency guard proven with duplicate replay.

• DLQ message contract & replay runbook documented.

• Dashboards + alerts deployed and firing in test.

• Load test at p95 TPS with 2× burst passes SLO.

• Rollback plan tested (disable consumer/canary flip).

• Security review (secrets, PII, IAM).

• Cost estimate documented (≈ $/million messages).



Fill-in Templates (drop into your repo)


1) Adapter Canvas (Markdown)


# Adapter: <Domain>-<Entity>-<Verb>

Owner: <team> • On-call: <rotor> • SLO: <latency/availability> • Target Cost:<$>


## 1. Business Outcome & KPI

<one paragraph + KPI table>


## 2. Trigger & Source

Type: <API/Kafka/...> • Ordering: <none|per key> (key: <...>) • Expected TPS: <...>


## 3. Upstream Contract

Spec: <link+version> • Required fields: <...> • Max size: <...> • Idempotency key: <...>


## 4. Canonical Model

Object: <...> v<...> • Compat policy: <back/forward>

Mapping table: source.field → canonical.field (default, transform)


## 5. Validation Rules

- Syntax: <json schema link>

- Semantic: <rules>

- PII: <masking>


## 6. Transform & Enrichment

Lookups: <...> • Derived fields: <...>


## 7. Routing & Topics

Publish to: `eip.<domain>.<entity>.<verb>.in`

Headers: correlationId, schemaVer, tenant, idempotencyKey


## 8. Idempotency & Side-Effects

Key: <...> • Store: <...> • Strategy: <first-write-wins|upsert|compare-and-swap>


## 9. Failure & DLQ

Taxonomy: <...> • Retries: <...> • DLQ: `eip.<domain>.<stage>.dlq` • Replay runbook: <link>


## 10. Performance & Cost

Lambda: memory <...>, timeout <...>, batch <...>, concurrency <...>

Targets: p95 <...> ms, p99 <...> ms • Est. $/MM msgs: <...>


## 11. Security & Compliance

Auth: <...> • IAM: <...> • Secrets: <...> • Retention: <...>


## 12. Observability & Ops

Logs: <fields> • Metrics: <list> • Traces: <spans> • Alerts: <thresholds> • Rollout: <canary/flags>


2) Mapping Table (CSV for quick diffs)


source_field,canonical_field,required,default,transform,notes

id,order.id,required,,,

timestamp,order.occurredAt,required,,parseIsoToUtc(),

...


3) Error Envelope (JSON contract)


{

  "correlationId": "<uuid>",

  "stage": "validate|transform|publish|target",

  "reasonCode": "SCHEMA_MISSING_FIELD|DOWNSTREAM_429|UNKNOWN",

  "message": "<human readable>",

  "attempt": 3,

  "firstSeen": "2025-09-10T20:00:00Z",

  "payload": { /* original */ },

  "context": { "schemaVer":"v5", "tenant":"acme" }

}


4) Idempotency Key Rule (YAML)


idempotency:

  key: "${sourceSystem}:${eventType}:${externalId}"

  store: "dynamodb|redis|kafka-header-index"

  ttlDays: 30

  mode: "first-write-wins"




Mini Example (worked in 60 seconds)


Use case: TMF ProductOrder.create from Source A → canonical → publish to Kafka → Target B consumes and opens/updates Salesforce Case.

• Ordering: per customerId (partition key = customerId).

• Idempotency: ${source}:${eventType}:${orderId} in a 30-day store.

• Validation: TMF v5 JSON Schema + semantic rule “lineItems[].quantity > 0”.

• Topics:

eip.orders.validate.dlq, eip.orders.transform.dlq, eip.orders.target.dlq

• Lambda knobs: timeout 10s, memory 512MB, batch 100, reserved concurrency = number of partitions expected to be hot (protects target).

• Metrics: accepts/sec, validations_failed/sec, dlq/sec, p95 end-to-end latency, cost/msg.

• Alerts: DLQ > 5/min for 5 min; p95 > SLO for 10 min; missing traffic for 15 min.



How to use this in practice

1. Print the 12-Box Canvas, fill it in your design issue.

2. Commit the schemas + mapping CSV + error envelope + idempotency YAML next to your Lambda code.

3. In PR review, require: canvas present, golden vectors, DLQ contract, dashboard link.

4. After merge, run the canary and prove replay-from-DLQ works before full ramp.


If you want, I can drop these templates into a ZIP with a starter README and folder structure next.

 
 
 

Recent Posts

See All

Comments


Post: Blog2_Post

Subscribe Form

Thanks for submitting!

©2020 by LearnTeachMaster DevOps. Proudly created with Wix.com

bottom of page