
Production Is a Contract, Not a Deployment
- Mark Kendall
How App Teams, DevOps, and Cloud Teams Get Ownership Right (and Why So Many Don’t)
Every engineering team eventually hits an uncomfortable moment.
Everything is deployed.
Pods are running.
Kafka is flowing.
Data is persisted.
Dashboards are green.
And yet… it still doesn’t feel like production.
If you’ve ever had that feeling, this article is for you.
This is not a story about tools.
It’s a story about responsibility, boundaries, and why production systems fail long before they fail technically.
The Setup: When “It Runs” Isn’t Enough
Let’s start with a real scenario many teams will recognize:
- Applications deployed to Kubernetes
- Stateful services (like databases) running with persistence
- Events flowing through Kafka
- Logs and metrics visible
- Test and PROD environments formally “stood up”
On paper, everything is there.
But then questions start surfacing:
- Why are we still port-forwarding to test things?
- Where is the front door to this system?
- How are these services secured?
- Who owns ingress, auth, and certificates?
- Why does PROD feel fragile even though it’s “live”?
This is the moment where teams either level up — or quietly start accumulating operational debt.
The Core Insight: Production Is a Contract
Production is not a namespace.
Production is not an environment variable.
Production is not a Helm release.
Production is a contract between teams.
That contract answers four questions clearly:
- How do clients enter the system?
- How is access controlled?
- Who can see and manage traffic?
- Who is on the hook when something goes wrong at 2am?
If those answers are fuzzy, production is fuzzy — no matter how stable the pods look.
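One way to make that contract concrete is to write the four answers down as data and check that none of them is blank. The sketch below is only an illustration: the `ProductionContract` class, its field names, the set of "fuzzy" placeholder values, and the example system are all assumptions made for this post, not part of any real tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductionContract:
    """One record per system; each field holds a concrete answer."""
    entry_point: str     # How do clients enter the system?
    access_control: str  # How is access controlled?
    traffic_owner: str   # Who can see and manage traffic?
    on_call_owner: str   # Who is on the hook at 2am?


def unanswered(contract: ProductionContract) -> list[str]:
    """Return the names of fields left blank or filled with a placeholder."""
    fuzzy = {"", "tbd", "unknown", "someone"}
    return [
        f.name
        for f in fields(contract)
        if getattr(contract, f.name).strip().lower() in fuzzy
    ]


# Hypothetical example values, not taken from any real system.
orders = ProductionContract(
    entry_point="https://api.example.com/orders via shared ingress",
    access_control="OIDC token validated at the edge",
    traffic_owner="",
    on_call_owner="TBD",
)

print(unanswered(orders))  # ['traffic_owner', 'on_call_owner']
```

An empty result does not prove a system is production ready. It only shows that somebody wrote an answer down, which is the part that usually gets skipped.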
Where Teams Go Wrong (and Why It Keeps Happening)
Most failures here are not technical. They’re organizational.
Common failure patterns include:
- App teams are asked to “just wire it up”
- DevOps teams become accidental owners of architecture
- Cloud teams avoid owning edge responsibility
- Everyone assumes someone else is handling security and ingress
The result is predictable:
- Auth logic duplicated across services
- Inconsistent exposure patterns
- No single place to observe traffic
- Kafka or downstream systems absorbing uncontrolled load
- Production incidents that nobody quite owns
None of this happens because people are careless.
It happens because ownership was never made explicit.
The Three-Team Reality (Whether We Admit It or Not)
Every production system has three distinct concerns, even if the org chart doesn’t show them.
1. Application Team — Business & Data Ownership
This is where business value lives.
They own:
- Business logic
- REST and event contracts
- Kafka producers and consumers
- Data models and persistence requirements
- Health, readiness, and liveness signals
They do not own:
- TLS termination
- Ingress infrastructure
- Authentication systems
- Certificates and rotation
- Edge rate limiting
Their responsibility ends at the service boundary.
Their deliverable is intent, not infrastructure.
2. Cloud / Platform Team — Production Safety Ownership
This is where production stability and security live.
They own:
- Kubernetes ingress controllers
- External exposure rules
- Authentication and identity integration
- TLS certificates and lifecycle
- Network and security policies
- Edge-level observability
They decide how services are safely exposed.
They are accountable when production is attacked, overloaded, or misused.
3. DevOps / Enablement Team — Flow & Execution Ownership
This is the layer that keeps everything moving.
They own:
- CI/CD pipelines
- Environment promotion (dev → test → prod)
- Helm, Kustomize, or template systems
- Secret injection and configuration wiring
- Rollout and rollback strategies
They do not own:
- Security architecture
- Ingress strategy
- Authentication models
DevOps enables delivery.
They do not define production boundaries.
When DevOps is forced to make architecture decisions, it’s usually because upstream ownership is unclear.
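The three ownership lists can also be kept as a small matrix and audited for the two classic problems: a concern nobody owns and a concern two teams both think they own. This is a minimal sketch. The concern names are shortened from the lists in this article, and the `audit` function and the "drifted" scenario are hypothetical.

```python
# Ownership as described in the three sections, with shortened names.
OWNERSHIP = {
    "application": {
        "business logic", "rest and event contracts",
        "kafka producers and consumers", "data models", "health signals",
    },
    "platform": {
        "ingress controllers", "external exposure rules", "authentication",
        "tls certificates", "network policies", "edge observability",
    },
    "devops": {
        "ci/cd pipelines", "environment promotion", "templating",
        "secret injection", "rollout and rollback",
    },
}


def audit(ownership, required):
    """Return (concerns with no owner, concerns claimed by several teams)."""
    owners_by_concern = {}
    for team, concerns in ownership.items():
        for concern in concerns:
            owners_by_concern.setdefault(concern, []).append(team)
    unowned = sorted(c for c in required if c not in owners_by_concern)
    contested = {
        concern: sorted(teams)
        for concern, teams in owners_by_concern.items()
        if len(teams) > 1
    }
    return unowned, contested


# Hypothetical drift: DevOps has quietly started handling authentication,
# and edge rate limiting was never assigned to anyone.
drifted = {team: set(concerns) for team, concerns in OWNERSHIP.items()}
drifted["devops"].add("authentication")
required = {"authentication", "tls certificates", "edge rate limiting"}

unowned, contested = audit(drifted, required)
print(unowned)    # ['edge rate limiting']
print(contested)  # {'authentication': ['devops', 'platform']}
```

The sample matrix leaves out edge rate limiting on purpose. It appears above only as something app teams do not own, which is exactly the kind of gap an explicit list makes visible.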
The API Gateway Question (and the Real Debate Behind It)
Many mature production systems use an API gateway — and for good reason.
A gateway provides:
- A single entry point
- TLS termination
- Authentication and authorization
- Rate limiting and traffic protection
- Centralized observability
But here’s the critical distinction:
The requirement is not “an API gateway.”
The requirement is the capabilities a gateway provides.
When teams resist gateways, they’re usually resisting operational complexity, not the concept.
The Friendly Alternative: Ingress + External Authentication
When the platform is still maturing, a simpler and perfectly valid approach exists.
This approach uses:
- A Kubernetes ingress controller as the front door
- TLS termination at ingress
- External authentication (OIDC / JWT validation)
- Path-based routing to services
- Centralized logging and metrics
This still delivers:
- A controlled entry point
- Security outside application code
- Environment consistency
- Clear ownership boundaries
This is not avoiding a gateway.
It is implementing gateway behavior in a lighter form.
And importantly: this remains a platform responsibility, not an application one.
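To show what "gateway behavior in a lighter form" means in practice, here is a toy model of the edge: reject requests without a valid token, then route by path prefix. It is a sketch under loud assumptions. A real ingress controller does this in configuration rather than Python, OIDC providers typically sign tokens with asymmetric keys rather than the shared-secret HS256 used here for brevity, and the secret, route table, and service names are all made up.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical shared secret, for illustration only
ROUTES = {               # hypothetical path prefixes and upstream services
    "/orders": "orders-svc:8080",
    "/inventory": "inventory-svc:8080",
}


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign(claims: dict) -> str:
    """Issue an HS256 token. Stands in for the identity provider."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            SECRET, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired "exp" claim
    return claims


def route(path: str) -> Optional[str]:
    """Pick the upstream with the longest matching path prefix."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    return ROUTES[max(matches, key=len)] if matches else None


def handle(path: str, token: str, now: Optional[float] = None) -> tuple[int, str]:
    """Edge decision: authenticate first, then route."""
    if verify(token, now) is None:
        return 401, "unauthorized"
    upstream = route(path)
    if upstream is None:
        return 404, "no route"
    return 200, upstream


good = sign({"sub": "alice", "exp": time.time() + 60})
print(handle("/orders/42", good))        # (200, 'orders-svc:8080')
print(handle("/orders/42", good + "x"))  # (401, 'unauthorized')
print(handle("/billing", good))          # (404, 'no route')
```

Notice that nothing in `handle` knows what an order is. The services behind it never see an unauthenticated request, which is the whole point of keeping security outside application code.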
Learn → Teach → Master (Applied to Production)
Learn
You recognize that running workloads ≠ production readiness.
Teach
You articulate requirements without prescribing tools.
You define ownership boundaries clearly.
Master
You reach mastery when:
- Platform teams own the edge
- App teams focus on business value
- Environments behave consistently
- Production incidents are calm, boring, and contained
Mastery doesn’t show up in YAML.
It shows up in operations.
Why Strong Teams Get This Right
Strong teams:
- Treat production as a shared contract
- Make ownership explicit
- Separate intent from implementation
- Invest in platform capabilities early
Weaker teams:
- Blur responsibility lines
- Avoid hard conversations
- Push platform risk onto app teams
- Discover gaps only during outages
The difference isn’t talent.
It’s clarity.
The Pivot Question Every Team Should Ask
Before calling something “production,” ask:
“If this system misbehaves tonight, who owns stopping it, securing it, and explaining it?”
If the answer isn’t obvious, the work isn’t done — and that’s okay.
That realization is progress.
Final Thought
Production systems fail less often because of bad code than because of unclear ownership.
You don’t need perfect tooling.
You need clear contracts.
When teams agree on who owns what — and why — the technology almost always follows.
That’s how you move from learning, to teaching, to mastering.
