
Production Is a Contract, Not a Deployment

  • Writer: Mark Kendall






How App Teams, DevOps, and Cloud Teams Get Ownership Right (and Why So Many Don’t)



There’s a moment every engineering team hits where something uncomfortable happens.


Everything is deployed.

Pods are running.

Kafka is flowing.

Data is persistent.

Dashboards are green.


And yet… it still doesn’t feel like production.


If you’ve ever had that feeling, this article is for you.


This is not a story about tools.

It’s a story about responsibility, boundaries, and why production systems fail long before they fail technically.





The Setup: When “It Runs” Isn’t Enough



Let’s start with a real scenario many teams will recognize:


  • Applications deployed to Kubernetes

  • Stateful services (like databases) running with persistence

  • Events flowing through Kafka

  • Logs and metrics visible

  • Test and PROD environments formally “stood up”



On paper, everything is there.


But then questions start surfacing:


  • Why are we still port-forwarding to test things?

  • Where is the front door to this system?

  • How are these services secured?

  • Who owns ingress, auth, and certificates?

  • Why does PROD feel fragile even though it’s “live”?



This is the moment where teams either level up — or quietly start accumulating operational debt.





The Core Insight: Production Is a Contract



Production is not a namespace.

Production is not an environment variable.

Production is not a Helm release.


Production is a contract between teams.


That contract answers four questions clearly:


  1. How do clients enter the system?

  2. How is access controlled?

  3. Who can see and manage traffic?

  4. Who is on the hook when something goes wrong at 2am?



If those answers are fuzzy, production is fuzzy — no matter how stable the pods look.
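
One way to make the four questions hard to dodge is to write them down as data. The sketch below is a minimal Python illustration, not a prescribed format: the class name `ProductionContract`, its field names, and the example values are all assumptions invented for this example. It only reports which questions still have no explicit answer.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProductionContract:
    """One field per contract question; None or blank means 'not answered yet'."""
    entry_point: Optional[str] = None      # 1. How do clients enter the system?
    access_control: Optional[str] = None   # 2. How is access controlled?
    traffic_owner: Optional[str] = None    # 3. Who can see and manage traffic?
    on_call_owner: Optional[str] = None    # 4. Who is on the hook at 2am?


def unanswered(contract: ProductionContract) -> list[str]:
    """Return the names of the questions that have no explicit answer."""
    return [
        f.name
        for f in fields(contract)
        if not (getattr(contract, f.name) or "").strip()
    ]


# Hypothetical draft: the front door is decided, ownership is not.
draft = ProductionContract(
    entry_point="https://api.example.com via ingress",
    access_control="OIDC at the edge",
)
print(unanswered(draft))  # ['traffic_owner', 'on_call_owner']
```

An empty result only means the contract has been written down. Whether the answers are good ones is still a conversation between teams.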





Where Teams Go Wrong (and Why It Keeps Happening)



Most failures here are not technical. They’re organizational.


Common failure patterns include:


  • App teams are asked to “just wire it up”

  • DevOps teams become accidental owners of architecture

  • Cloud teams avoid owning the edge

  • Everyone assumes someone else is handling security and ingress



The result is predictable:


  • Auth logic duplicated across services

  • Inconsistent exposure patterns

  • No single place to observe traffic

  • Kafka or downstream systems absorbing uncontrolled load

  • Production incidents that nobody quite owns



None of this happens because people are careless.

It happens because ownership was never made explicit.





The Three-Team Reality (Whether We Admit It or Not)



Every production system has three distinct concerns, even if the org chart doesn’t show them.





1. Application Team — Business & Data Ownership



This is where business value lives.


They own:


  • Business logic

  • REST and event contracts

  • Kafka producers and consumers

  • Data models and persistence requirements

  • Health, readiness, and liveness signals



They do not own:


  • TLS termination

  • Ingress infrastructure

  • Authentication systems

  • Certificates and rotation

  • Edge rate limiting



Their responsibility ends at the service boundary.


Their deliverable is intent (what the service exposes, to whom, and how it signals its own health), not the infrastructure that carries it.
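
Health, readiness, and liveness signals are a good example of something an app team can deliver without touching the edge. Here is a minimal sketch using only Python's standard library. The paths `/healthz` and `/readyz`, the `STATE` flag, and port 8080 are assumptions chosen for illustration; probe paths are whatever the team and platform agree on.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flag. A real service would set it to True once its
# dependencies (database, Kafka clients) are connected.
STATE = {"ready": False}


def probe_status(path: str, ready: bool) -> tuple[int, str]:
    """Map a probe path to an HTTP status code and body.

    /healthz (liveness): the process is up and able to answer.
    /readyz (readiness): the service can take traffic right now.
    """
    if path == "/healthz":
        return 200, "alive"
    if path == "/readyz":
        return (200, "ready") if ready else (503, "not ready")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = probe_status(self.path, STATE["ready"])
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Serve probes over plain HTTP. TLS and auth stay with the platform."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Note what is absent: no certificates, no token checks, no rate limiting. The service states whether it is alive and ready, and stops there.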





2. Cloud / Platform Team — Production Safety Ownership



This is where production stability and security live.


They own:


  • Kubernetes ingress controllers

  • External exposure rules

  • Authentication and identity integration

  • TLS certificates and lifecycle

  • Network and security policies

  • Edge-level observability



They decide how services are safely exposed.


They are accountable when production is attacked, overloaded, or misused.





3. DevOps / Enablement Team — Flow & Execution Ownership



This is the layer that keeps everything moving.


They own:


  • CI/CD pipelines

  • Environment promotion (dev → test → prod)

  • Helm, Kustomize, or template systems

  • Secret injection and configuration wiring

  • Rollout and rollback strategies



They do not own:


  • Security architecture

  • Ingress strategy

  • Authentication models



DevOps enables delivery.

It does not define production boundaries.


When DevOps is forced to make architecture decisions, it’s usually because upstream ownership is unclear.
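
The three ownership lists above can be captured as a simple lookup, which makes gaps visible before an outage does. This is a rough sketch: the team labels (`app`, `platform`, `devops`) and the shortened concern names are paraphrases chosen for this example, and "DNS records" is a hypothetical concern that the lists above do not mention.

```python
from typing import Optional

# Ownership as described in the three sections above, with concern names
# shortened and lowercased for lookup.
OWNERSHIP = {
    "app": {
        "business logic", "rest and event contracts",
        "kafka producers and consumers", "data models", "health signals",
    },
    "platform": {
        "ingress controllers", "external exposure rules", "authentication",
        "tls certificates", "network and security policies",
        "edge observability",
    },
    "devops": {
        "ci/cd pipelines", "environment promotion", "deployment templates",
        "secret injection", "rollout and rollback",
    },
}


def owner_of(concern: str) -> Optional[str]:
    """Return the team that explicitly owns a concern, or None if nobody does."""
    key = concern.strip().lower()
    for team, concerns in OWNERSHIP.items():
        if key in concerns:
            return team
    return None


def unowned(concerns: list[str]) -> list[str]:
    """Return the concerns that no team has explicitly claimed."""
    return [c for c in concerns if owner_of(c) is None]


print(owner_of("TLS certificates"))                     # platform
print(unowned(["Ingress controllers", "DNS records"]))  # ['DNS records']
```

Anything that shows up in `unowned` is a candidate for the "everyone assumes someone else is handling it" failure pattern.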





The API Gateway Question (and the Real Debate Behind It)



Many mature production systems use an API gateway — and for good reason.


A gateway provides:


  • A single entry point

  • TLS termination

  • Authentication and authorization

  • Rate limiting and traffic protection

  • Centralized observability



But here’s the critical distinction:


The requirement is not “an API gateway.”

The requirement is the capabilities a gateway provides.


When teams resist gateways, they’re usually resisting operational complexity, not the concept.
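
To show that these are capabilities rather than a product, here is one of them, rate limiting, as a small token-bucket sketch in Python. The class name, parameters, and injected `clock` are assumptions for this example. A real edge would use the limiter built into its gateway or ingress controller rather than hand-rolled code.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=2.0`, two requests pass immediately, a third is rejected, and one more is allowed after a second has elapsed. The point is where this logic lives: in front of the services, so that Kafka and downstream systems never see the uncontrolled load.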





The Friendly Alternative: Ingress + External Authentication



When platform maturity is still evolving, a simpler and perfectly valid approach exists.


This approach uses:


  • A Kubernetes ingress controller as the front door

  • TLS termination at ingress

  • External authentication (OIDC / JWT validation)

  • Path-based routing to services

  • Centralized logging and metrics



This still delivers:


  • A controlled entry point

  • Security outside application code

  • Environment consistency

  • Clear ownership boundaries



This is not avoiding a gateway.

It is implementing gateway behavior in a lighter form.


And importantly: this remains a platform responsibility, not an application one.
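
The behavior described in this section, validate the token at the edge and then route by path, can be sketched in a few lines of Python. This is an illustration under loud assumptions: the route table and service names are hypothetical, and the token check uses a shared-secret HS256 signature only so the example stays within the standard library. OIDC providers commonly sign with asymmetric keys, and a real platform would rely on its ingress controller's authentication integration instead of custom code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical route table: path prefix -> backend service name.
ROUTES = {
    "/orders": "orders-svc",
    "/orders/reports": "reports-svc",
    "/customers": "customers-svc",
}


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a token for the example only; real tokens come from the identity provider."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url_encode(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_b64url_encode(_signature(f'{header}.{payload}', secret))}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature and expiry check out, otherwise None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if not isinstance(claims, dict):
        return None
    expires = claims.get("exp")
    # This sketch requires an expiry claim and rejects expired tokens.
    if not isinstance(expires, (int, float)) or expires <= now:
        return None
    return claims


def route(path: str) -> Optional[str]:
    """Pick the backend with the longest prefix matching on a path-segment boundary."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    return ROUTES[max(matches, key=len)] if matches else None


def handle(path: str, token: str, secret: bytes,
           now: Optional[float] = None) -> tuple[int, str]:
    """Authenticate first, then route: services never see unauthenticated traffic."""
    if verify_hs256(token, secret, now) is None:
        return 401, "unauthorized"
    service = route(path)
    return (200, service) if service else (404, "no route")
```

The services behind `ROUTES` contain no auth code at all. That is the ownership boundary in practice: security lives outside application code, in one place the platform team can observe and change.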





Learn → Teach → Master (Applied to Production)



Learn

You recognize that running workloads ≠ production readiness.


Teach

You articulate requirements without prescribing tools.

You define ownership boundaries clearly.


Master

You reach mastery when:


  • Platform teams own the edge

  • App teams focus on business value

  • Environments behave consistently

  • Production incidents are calm, boring, and contained



Mastery doesn’t show up in YAML.

It shows up in operations.





Why Strong Teams Get This Right



Strong teams:


  • Treat production as a shared contract

  • Make ownership explicit

  • Separate intent from implementation

  • Invest in platform capabilities early



Weaker teams:


  • Blur responsibility lines

  • Avoid hard conversations

  • Push platform risk onto app teams

  • Discover gaps only during outages



The difference isn’t talent.

It’s clarity.





The Pivot Question Every Team Should Ask



Before calling something “production,” ask:


“If this system misbehaves tonight, who owns stopping it, securing it, and explaining it?”


If the answer isn’t obvious, the work isn’t done — and that’s okay.


That realization is progress.





Final Thought



Production systems tend to fail less because of bad code and more because of unclear ownership.


You don’t need perfect tooling.

You need clear contracts.


When teams agree on who owns what — and why — the technology almost always follows.


That’s how you move from learning, to teaching, to mastering.





 
 
 

©2020 by LearnTeachMaster DevOps.