
Simpson’s Paradox and LLM Token Efficiency

  • Writer: Mark Kendall





Why Aggregated Context Hurts Accuracy — and How to Fix It in Microservices





Most engineers have heard of Simpson’s Paradox.


Fewer engineers realize it’s happening every day inside their LLM calls.


And even fewer realize it’s quietly costing them money.


This article isn’t about hype.

It’s about architecture discipline.





What Is Simpson’s Paradox?



Simpson’s Paradox is a statistical phenomenon where:


A trend appears within separate groups of data — but reverses or disappears when those groups are combined.


In other words:


Aggregation changes meaning.


When you merge different contexts, signals distort.

The “whole” can tell a different story than the “parts.”
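A tiny numeric sketch makes this concrete. The figures below follow the classic kidney-stone treatment illustration of the paradox; the script itself is just a quick demonstration, not part of any production system.

# Simpson's Paradox in a few lines: Treatment A wins inside each group,
# yet looks worse once the groups are pooled.
groups = {
    "small cases": {"A": (81, 87), "B": (234, 270)},
    "large cases": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, data in groups.items():
    print(f"{name}: A={rate(*data['A']):.0%}  B={rate(*data['B']):.0%}")  # A higher in both groups

a_successes = sum(groups[g]["A"][0] for g in groups)
a_totals = sum(groups[g]["A"][1] for g in groups)
b_successes = sum(groups[g]["B"][0] for g in groups)
b_totals = sum(groups[g]["B"][1] for g in groups)
print(f"combined: A={rate(a_successes, a_totals):.0%}  B={rate(b_successes, b_totals):.0%}")  # B higher overall

Neither view is lying. Pooling simply erased a grouping variable that mattered.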


That lesson applies directly to LLM usage.





The LLM Version of Simpson’s Paradox



When calling a large language model, we often:


  • Send entire chat histories

  • Include logs, failed attempts, brainstorming

  • Mix planning, building, and reviewing into one thread

  • Let context grow indefinitely



Over time, the model starts reasoning over:


  • Summarized history

  • Mixed intents

  • Conflicting constraints

  • Irrelevant tokens



And output quality drops.


Not because the model is bad.


Because we aggregated semantic contexts that shouldn’t have been merged.


Just like Simpson’s Paradox.
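Here is what that aggregation usually looks like in code. This is a deliberately simplified sketch of the anti-pattern, assuming the OpenAI Python client; the history list and chat helper are illustrative names, not from any specific codebase.

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# One ever-growing thread: planning, logs, dead ends, and reviews all share it.
history = [{"role": "system", "content": "You are a helpful engineering assistant."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,  # the entire accumulated thread is re-sent on every call
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply  # later calls pay for, and reason over, everything above

Planning, debugging, and review all end up pooled into one context. That is exactly the kind of aggregation the paradox warns about.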





Why This Matters Technically



When you call an LLM through an API, you pay per token:


  • Input tokens

  • Output tokens



If you send 15,000 tokens of prior context with every call, you pay for that every time.
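A quick back-of-envelope sketch shows how that compounds. The per-token price and call volume below are placeholder assumptions, not quoted rates; plug in your provider's current pricing.

# Rough daily cost of re-sending accumulated context on every call.
# The price and call volume are hypothetical placeholders.
PRICE_PER_1K_INPUT_TOKENS = 0.005  # dollars, placeholder rate
CONTEXT_TOKENS = 15_000            # prior history re-sent with each call
CALLS_PER_DAY = 2_000

daily_cost = CONTEXT_TOKENS / 1_000 * PRICE_PER_1K_INPUT_TOKENS * CALLS_PER_DAY
print(f"${daily_cost:,.2f} per day spent just re-sending old context")  # $150.00 per day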


But cost is only half the issue.


The deeper issue is semantic dilution:


  • The model cannot distinguish critical constraints from conversational noise.

  • It may overweight irrelevant information.

  • It may generalize across contexts that should be separated.



This leads to subtle quality degradation — often mistaken for “the model getting worse.”


In reality, it’s context collapse.





The Solution: Scoped LLM Calls



Instead of treating an LLM like a memory container, treat it like a compute node.


Break workflows into structured phases:


  1. Planning

  2. Implementation

  3. Review

  4. Testing



Each phase receives only the necessary information.


Nothing more.


This:


  • Reduces token usage

  • Improves determinism

  • Prevents semantic mixing

  • Lowers cost at scale



It is separation of concerns applied to AI.





A Practical Python Microservice Example



Below is a simplified FastAPI microservice that demonstrates scoped LLM calls.


Each stage sends only what it needs — no accumulated history.

from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()

# Initialize client (replace with your provider if needed)
client = OpenAI(api_key="YOUR_API_KEY")

PROMPTS = {
    "plan": """You are a planning agent.
Goal:
{goal}

Provide a structured plan with clear steps.
""",

    "implement": """You are an implementation agent.
Here is the approved plan:

{plan}

Produce the implementation based strictly on this plan.
""",

    "review": """You are a reviewer.
Here is the produced implementation:

{implementation}

Evaluate correctness, risks, and improvements.
"""
}


def llm_call(prompt: str, model="gpt-4o-mini", max_tokens=800):
    """
    Stateless LLM call.
    Only the prompt for this specific phase is sent.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Be precise and concise."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.2,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


@app.post("/execute-task")
def execute_task(goal: str):
    # Phase 1: Planning
    plan_prompt = PROMPTS["plan"].format(goal=goal)
    plan_output = llm_call(plan_prompt)

    # Phase 2: Implementation
    implement_prompt = PROMPTS["implement"].format(plan=plan_output)
    implementation_output = llm_call(implement_prompt)

    # Phase 3: Review
    review_prompt = PROMPTS["review"].format(implementation=implementation_output)
    review_output = llm_call(review_prompt)

    return {
        "goal": goal,
        "plan": plan_output,
        "implementation": implementation_output,
        "review": review_output
    }
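
Here is a minimal way to exercise the service, assuming the file is saved as main.py and served locally with uvicorn on port 8000. Because the endpoint takes a bare goal: str parameter, FastAPI reads it as a query parameter.

# Quick client-side check; main.py and port 8000 are assumptions.
# Start the service first with: uvicorn main:app --reload
import requests

resp = requests.post(
    "http://localhost:8000/execute-task",
    params={"goal": "Add input validation to the /orders endpoint"},
)
result = resp.json()
print(result["plan"])
print(result["review"])

The response returns each phase's output separately, which makes the scoping easy to inspect and log.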





Why This Saves Tokens



Each call:


  • Sends only a single phase’s input.

  • Does not resend earlier conversational clutter.

  • Avoids accumulating irrelevant tokens.



If your planning phase needs 400 tokens and your implementation phase needs 1,000, each call carries only its own payload instead of the entire accumulated thread; the rough comparison below puts numbers on the difference.
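
As a rough comparison, reusing the 15,000-token thread from earlier and the phase sizes above (all figures illustrative):

# Input tokens for a two-phase task: scoped calls vs. dragging the whole thread along.
plan_prompt = 400         # planning call input
implement_prompt = 1_000  # implementation call input, distilled plan included
raw_history = 15_000      # an accumulated chat thread, as in the earlier example

scoped_total = plan_prompt + implement_prompt  # 1,400 input tokens
aggregated_total = (raw_history + plan_prompt) + (raw_history + implement_prompt)  # 31,400

print(scoped_total, aggregated_total)

The gap only widens as outputs and additional phases get re-sent too.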


At scale, that becomes real cost savings.


More importantly, it improves clarity.





Where This Matters Most



This approach becomes critical when:


  • Running agents inside CI pipelines

  • Automating refactoring

  • Orchestrating multiple AI microservices

  • Operating at enterprise scale

  • Calling models thousands of times per day



It is less important for casual chat use.


But it is essential for production systems.





The Architectural Lesson



Simpson’s Paradox teaches:


Aggregation can distort truth.


In LLM systems:


Aggregated context can distort reasoning.


The fix is not a bigger context window.


The fix is disciplined context boundaries.





Final Thought



The next generation of AI architecture will not be defined by:


  • Larger prompts

  • Longer chat threads

  • “Smarter” magic sessions



It will be defined by:


  • Scoped reasoning

  • Separation of concerns

  • Token discipline

  • Deterministic orchestration



Just like good engineering always has been.




If nothing else, remember this:


Don’t let aggregated context fool your model.


Keep it scoped.

 
 
 
