
How We Engineered Context Windows That Don't Degrade

LLM context windows are finite, expensive, and degrade under load. We discovered that chat-based tools waste 50K-200K tokens per scaffold — most of it noise. Here's how HubAI treats context like memory in a high-performance system.

Tags: Context Engineering · LLM Optimization · Token Efficiency

The Problem We Found

Early in development, we profiled how chat-based code generation tools consume tokens. The numbers were staggering.

A typical CRUD backend scaffold — one entity with basic operations — consumed 50,000 to 200,000+ tokens through a chat-based tool. The conversation would start strong: the first few files generated cleanly. But by file eight or nine, quality visibly degraded. Variable names became inconsistent. Patterns that were followed earlier got ignored. The model started “forgetting” decisions it made minutes ago.

We traced the root cause to a fundamental architectural flaw: every chat-based tool treats context like a chat history. The entire conversation — every prompt, every response, every correction — accumulates in a single context window. As that window fills, the model’s attention fragments across an ever-growing pile of tokens, most of which are irrelevant to the current task.

This is not a model limitation. It is an engineering failure.

Why Chat-Based Tools Get This Wrong

Chat-based tools inherit their architecture from chatbots. A chatbot maintains a conversation history because context matters — you need to remember what the user said three messages ago. But code generation is not a conversation. It is a pipeline.

When you ask a chat tool to generate a database schema, it does not need to “remember” the frontend component it generated earlier. It needs the entity definitions, the field types, and the relationship constraints. That is roughly 2,000 tokens of signal buried in 50,000 tokens of noise.

The consequences compound:

  • Attention fragmentation — The model spreads its limited attention budget across the entire history. Signal-to-noise ratio drops with every message.
  • Context bleed — Database design decisions leak into frontend generation. Backend error handling patterns contaminate API route generation. Domains mix because there is no boundary.
  • Quality degradation — By the time you reach the 15th file in a scaffold, the model is operating with a context window that is 80% irrelevant noise. Output quality drops measurably.
  • Cost explosion — Every token in the context window costs money. Paying for 200K tokens when you need 8K is a 25x waste.

How We Solved It

We built HubAI’s context management around a single principle: treat context like memory in a high-performance system, not like a chat history.

Capped Token Budgets

Every agent in HubAI operates within a strict token budget of approximately 8,000 tokens. No agent is ever given more context than it can productively use.

This is enforced architecturally, not by convention. The ContextManager tracks token consumption per agent and refuses to load additional context beyond the budget. If an agent needs more information, it must request it through a tool — which means the request is scoped and intentional.

Agent Token Budgets:
┌─────────────────────┬──────────┐
│ Requirement Agent   │ ~8K max  │
│ Database Architect  │ ~8K max  │
│ Frontend Architect  │ ~8K max  │
│ QA Agent            │ ~8K max  │
│ Code Generator      │ 0 tokens │ ← deterministic, no LLM
└─────────────────────┴──────────┘
Total: ~32K tokens (architecture)
       + 0 tokens (code generation)

Compare this to a chat-based tool consuming 200K tokens for the same output. That is not an optimization — it is a fundamentally different architecture.
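The budget enforcement described above can be sketched in a few lines. The ContextManager name comes from this article, but the token heuristic, the BudgetExceeded exception, and the method names are illustrative assumptions, not HubAI's actual API:

```python
# Sketch of per-agent budget enforcement. Assumes a rough
# characters-per-token heuristic; a production system would use a
# real tokenizer for its model family.
class BudgetExceeded(Exception):
    pass

class ContextManager:
    def __init__(self, budget_tokens=8_000):
        self.budget = budget_tokens
        self.used = 0
        self.segments = []

    @staticmethod
    def estimate_tokens(text):
        # Crude heuristic: ~4 characters per token for English text.
        return max(1, len(text) // 4)

    def load(self, segment):
        cost = self.estimate_tokens(segment)
        if self.used + cost > self.budget:
            # Refuse to load. The agent must instead request a
            # scoped summary through a tool call.
            raise BudgetExceeded(
                f"{cost} tokens would exceed budget "
                f"({self.used}/{self.budget} used)"
            )
        self.segments.append(segment)
        self.used += cost
        return self.used

ctx = ContextManager(budget_tokens=8_000)
ctx.load("entity User: id, email, created_at")  # small segment: accepted
```

The key design choice is that overflow is an error, not a silent truncation: the agent is forced into an explicit, scoped tool request rather than quietly degrading.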

Just-in-Time Loading

Agents in HubAI do not receive pre-loaded context dumps. Instead, they discover and load data on demand through tools.

When the Database Architect needs to understand the entity relationships in a project, it does not receive the entire project specification. It calls a tool to retrieve the specific entities it needs to reason about. When the Frontend Architect needs the database schema, it receives a condensed summary of the relevant tables and fields — not the full design conversation.

This follows the principle of progressive disclosure: let the agent discover and load only what it needs for the current step. A lightweight reference (a file path, a stored query, an entity ID) is always cheaper than the full payload.
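As a sketch of this pattern, the project spec can live outside any context window while a tool returns only the entities named in the request. The spec shape and the get_entities tool below are hypothetical, not HubAI's real interfaces:

```python
# Hypothetical project store: lives outside every agent's context window.
PROJECT_SPEC = {
    "entities": {
        "User": {"fields": ["id", "email"], "relations": ["Order"]},
        "Order": {"fields": ["id", "total"], "relations": ["User"]},
        "AuditLog": {"fields": ["id", "event"], "relations": []},
    },
}

def get_entities(names):
    """Tool: return only the requested entity definitions.

    The agent passes lightweight references (entity names), not the
    full specification, so only the needed tokens enter its context.
    """
    return {n: PROJECT_SPEC["entities"][n] for n in names}

# The Database Architect asks for exactly what this step requires.
needed = get_entities(["User", "Order"])
assert "AuditLog" not in needed  # unrelated entities never load
```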

Isolated Agent Contexts

Each agent in HubAI has its own context window. There is no shared context between agents.

When the Requirement Agent finishes its analysis, it does not pass its full context to the Database Architect. It passes a condensed summary — roughly 1-2K tokens of structured output that captures the architectural decisions, entity definitions, and constraints. The Database Architect receives this summary in a clean context window, free from the noise of requirement analysis.

The MessageBus coordinates this inter-agent communication. Agents exchange structured messages, not raw context. This eliminates the “game of telephone” problem where each handoff degrades information quality.
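A minimal sketch of that handoff, assuming a simple in-process bus; the HandoffMessage schema and the MessageBus methods are illustrative, not HubAI's actual protocol:

```python
# Agents exchange structured messages, never raw context windows.
from dataclasses import dataclass, field

@dataclass
class HandoffMessage:
    sender: str
    recipient: str
    decisions: list                 # architectural decisions, not transcripts
    entities: dict                  # condensed entity definitions
    constraints: list = field(default_factory=list)

class MessageBus:
    def __init__(self):
        self.queues = {}

    def publish(self, msg: HandoffMessage):
        self.queues.setdefault(msg.recipient, []).append(msg)

    def consume(self, recipient: str):
        # Deliver and clear: the recipient starts from a clean window
        # containing only this condensed, structured summary.
        return self.queues.pop(recipient, [])

bus = MessageBus()
bus.publish(HandoffMessage(
    sender="RequirementAgent",
    recipient="DatabaseArchitect",
    decisions=["multi-tenant", "soft deletes"],
    entities={"User": ["id", "email"]},
))
inbox = bus.consume("DatabaseArchitect")
```

Because only the typed fields cross the boundary, each handoff carries roughly the 1-2K tokens of structured output described above rather than the sender's full history.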

Semantic Compaction

For long-horizon tasks — projects with dozens of entities and complex relationships — even isolated contexts can accumulate state. HubAI uses SemanticMemory to handle this:

  • Summarize and compress completed phases while preserving architectural decisions and unresolved issues
  • Clear old tool call results that are no longer relevant to the current phase
  • Persist critical decisions in the KnowledgeBase — a structured store outside the context window that agents can query on demand

This means an agent working on the 50th entity in a large project operates with the same context quality as an agent working on the first.
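One way the three compaction steps above could fit together, as a sketch: SemanticMemory and KnowledgeBase are named in this article, but this implementation and its method names are assumptions, not the real components.

```python
# Compaction sketch: persist decisions, drop stale tool results,
# and replace a finished phase's raw context with one summary entry.
class KnowledgeBase:
    """Structured store outside the context window, queryable on demand."""
    def __init__(self):
        self.decisions = {}

    def persist(self, key, decision):
        self.decisions[key] = decision

    def query(self, key):
        return self.decisions.get(key)

class SemanticMemory:
    def __init__(self, kb):
        self.kb = kb
        self.context = []           # (kind, payload) pairs

    def add(self, kind, payload):
        self.context.append((kind, payload))

    def compact(self, phase, summarize):
        # 1. Persist critical decisions outside the context window.
        for kind, payload in self.context:
            if kind == "decision":
                self.kb.persist(f"{phase}:{payload}", payload)
        # 2. Drop stale tool results (and persisted decisions) entirely.
        kept = [(k, p) for k, p in self.context
                if k not in ("tool_result", "decision")]
        # 3. Prepend one compressed summary of the completed phase,
        #    keeping unresolved items like open questions.
        self.context = [("summary", summarize(phase))] + kept

kb = KnowledgeBase()
mem = SemanticMemory(kb)
mem.add("decision", "use UUID primary keys")
mem.add("tool_result", "raw schema dump ...")
mem.add("note", "open question: sharding strategy")
mem.compact("phase-1", summarize=lambda p: f"{p}: schema finalized")
```

After compaction, the context holds one summary line plus unresolved notes; the decision survives in the KnowledgeBase, where any later agent can query it without paying its token cost up front.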

The Key Insight

Context is not a chat history. It is a scarce resource with diminishing returns. When Andrej Karpathy popularized the term “context engineering” earlier this year, it gave a name to what we had been practicing for months: the discipline of treating every token as a scarce resource. Every token you add to an agent’s context window consumes attention budget and degrades the signal-to-noise ratio.

The winning strategy is counterintuitive: give agents less context, not more. A small, high-signal set of tokens outperforms a large, noisy context window. The right context is the smallest set of tokens that maximizes the likelihood of the desired outcome.

What This Means for Enterprise Teams

┌──────────────────────────┬────────────────────────────┬──────────────────────────────────────┐
│ Metric                   │ Chat-Based Tools           │ HubAI                                │
├──────────────────────────┼────────────────────────────┼──────────────────────────────────────┤
│ Tokens per scaffold      │ 50K – 200K+                │ ~32K (architecture) + 0 (generation) │
│ Quality at scale         │ Degrades with context size │ Constant; isolated contexts          │
│ Cost per generation      │ High; paying for noise     │ Low; paying only for signal          │
│ Consistency across files │ Declines after ~10 files   │ Identical across any number of files │
└──────────────────────────┴────────────────────────────┴──────────────────────────────────────┘

The context engineering principle is not just about efficiency. It is about reliability at scale. When your context management is engineered — budgeted, scoped, isolated, and compacted — the 100th file in a generation run is as clean as the first.

Same AI model. Engineered context. Predictable results at any scale.