Long-Horizon Resilience

Making AI Agents That Never Lose Their Place

Context resets are the silent killer of AI agent productivity. A long session fills the context window, the agent restarts, and all progress is lost. We built a two-agent pattern with structured progress tracking that makes every session resumable.

Tags: Long-Running Agents · Session Management · Progress Tracking

The Problem We Found

We hit this problem during a large-scale generation test: a project with 30+ entities, complex relationships, and both backend and frontend scaffolding. The pipeline ran for several minutes, consuming tokens across multiple agents — and then the context window filled up.

The generation restarted from scratch. All progress was lost.

This was not a crash. It was a design limitation. The agents had no memory outside their context windows. When a context window filled, the agent had two options: degrade quality by operating with a bloated context, or reset and lose everything.

Both options are unacceptable for enterprise use. A generation pipeline that cannot handle large projects is not a generation pipeline — it is a demo. As enterprises moved from agent pilots to production deployments through 2025, this failure mode became industry-wide — not just our problem.

We studied the failure patterns:

  • One-shotting — Attempting to generate an entire large project in a single context window. Quality degrades progressively as the window fills.
  • Lost decisions — Architectural decisions made in phase 2 are forgotten by phase 7. The agent makes contradictory decisions because it cannot remember its own reasoning.
  • Half-implemented features — When a session ends mid-generation, the output is incomplete — some entities have resolvers but no schemas, some components reference services that were not generated.
  • No resumability — Every new session starts from zero. There is no way to say “continue from where you left off.”

How We Solved It

The Two-Agent Pattern

HubAI uses a two-agent pattern for long-horizon tasks:

Initializer Agent (first run):

  • Sets up the environment — project structure, configuration files, dependency manifests
  • Creates the feature list — a structured JSON file tracking every generation target with its status
  • Creates the progress file — a structured record of completed work, pending decisions, and known issues
  • Makes an initial git commit — the clean starting point

Coding Agent (every subsequent run):

  • Reads the progress file and git log to understand recent work
  • Picks the highest-priority incomplete feature from the feature list
  • Makes incremental progress — one feature at a time
  • Updates the progress file and makes a descriptive git commit
  • Ends with clean state — code that could be merged to main

The key discipline: never attempt to one-shot complex tasks. Each session works on one feature. Each session ends with working, committed code. Each session leaves enough context for the next session to resume without confusion.
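As a sketch of the initializer's bookkeeping step, the code below sets up the two tracking files and the starting feature states. This is illustrative only — the file names, `Feature` shape, and `initializeTracking` function are assumptions for this example, not HubAI's actual implementation:

```typescript
import { writeFileSync } from "fs";

// Hypothetical shape mirroring the feature-list JSON shown later in this post.
interface Feature {
  id: string;
  name: string;
  status: "pending" | "in_progress" | "completed";
  dependencies?: string[];
}

function initializeTracking(targets: Omit<Feature, "status">[]): Feature[] {
  // Every generation target starts out pending; later sessions only ever
  // change the `status` field, never the definitions themselves.
  const features = targets.map((t) => ({ ...t, status: "pending" as const }));
  writeFileSync("features.json", JSON.stringify({ features }, null, 2));
  writeFileSync(
    "PROGRESS.md",
    "# Progress\n\nNo sessions yet. Start with the first pending feature.\n"
  );
  return features;
}
```

After this runs, an initial git commit of the two files gives every later session the same clean starting point.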

Structured Feature Tracking

The feature list is structured JSON — not Markdown, not prose, not comments in code. JSON because it is machine-readable, unambiguous, and diff-friendly:

{
  "features": [
    {
      "id": "user-entity",
      "name": "User entity with auth fields",
      "status": "completed",
      "files": ["src/models/User.ts", "src/resolvers/UserResolver.ts"],
      "completedAt": "2025-11-15T10:30:00Z"
    },
    {
      "id": "order-entity",
      "name": "Order entity with User relationship",
      "status": "in_progress",
      "dependencies": ["user-entity"],
      "notes": "One-to-many relationship with User. Needs virtual field for orderTotal."
    },
    {
      "id": "product-entity",
      "name": "Product entity with Category embedding",
      "status": "pending",
      "dependencies": ["order-entity"]
    }
  ]
}

Rules for the feature list:

  • Never remove or edit feature definitions — only update the status field. Removing features leads to missing or buggy functionality.
  • Never mark a feature as complete without verification — run the actual generated code, check the actual output, verify the actual relationships.
  • Dependencies are explicit — the agent knows which features must be complete before starting the next one.
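These rules make "pick the highest-priority incomplete feature" a mechanical operation. A minimal sketch, assuming list order encodes priority (the `nextFeature` helper is hypothetical, but the `Feature` fields mirror the JSON above):

```typescript
interface Feature {
  id: string;
  name: string;
  status: "pending" | "in_progress" | "completed";
  dependencies?: string[];
}

function nextFeature(features: Feature[]): Feature | undefined {
  const done = new Set(
    features.filter((f) => f.status === "completed").map((f) => f.id)
  );
  // First non-completed feature whose dependencies are all completed.
  // An in_progress feature earlier in the list is naturally resumed first.
  return features.find(
    (f) =>
      f.status !== "completed" &&
      (f.dependencies ?? []).every((d) => done.has(d))
  );
}
```

Run against the sample list above, this resumes `order-entity` and leaves `product-entity` blocked until its dependency completes.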

Progress Files as Cross-Session Memory

The progress file serves as the agent’s memory across sessions. It contains:

  • What was completed — Which features, which files, what architectural decisions were made
  • What is in progress — The current feature, what has been done so far, what remains
  • Known issues — Bugs found, edge cases identified, decisions that need revisiting
  • Architectural decisions — Why specific patterns were chosen, trade-offs considered, constraints discovered

When a new session starts, the agent reads this file first. It does not need to re-analyze the entire project. It does not need to re-discover decisions. It picks up exactly where the last session left off.
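One plausible shape for that file, kept as structured data so the next session can load it without re-parsing prose. The field names here are illustrative assumptions, not HubAI's actual schema:

```typescript
interface ProgressFile {
  completed: { featureId: string; files: string[]; decisions: string[] }[];
  inProgress?: { featureId: string; done: string[]; remaining: string[] };
  knownIssues: string[];
  architecturalDecisions: { decision: string; rationale: string }[];
}

// Example state mid-project, matching the feature list shown earlier.
const progress: ProgressFile = {
  completed: [
    {
      featureId: "user-entity",
      files: ["src/models/User.ts", "src/resolvers/UserResolver.ts"],
      decisions: ["Auth fields live on the User model, not a separate table"],
    },
  ],
  inProgress: {
    featureId: "order-entity",
    done: ["Order model with User relationship"],
    remaining: ["orderTotal virtual field", "OrderResolver"],
  },
  knownIssues: ["Email validation regex is stricter than the frontend's"],
  architecturalDecisions: [
    {
      decision: "One resolver file per entity",
      rationale: "Keeps each session's scope to a single entity",
    },
  ],
};
```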

Clean State Discipline

Every session must end with clean state — code that could be merged to main. This means:

  • No half-implemented features in the codebase
  • No broken imports or unresolved references
  • No generated files that reference entities not yet generated
  • A descriptive git commit that explains what was accomplished

If a session cannot complete a feature cleanly, the agent reverts to the last clean state. Partial progress that breaks the build is worse than no progress at all.
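The end-of-session decision can be sketched as a single gate: commit only if the build and tests pass, otherwise revert. The command strings below assume a typical Node project and are illustrative, not HubAI's actual scripts; the runner is injectable so the logic is testable without touching git:

```typescript
import { execSync } from "child_process";

function endSession(
  summary: string,
  run: (cmd: string) => void = (c) => execSync(c, { stdio: "pipe" })
): "committed" | "reverted" {
  try {
    // Clean-state gate: the session's output must build and pass tests.
    run("npm run build && npm test");
  } catch {
    // Partial progress that breaks the build is worse than no progress:
    // drop uncommitted changes and return to the last clean commit.
    run("git checkout -- . && git clean -fd");
    return "reverted";
  }
  // Descriptive commit message so the next session's `git log` read is useful.
  run(`git add -A && git commit -m ${JSON.stringify(summary)}`);
  return "committed";
}
```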

The Getting Up to Speed Protocol

At the start of each new session, the agent follows a strict protocol:

  1. Orient — Read the progress file and git log to understand what happened in previous sessions
  2. Select — Read the feature list and pick the highest-priority incomplete feature
  3. Verify — Run the existing code (dev server, tests, smoke check) to confirm previous work is intact
  4. Fix first — If anything is broken, fix it before starting new work
  5. Then build — Only after existing code is verified, start on the new feature

Step 3 is critical. It catches regressions that might have been introduced by external changes, dependency updates, or environment differences. Skipping verification means building on a potentially broken foundation.
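The five steps above can be sketched as a small orchestrator. Each step is injected so the control flow is visible; real implementations would read the progress file and shell out to git (the function and step names are assumptions for this example):

```typescript
interface ProtocolSteps {
  orient: () => void;                 // 1. read progress file + git log
  select: () => string | undefined;   // 2. highest-priority incomplete feature
  verify: () => boolean;              // 3. run existing code (tests, smoke check)
  fix: () => void;                    // 4. repair regressions before new work
}

function getUpToSpeed(steps: ProtocolSteps): string | undefined {
  steps.orient();
  const featureId = steps.select();
  if (!steps.verify()) {
    steps.fix();
    // Building on a broken foundation is never allowed.
    if (!steps.verify()) throw new Error("existing code still broken after fix");
  }
  return featureId; // 5. only now start building the new feature
}
```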

End-to-End Verification

Features are only marked as complete after careful testing. Not “the code looks right” — actually running it.

For web applications, this means browser-level verification. Does the generated API endpoint respond correctly? Does the frontend component render with the right data? Does the form validation match the schema constraints?

HubAI can integrate with testing tools to automate this verification. But the principle is the same regardless of tooling: only mark a feature as passing after end-to-end verification. Trust the test, not the glance.
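As a minimal sketch of that principle for an API endpoint: hit the live URL and check the response against what the generated schema promises. The URL, field names, and `verifyUsersEndpoint` helper are hypothetical; the fetcher is injectable for testing:

```typescript
type MinimalResponse = { ok: boolean; json: () => Promise<unknown> };

async function verifyUsersEndpoint(
  url: string,
  fetchImpl: (u: string) => Promise<MinimalResponse> = fetch
): Promise<boolean> {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return false;
    const body = await res.json();
    // The (assumed) generated schema promises an array of users,
    // each with string `id` and `email` fields.
    return (
      Array.isArray(body) &&
      body.every((u) => typeof u.id === "string" && typeof u.email === "string")
    );
  } catch {
    return false; // network error or invalid JSON → not passing
  }
}
```

A feature's status flips to completed only when a check like this returns true, never on inspection alone.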

What This Means at Scale

A 30-entity project that would overflow a single context window becomes a sequence of 30 manageable sessions. Each session handles one entity with full context, full attention, and full quality. The feature list tracks progress. The progress file maintains continuity. Git provides the safety net.

| Approach | 30-Entity Project | Quality Profile |
| --- | --- | --- |
| One-shot (single context) | Context overflow at entity ~12 | Degrades progressively |
| Multi-session (HubAI) | 30 clean sessions, one per entity | Constant quality |

The total token consumption is similar — you still need to reason about 30 entities. But the distribution is different. Instead of 30 entities competing for attention in one bloated context, each entity gets a clean context with full attention. Quality stays constant because context quality stays constant.

The Key Insight

Long-horizon AI tasks fail not because of model limitations, but because of architectural limitations. A model with a 200K context window is not 200K tokens of useful capacity — it is roughly 8-16K tokens of peak performance with rapidly diminishing returns.

The solution is not bigger context windows. It is structured progress tracking that lets the agent maintain continuity across sessions, each operating at peak context quality.

Work on one feature at a time. End every session with code that could be merged to main. Use structured progress files and git as cross-session memory. Never mark a feature done without end-to-end verification. These are not constraints — they are the disciplines that make long-horizon AI agents reliable.