HubAI
Structured Reasoning

Teaching AI Agents to Think Before They Act

We discovered that agents rush to action without reasoning over tool results. In long chains of tool calls, the model acts on stale or incomplete information. A zero-side-effect 'think' tool solved the problem.


The Problem We Found

We were debugging a subtle issue in HubAI’s pipeline. The Database Architect would analyze entity relationships, produce a schema, and hand it off to the Frontend Architect. Occasionally — maybe 15% of the time — the Frontend Architect would generate components that did not match the schema.

The error was not in the schema. It was not in the Frontend Architect’s prompt. It was in the transition between phases.

When the Frontend Architect received the database schema, it immediately started generating components. No pause. No verification that the schema was complete. No check that all relationships were accounted for. It went straight from receiving input to producing output.

In the 85% of cases where the schema was straightforward, this worked fine. In the 15% where the schema had complex relationships — embedded documents referencing other entities, virtual fields computed from multiple sources — the agent missed details because it never stopped to reason about them.

We saw the same pattern in other places:

  • The QA Agent would receive a validation result and immediately proceed without checking whether all validation criteria were covered
  • The Orchestrator would spawn workers without verifying that the execution plan accounted for all dependencies
  • Agents in long tool-call chains would act on results from three calls ago without re-evaluating whether those results were still relevant given new information

The pattern was consistent: agents rush to action. Given a tool result, the default behavior is to act on it immediately rather than reason about it first.

Why This Happens

LLMs are trained to be helpful. When they receive information, their training biases them toward producing a response — not toward pausing to think. In a chat context, this is the right behavior. In an agentic context with multi-step tool use, it is dangerous.

The problem is most acute in long chains of tool calls where each step depends on previous results. The agent’s “working memory” is its context window, and by step 7 or 8, earlier tool results have been pushed far enough back that the model’s attention to them has diminished. Without a deliberate pause to consolidate understanding, the agent operates on an increasingly degraded picture of its own progress.

Extended thinking, as popularized by reasoning models like OpenAI's o-series and DeepSeek-R1, helps with initial planning: before the first tool call, the model can reason about what to do. But extended thinking happens before tool calls, not between them. After receiving tool results, the model jumps straight to the next action.

The Solution: A Think Tool

We added a tool with zero side effects to HubAI’s agent toolkit. It has one parameter:

import { z } from 'zod';

const thinkTool = {
  name: 'think',
  description: 'A reasoning tool with no side effects. Use this to pause ' +
    'and reason about tool results before taking action. Does not modify ' +
    'any state. Use when: you need to verify information is complete, ' +
    'check compliance with rules, analyze tool output before proceeding, ' +
    'or plan the next sequence of actions.',
  inputSchema: z.object({
    thought: z.string().describe('Your reasoning about the current situation'),
  }),
  execute: async (input: { thought: string }) => {
    // No side effects: input.thought is only recorded in the execution trace.
    return { result: 'Reasoning noted. Proceed with your next action.' };
  },
};

The handler does nothing. It logs the thought to the trace (useful for debugging) and returns a neutral acknowledgment. The value is not in the tool’s output — it is in forcing the model to articulate its reasoning between tool calls.
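To make the zero-side-effect contract concrete, here is a minimal sketch of how an agent loop might handle a think call. The names (`executeTool`, `TraceEntry`) are illustrative, not HubAI's actual dispatcher; the point is that the thought lands in the trace and nothing else changes:

```typescript
// Illustrative dispatcher: a think call only appends to the trace.
type TraceEntry = { tool: string; input: unknown; timestamp: number };

const trace: TraceEntry[] = [];

async function executeTool(name: string, input: Record<string, unknown>) {
  // Every tool call, including think, is recorded for debugging.
  trace.push({ tool: name, input, timestamp: Date.now() });
  if (name === 'think') {
    // Zero side effects: no state is read or written except the trace,
    // and the model receives only a neutral acknowledgment.
    return { result: 'Reasoning noted. Proceed with your next action.' };
  }
  // ...dispatch to real, state-changing tools here...
  throw new Error(`Unknown tool: ${name}`);
}
```

Because the acknowledgment carries no new information, the only thing the model gains from the call is its own articulated reasoning, now pinned near the front of its context.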

When Agents Think

We guide agents to use the think tool at specific points:

Before acting on complex tool results:

“I received the database schema. Let me verify: all 5 entities are present, relationships between User and Order are defined as one-to-many, the embedded Address document has all required fields. The virtual orderCount field on User correctly references the Order collection. Everything checks out.”

Before making decisions with compliance implications:

“The schema includes a role field with RBAC values. I need to verify that the auth middleware configuration matches the role definitions. Checking: admin, editor, viewer are defined in both the schema and the auth config. The superadmin role exists in auth but not in the schema — this needs to be flagged.”

When reasoning across multiple tool results:

“I have results from three tool calls: the entity definition, the relationship map, and the validation rules. Let me consolidate: the Product entity has a category field that references Category — this needs a foreign key resolver in the GraphQL schema. The validation rules require category to be non-null, which means the resolver must handle the case where the referenced Category is deleted.”
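This guidance can be baked into each agent's system prompt. A sketch with illustrative wording (not HubAI's actual prompt text), assuming a hypothetical `withThinkGuidance` helper:

```typescript
// Illustrative prompt fragment; the exact wording HubAI uses differs.
const thinkGuidance = `
Before acting on a complex tool result, call the think tool and reason
about it first. Always think before you:
- act on a tool result involving multiple entities or relationships,
- make a decision with compliance or security implications,
- combine the results of several earlier tool calls.
State what you are verifying and what you conclude.`.trim();

// Appends the guidance to any agent's base system prompt.
function withThinkGuidance(systemPrompt: string): string {
  return `${systemPrompt}\n\n${thinkGuidance}`;
}
```

The same fragment can be appended to every agent's prompt, so the pause-and-verify behavior is consistent across the pipeline.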

The PlanningModule

The think tool handles micro-level reasoning between tool calls. For macro-level reasoning, HubAI uses the PlanningModule.

Before any execution begins, the Orchestrator creates a structured execution plan:

Plan: Generate fullstack scaffold for e-commerce project
├── Phase 1: Requirements Analysis (Requirement Agent)
│   └── Output: Structured requirements doc
├── Phase 2: Database Design (Database Architect)
│   └── Dependency: Phase 1 output
│   └── Output: Validated schema
├── Phase 3: Frontend Architecture (Frontend Architect)
│   └── Dependency: Phase 2 output
│   └── Output: Component tree + layouts
├── Phase 4: QA Validation (QA Agent)
│   └── Dependencies: Phase 2 + Phase 3 outputs
│   └── Output: Validation report
└── Phase 5: Code Generation (Code Generator)
    └── Dependency: Phase 4 output (validated artifacts)
    └── Output: Production code files

The plan is inspectable — you can see exactly what will happen before it happens. It is cancellable — any phase can be stopped without corrupting the pipeline. And it is resumable — if a phase fails, the pipeline can restart from that phase with corrections.

This separation — planning before execution — is how production systems work. The PlanningModule creates the plan. The ExecutionEngine executes it. The QA Agent validates it. Three distinct phases, three distinct concerns.
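A minimal sketch of what such a plan can look like as data; the type names and the `runnablePhases` helper are illustrative, not HubAI's actual implementation. Keeping phase status inside the plan itself is what makes it inspectable, cancellable, and resumable:

```typescript
// Illustrative plan structure: phases, dependencies, and per-phase status.
type PhaseStatus = 'pending' | 'running' | 'done' | 'failed' | 'cancelled';

interface PlanPhase {
  id: number;
  name: string;
  agent: string;
  dependsOn: number[]; // ids of phases whose outputs this phase consumes
  status: PhaseStatus;
}

interface ExecutionPlan {
  goal: string;
  phases: PlanPhase[];
}

// A phase is runnable once all of its dependencies are done. Resuming after
// a failure is just re-running this query against the saved plan.
function runnablePhases(plan: ExecutionPlan): PlanPhase[] {
  const done = new Set(
    plan.phases.filter((p) => p.status === 'done').map((p) => p.id),
  );
  return plan.phases.filter(
    (p) => p.status === 'pending' && p.dependsOn.every((d) => done.has(d)),
  );
}
```

An execution engine built on this shape never has to guess what comes next: it asks the plan, runs the runnable phases, records their status, and asks again.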

The Results

After adding the think tool and structured planning:

  • Transition errors (agent misinterpreting output from a previous phase) dropped from 15% to under 3%
  • Complex relationship handling improved significantly — agents consistently caught edge cases they previously missed
  • Pipeline reliability for projects with 10+ entities went from ~80% clean output to ~95%

The cost was modest: think tool usage adds roughly 200-400 output tokens per phase (the agent’s reasoning text). Against the thousands of tokens saved by avoiding error recovery, this is a clear win.
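A rough expected-value check of that trade-off, using the post's own figures plus an assumed 5,000-token cost per error-recovery episode (that recovery figure is an assumption for illustration, not a measured number):

```typescript
// Back-of-envelope trade-off; recoveryTokens is an assumed figure.
const phases = 5;
const thinkTokensPerPhase = 300; // midpoint of the observed 200-400 range
const recoveryTokens = 5_000;    // assumed cost of recovering one failed phase
const errorRateBefore = 0.15;
const errorRateAfter = 0.03;

const thinkCost = phases * thinkTokensPerPhase; // 1,500 tokens per run
const recoverySaved =
  (errorRateBefore - errorRateAfter) * phases * recoveryTokens; // ~3,000 tokens

console.log({ thinkCost, recoverySaved });
```

Even under conservative assumptions, the expected recovery savings outweigh the reasoning overhead, and the margin grows with pipeline length.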

The Key Insight

The most expensive mistake an AI agent can make is acting on incomplete information. A wrong action pollutes the context, triggers error recovery, and compounds through subsequent phases.

The think tool is cheap insurance. It costs a few hundred tokens of output. It prevents cascading failures that cost thousands of tokens to recover from. And it produces a reasoning trace that makes debugging possible when things do go wrong.

Separate planning from execution. Think before acting. Verify before proceeding. These are not limitations on AI agents — they are the disciplines that make AI agents reliable enough for production use.