
Designing Tools That AI Agents Actually Use Correctly

Poorly designed tools are the #1 source of agent failure. We discovered that if a human can't choose which tool to use from the description alone, an agent can't either. Here's how we engineer tools as contracts between deterministic systems and non-deterministic agents.


The Problem We Found

When we first built HubAI’s tool system, we followed the common pattern: wrap every useful function as a tool, give it a name and description, and let the agent figure out which one to use.

The agent did not figure it out.

We had tools like getEntityData, fetchEntitySchema, loadEntityConfig, and readEntityDefinition. To a human reading the names, they sounded similar but distinct. To the agent, they were interchangeable, and it would call the wrong one 30% of the time, then waste tokens recovering from the error.

We had tools that returned raw database dumps — 5,000+ tokens of JSON when the agent needed three fields. The context window filled with irrelevant data, and subsequent tool calls degraded because the agent was reasoning over noise.

We had error messages like "Error: invalid parameter". The agent would retry with the same invalid parameter, get the same error, and loop until its token budget was exhausted.

Every one of these failures traced back to the same root cause: we designed tools for humans, not for agents. The Model Context Protocol (MCP) gave the industry a shared standard for tool interfaces, but standardization alone does not prevent bad tool design. Tools are not functions that agents call. They are contracts between deterministic systems and non-deterministic agents. Designing them requires a fundamentally different mindset.

The Principles We Learned

Minimal Tool Sets

Our first tool registry had 23 tools. After profiling agent behavior, we cut it to 9.

The problem with large tool sets is not that agents cannot parse them — it is that selection quality degrades with options. When an agent has 23 tools to choose from, it spends tokens reasoning about which tool to use instead of doing work. Overlapping tools create ambiguity. Ambiguity causes wrong selections. Wrong selections waste tokens on error recovery.

The rule is simple: if a human cannot choose which tool to use from the description alone, an agent cannot either.

We consolidated aggressively:

  • listEntities + getEntityDetails + loadEntityRelationships became getEntityContext — one tool that returns exactly what the agent needs for entity reasoning
  • readProjectConfig + getProjectSettings + loadProjectMetadata became getProjectContext — one tool with a scope parameter
  • validateSchema + checkSchemaConsistency + verifySchemaRelationships became validateSchema — one tool with a depth parameter (quick, standard, thorough)

Each consolidated tool subdivides the task the way a human would, and returns a single result instead of leaving the agent to stitch together intermediate outputs that bloat its context.

Self-Contained Tools with Typed Inputs

Every tool in HubAI is built on BaseMCPTool, which enforces:

import { z, ZodError } from "zod";

interface ToolResult {
  ok: boolean;
  content: string;
}

abstract class BaseMCPTool<TInput> {
  abstract name: string;
  abstract description: string;
  abstract inputSchema: z.ZodType<TInput>;

  async execute(input: unknown): Promise<ToolResult> {
    // Validate before doing any work: malformed input never reaches tool logic
    const parsed = this.inputSchema.safeParse(input);
    if (!parsed.success) {
      return this.formatError(parsed.error);
    }
    return this.safeExecute(parsed.data);
  }

  protected formatError(error: ZodError): ToolResult {
    return { ok: false, content: error.issues.map((i) => i.message).join("; ") };
  }

  // Subclasses implement logic against input that has already been validated
  protected abstract safeExecute(input: TInput): Promise<ToolResult>;
}

Zod schemas enforce typed inputs at runtime. If the agent passes entityName: 123 where a string is expected, the tool rejects it immediately with a clear message — not a cryptic runtime error three function calls deep.

The description is written as if for a new hire joining the team: all implicit context is explicit, query formats are documented, terminology is defined, and relationships between parameters are spelled out.

Token-Efficient Results

Every tool response in HubAI is designed to fit within an agent’s token budget. This means:

  • Filtered data — Tools return only fields relevant to the agent’s current task. An entity tool for the Database Architect returns fields, types, and indexes. The same entity tool for the Frontend Architect returns labels, display formats, and form rules.
  • Aggregated summaries — Instead of returning 50 raw records, tools return summaries with counts, key patterns, and notable exceptions.
  • Capped responses — Tool responses have maximum token limits. If a response exceeds the cap, it is truncated with a steering message: “Results truncated. Use more specific filters to narrow results.”
  • No opaque IDs — Internal UUIDs are resolved to semantic labels. The agent sees entityName: "User" not entityId: "a3f8c2e1-...". This eliminates retrieval hallucinations where the agent confabulates IDs it half-remembers from context.
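The capped-response rule can be sketched as a small helper. The characters-per-token estimate and cap are illustrative assumptions, not HubAI's actual tokenizer or limits:

```typescript
// Rough token estimate: ~4 characters per token for English text and JSON.
// This heuristic is an assumption for the sketch, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cap a tool response and append a steering message so the agent knows
// how to recover, instead of silently losing data.
function capResponse(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) {
    return text;
  }
  const keepChars = maxTokens * 4;
  return (
    text.slice(0, keepChars) +
    "\n[Results truncated. Use more specific filters to narrow results.]"
  );
}
```

The steering message matters as much as the cap: it turns a truncation into an instruction the agent can act on in its next call.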

Actionable Error Messages

When a tool fails, the error message must help the agent self-correct in one attempt. We replaced generic errors with structured guidance:

❌ Before:
"Error: invalid parameter"

✅ After:
"Parameter 'entityType' must be one of: 'collection', 'embedded',
'virtual'. Received: 'table'. If you meant a MongoDB collection,
use 'collection'."

Every error includes:

  • What went wrong (specific parameter, specific value)
  • What was expected (valid options, correct format)
  • A hint for recovery (the most likely intended input)

This reduced error-recovery token waste by over 70% in our evaluations.
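A sketch of that three-part structure for the entityType example above; the alias table mapping common wrong values to likely intended inputs is an illustrative assumption:

```typescript
// Valid values for the parameter, matching the example error message.
const validEntityTypes = ["collection", "embedded", "virtual"] as const;

// Map common wrong values to the most likely intended input.
// These aliases are illustrative assumptions for the sketch.
const hints: Record<string, string> = {
  table: "collection",
  nested: "embedded",
  view: "virtual",
};

// Build a three-part error: what went wrong, what was expected,
// and a recovery hint, so the agent can self-correct in one attempt.
function formatEntityTypeError(received: string): string {
  const expected = validEntityTypes.join("', '");
  let message =
    `Parameter 'entityType' must be one of: '${expected}'. ` +
    `Received: '${received}'.`;
  const hint = hints[received];
  if (hint) {
    message += ` Did you mean '${hint}'?`;
  }
  return message;
}
```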

Progressive Disclosure

Tools in HubAI are discoverable, not pre-loaded. The agent does not receive all 9 tool definitions in its initial context. Instead:

  • Phase-specific tools are loaded when the agent enters that phase
  • The ToolRegistry provides a discovery interface — agents can query what tools are available for their current task
  • Tool definitions are loaded on demand, keeping the agent’s context clean

This follows the same just-in-time principle we use for context management: load what you need, when you need it, and nothing more.
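A minimal sketch of a phase-scoped discovery interface; the phase names and registry API are illustrative assumptions, not HubAI's actual ToolRegistry:

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  phases: string[]; // phases in which the tool is available
}

// The registry returns only the definitions relevant to the current phase,
// so the agent's context holds a handful of tools rather than all of them.
class ToolRegistry {
  private tools: ToolDefinition[] = [];

  register(tool: ToolDefinition): void {
    this.tools.push(tool);
  }

  // Discovery interface: agents query by phase instead of receiving
  // every tool definition up front.
  discover(phase: string): ToolDefinition[] {
    return this.tools.filter((t) => t.phases.includes(phase));
  }
}
```

Loading definitions through a query like this keeps unused tools out of the context entirely, which is the same budget discipline applied to tool results.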

What This Means in Practice

Metric                             Before (23 tools)   After (9 tools)
Tool selection accuracy            ~70%                ~96%
Error recovery tokens              ~2,000 per error    ~300 per error
Context consumed by tool results   ~40% of budget      ~12% of budget
Agent task completion rate         ~78%                ~95%

The improvement was not from better models or more sophisticated prompting. It was entirely from better tool design.

The Key Insight

Tools are the interface between your deterministic system and your non-deterministic agent. Every dollar spent on tool ergonomics pays back tenfold in agent performance.

Design tools like you are designing an API for a brilliant but literal colleague who has never seen your codebase. Make implicit context explicit. Return only what matters. Fail with helpful guidance. And keep the tool set small enough that choosing the right tool is obvious.

If a human cannot choose which tool to use from the description alone, an agent cannot either. Tool design deserves the same rigor, the same testing, and the same iteration as the best APIs you have ever shipped.