Cutting Context by 60%: TOON Format + Tool RAG

January 2026

We measured the combined effect of two optimizations in Agentic Forge: TOON format for compact tool results, and Tool RAG for dynamic tool discovery. Together, they reduce context size by 60%.

The Experiment

We ran the same weather query through three configurations using google/gemini-3-flash-preview:

  1. Baseline — All 19 tools loaded, JSON responses
  2. TOON only — All 19 tools loaded, TOON responses
  3. TOON + RAG — Dynamic tool discovery, TOON responses

Query: "What is the weather like in Islamabad, Pakistan today?"

Results

[Figure: Token Comparison Chart]

| Configuration | Context Size | Savings |
| --- | --- | --- |
| Baseline (all tools + JSON) | 12,142 chars | - |
| TOON only (all tools + TOON) | 12,074 chars | 0.6% |
| TOON + RAG (with overhead) | 4,799 chars | 60.5% |

The results show that Tool RAG provides the majority of the savings, while TOON contributes a smaller but consistent reduction on tool results.

Note: The RAG figure includes the overhead from the extra round-trip (search_tools call and result in conversation history).

Breaking Down the Savings

Tool Definitions: Where RAG Shines

The biggest context consumers are tool definitions. Each tool's name, description, and parameter schema takes tokens—and most tools go unused in any given request.

| Configuration | Tools Loaded | Size |
| --- | --- | --- |
| All tools | 19 | 11,704 chars (~2,926 tokens) |
| RAG initial | 1 (search_tools) | 370 chars (~92 tokens) |
| RAG after discovery | 6 | 4,104 chars (~1,026 tokens) |

With RAG, the initial context contains only search_tools. After semantic search, 5 weather-related tools are discovered and loaded, bringing the loaded set to 6 (search_tools plus the 5 discovered). Tool definitions shrink by 65%, from 11,704 to 4,104 chars.
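
The discovery step can be sketched roughly as follows. This is a toy illustration, not Forge Armory's implementation: the tool names and descriptions are invented, and plain word overlap stands in for the semantic search that the real search_tools performs.

```python
# Toy sketch of Tool RAG discovery. Tool names and descriptions are
# hypothetical, and word overlap is a stand-in for real semantic search.

STOPWORDS = {"a", "an", "the", "for", "in", "to", "of"}

TOOL_REGISTRY = {
    "get_current_weather": "current weather conditions for a city",
    "get_weather_forecast": "multi-day weather forecast for a city",
    "search_web": "search the web for pages matching a query",
    "send_email": "send an email message to a recipient",
}

def keywords(text: str) -> set[str]:
    """Lowercase the text and drop common filler words."""
    return set(text.lower().split()) - STOPWORDS

def search_tools(query: str, limit: int = 5) -> list[str]:
    """Return names of tools whose descriptions share keywords with the query."""
    query_words = keywords(query)
    scored = []
    for name, description in TOOL_REGISTRY.items():
        overlap = len(query_words & keywords(description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

# The first model call sees only search_tools; the second call sees
# search_tools plus whatever the search discovered.
initial_tools = ["search_tools"]
discovered = search_tools("weather in a city today")
loaded_tools = initial_tools + discovered
print(loaded_tools)
```

The point of the sketch is the shape of the flow: the registry never enters the model's context, only the handful of definitions the search returns.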

RAG Round-Trip Overhead

Tool RAG requires two model calls instead of one: the first to discover tools, the second to use them. We accounted for this in our comparison by including the conversation history overhead from the first call:

| Component | Size |
| --- | --- |
| search_tools call | 101 chars |
| search_tools result wrapper | 224 chars |
| Total overhead | 325 chars (~81 tokens) |

The 60% savings figure includes this overhead—it's not a best-case number that ignores the extra round-trip.
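
As a sanity check, the headline totals can be rebuilt from the measured components. The sketch below uses only numbers reported in this post (the JSON and TOON result sizes come from the tool-results table in the next section); the variable and function names are ours. The token estimates in the tables are consistent with roughly 4 characters per token, so that heuristic is assumed here.

```python
# Rebuild the reported context sizes from their components (all in chars).
ALL_TOOL_DEFS = 11_704     # 19 tool definitions
RAG_TOOL_DEFS = 4_104      # search_tools + 5 discovered tools
JSON_RESULT = 438          # weather result as JSON
TOON_RESULT = 370          # same result as TOON
RAG_OVERHEAD = 101 + 224   # search_tools call + result wrapper

baseline = ALL_TOOL_DEFS + JSON_RESULT
toon_only = ALL_TOOL_DEFS + TOON_RESULT
toon_rag = RAG_TOOL_DEFS + TOON_RESULT + RAG_OVERHEAD

def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round((before - after) * 100 / before, 1)

def approx_tokens(chars: int) -> int:
    """Rough estimate, assuming about 4 characters per token."""
    return chars // 4

print(baseline, toon_only, toon_rag)        # 12142 12074 4799
print(reduction_pct(baseline, toon_only))   # 0.6
print(reduction_pct(baseline, toon_rag))    # 60.5
print(approx_tokens(RAG_OVERHEAD))          # 81
```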

Tool Results: Where TOON Helps

TOON format provides modest but consistent savings on structured data:

| Format | Size | Savings |
| --- | --- | --- |
| JSON | 438 chars | - |
| TOON | 370 chars | 15.5% |

For a simple weather response, that's 68 characters saved. The absolute saving is small because the weather payload is small. For larger responses (database query results, API responses with many records, or nested configuration objects), the same ~16% ratio adds up: a 10KB JSON response would shrink by roughly 1.6KB per call, assuming the ratio holds.

JSON:

{"location": "Islamabad, Pakistan", "coordinates": [33.72148, 73.04329], "temperature": 7.6, ...}

TOON:

location: "Islamabad, Pakistan"
coordinates[2]: 33.72148,73.04329
temperature: 7.6
...
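
To make the notation concrete, here is a minimal encoder sketch that reproduces the three fields shown above. It is a simplified illustration rather than a full TOON implementation: it only handles flat records whose values are primitives or lists of primitives, and its quoting rule (quote a string only when it contains a comma, colon, quote, or newline, or has surrounding whitespace) is our assumption, chosen to match the example.

```python
import json

def encode_scalar(value) -> str:
    """Render one primitive, quoting strings only when they could be misread."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int/float
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value)
    needs_quotes = (
        text == ""
        or text != text.strip()
        or any(ch in text for ch in ',:"\n')
    )
    return json.dumps(text) if needs_quotes else text

def encode_flat(record: dict) -> str:
    """Encode a flat dict of primitives and primitive lists, one key per line."""
    lines = []
    for key, value in record.items():
        if isinstance(value, list):
            items = ",".join(encode_scalar(item) for item in value)
            lines.append(f"{key}[{len(value)}]: {items}")  # length-prefixed array
        else:
            lines.append(f"{key}: {encode_scalar(value)}")
    return "\n".join(lines)

# Only the fields visible in the example above; the elided ones are left out.
record = {
    "location": "Islamabad, Pakistan",
    "coordinates": [33.72148, 73.04329],
    "temperature": 7.6,
}
toon_text = encode_flat(record)
json_text = json.dumps(record)
print(toon_text)
print(len(json_text), "->", len(toon_text), "chars")
```

Most of the saving comes from dropping braces, brackets, and the quotes around keys, which is why flat records with many short fields benefit the most.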

Why This Matters

1. Lower API Costs

LLM API pricing varies widely—from $0.15/M tokens for flash models to $15+/M for frontier reasoning models—but the math is simple: fewer tokens means lower bills. A 60% reduction compounds across every request.

2. Longer Conversations

Context windows are finite. By spending ~4,800 characters on tool definitions and results instead of ~12,100, you free up over 7,000 characters for conversation history before hitting limits.

3. Better Tool Selection

Research shows LLMs perform worse when presented with many tools. Tool RAG surfaces only relevant tools, improving selection accuracy. The ToolRAG paper demonstrated 3x improvement in tool accuracy.

4. Scales with Tool Count

These savings grow with your tool library:

| Tools Available | All Loaded | RAG (avg 5 discovered) | Savings |
| --- | --- | --- | --- |
| 10 | ~6,000 chars | ~2,500 chars | 58% |
| 20 | ~12,000 chars | ~2,700 chars | 78% |
| 50 | ~30,000 chars | ~3,000 chars | 90% |
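
The savings column follows directly from the two size columns. A short sketch of the arithmetic, using the approximate figures from the table above (the function name is ours):

```python
# Rows from the table above: (tools available, all-loaded chars, RAG chars).
ROWS = [
    (10, 6_000, 2_500),
    (20, 12_000, 2_700),
    (50, 30_000, 3_000),
]

def savings_pct(all_loaded: int, with_rag: int) -> float:
    """Percentage of tool-definition characters avoided by using RAG."""
    return (all_loaded - with_rag) * 100 / all_loaded

for tools, all_loaded, with_rag in ROWS:
    print(f"{tools:>3} tools: {savings_pct(all_loaded, with_rag):.1f}% saved")
```

The unrounded values are about 58.3%, 77.5%, and 90.0%. The all-loaded cost grows linearly with the library while the RAG cost stays nearly flat, so the percentage keeps climbing as tools are added.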

Implementation

Both optimizations are available in Forge Armory:

TOON Format:

  • Send Accept: text/toon header with MCP requests
  • Tool results return in TOON notation instead of JSON

Tool RAG:

  • Use /mcp?mode=rag endpoint
  • Receive only search_tools meta-tool initially
  • Search discovers relevant tools for your task

Combine them for maximum efficiency:

GET /mcp?mode=rag
Accept: text/toon
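
From client code, the combined request could be assembled along these lines. This is a sketch using only the Python standard library: the base URL is a placeholder, the helper name is ours, and the request is built but deliberately not sent, since the rest of the MCP exchange depends on your client.

```python
from urllib.request import Request

# Hypothetical base URL; substitute wherever your Forge Armory instance runs.
BASE_URL = "http://localhost:8080"

def build_mcp_request(base_url: str, use_rag: bool = True, use_toon: bool = True) -> Request:
    """Build (but do not send) a request with the two options toggled independently."""
    url = f"{base_url}/mcp" + ("?mode=rag" if use_rag else "")
    headers = {"Accept": "text/toon"} if use_toon else {}
    return Request(url, headers=headers, method="GET")

request = build_mcp_request(BASE_URL)
print(request.get_method(), request.full_url, request.headers)
```

Keeping the two switches as separate arguments mirrors the fact that each optimization can be enabled on its own.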

Trade-offs

TOON:

  • Requires client-side parsing (though most LLMs understand it natively)
  • Best for flat/tabular data; deeply nested structures see less benefit

Tool RAG:

  • Adds one round-trip for tool discovery (handled by auto-continue)
  • Small latency increase (~200ms for semantic search)
  • Less beneficial for small tool sets where the RAG overhead (~325 chars) may exceed savings
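
The small-tool-set caveat can be explored with a rough size model. The sketch below is ours and rests on a loud assumption: every tool definition is the same size, taken as the average over the 19 tools measured in this post (11,704 / 19 = 616 chars). Real libraries will deviate from this.

```python
# Simplified size model. Assumes every tool definition is the same size,
# which real tool libraries will not satisfy exactly.
AVG_TOOL_CHARS = 11_704 // 19   # 616, average over the 19 measured tools
SEARCH_TOOL_CHARS = 370         # search_tools definition
ROUND_TRIP_CHARS = 325          # search_tools call + result wrapper

def all_loaded_chars(total_tools: int) -> int:
    """Context cost of loading every tool definition up front."""
    return total_tools * AVG_TOOL_CHARS

def rag_chars(discovered_tools: int) -> int:
    """Context cost with RAG: meta-tool, discovered tools, round-trip overhead."""
    return SEARCH_TOOL_CHARS + discovered_tools * AVG_TOOL_CHARS + ROUND_TRIP_CHARS

def rag_saves_context(total_tools: int, discovered_tools: int) -> bool:
    return rag_chars(discovered_tools) < all_loaded_chars(total_tools)

for total in (3, 5, 7, 19):
    print(total, rag_saves_context(total, discovered_tools=5))
```

Under these assumptions, with 5 tools discovered per query, RAG costs more context than loading everything until the library reaches about 7 tools. If queries typically discover fewer tools, the break-even point drops accordingly.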

Conclusion

For agents with more than a handful of tools, Tool RAG provides substantial context savings with minimal overhead. TOON format adds incremental savings on tool results. Together—accounting for the RAG round-trip overhead—they reduced our test context by 60%, from 12,142 to 4,799 characters.

The optimizations are independent and can be adopted separately based on your needs.

This is part of a series on building Agentic Forge.
