# Tool RAG
Dynamic tool selection using semantic search. Instead of loading all tools into context, retrieve only the relevant ones for each query.
## The Problem

### Context Bloat
Each tool definition consumes tokens:

```text
Tool: get_weather
Description: Get current weather for a location...
Parameters:
  - location (string): The city name...
  - units (enum): celsius/fahrenheit...
```

At ≈ 50-100 tokens per tool, 50 tools means 2,500-5,000 tokens spent on tool definitions alone.
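The arithmetic is easy to sketch; the 75-token midpoint below is an assumption for illustration, not a measured figure:

```python
# Back-of-envelope cost of loading every tool statically, assuming
# ~75 tokens per definition (midpoint of the 50-100 range above).
TOKENS_PER_TOOL = 75

def static_tool_overhead(tool_count: int) -> int:
    """Tokens spent on tool definitions before the query even starts."""
    return tool_count * TOKENS_PER_TOOL

print(static_tool_overhead(50))   # 3750
print(static_tool_overhead(200))  # 15000
```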
### Performance Degradation
Research shows tool accuracy decreases as tool count increases:
| Tool Count | Success Rate |
|---|---|
| 5 tools | 95% |
| 20 tools | 85% |
| 50 tools | 70% |
| 100+ tools | <60% |
## The Solution

Tool RAG treats tool selection as a retrieval problem: embed each tool's description in a vector index, then retrieve only the top-k most relevant tools into context for each query.

### Benefits
Research from Red Hat and AWS shows:
- 3x improvement in tool invocation accuracy
- ~50% reduction in prompt token usage
- Scales to thousands of tools without degradation
## How It Works

### 1. Tool Registration
When tools are registered, their descriptions are embedded:
```python
async def register_tool(self, tool: ToolDefinition):
    # Generate an embedding from the tool's name and description
    embedding = await self.embed(
        f"{tool.name}: {tool.description}"
    )
    # Store it in the vector index, keyed by tool name
    await self.index.upsert(
        id=tool.name,
        vector=embedding,
        metadata=tool.to_dict()
    )
```

### 2. Query Processing
When a query comes in, embed it and search:
```python
async def search(self, query: str, top_k: int = 10):
    # Embed the query
    query_embedding = await self.embed(query)
    # Search the vector index
    results = await self.index.search(
        query_embedding,
        top_k=top_k
    )
    # Return the matching tool definitions
    return [r.metadata for r in results]
```

### 3. Integration with Armory
Tool RAG integrates at the gateway level:
```python
class Armory:
    async def handle_tools_list(self, request):
        if self.tool_rag_enabled and request.context:
            # Return only the tools relevant to this request
            tools = await self.tool_rag.search(
                request.context,
                top_k=10
            )
        else:
            # Fall back to returning all registered tools
            tools = await self.registry.list_all()
        return tools
```

## Embedding Strategies
| Strategy | What to Embed | Pros/Cons |
|---|---|---|
| Description only | Tool description text | Simple, may miss nuance |
| Description + params | Description + parameter names | More context, better matching |
| Synthetic queries | Generated example queries | Best accuracy, more compute |
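One way to make the strategies concrete is as text-builders that decide what goes into the embedding model. This is a sketch, not part of the system above; `example_queries` is a hypothetical field holding synthetic queries generated offline (e.g. by an LLM):

```python
# Sketch of the three embedding strategies as text-builders. The
# embedding model itself is out of scope; `example_queries` is a
# hypothetical field populated offline with generated queries.
def build_embed_text(tool: dict, strategy: str = "description") -> str:
    base = f"{tool['name']}: {tool['description']}"
    if strategy == "description":
        return base
    if strategy == "description+params":
        params = ", ".join(tool.get("parameters", []))
        return f"{base} (params: {params})"
    if strategy == "synthetic":
        queries = " | ".join(tool.get("example_queries", []))
        return f"{base} e.g.: {queries}"
    raise ValueError(f"unknown strategy: {strategy!r}")
```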
## Embedding Models
| Model | Dimensions | Speed | Cost |
|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | $0.02/1M tokens |
| text-embedding-3-large | 3072 | Medium | $0.13/1M tokens |
| all-MiniLM-L6-v2 (local) | 384 | Very Fast | Free |
| BGE-large (local) | 1024 | Medium | Free |
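Whichever model produces the vectors, retrieval reduces to nearest-neighbor search over them. A minimal in-memory sketch using brute-force cosine similarity (fine for hundreds of tools; a real deployment would use an ANN index) also shows where a `similarity_threshold` fits:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_index(index, query_vec, top_k=10, threshold=0.5):
    # index: {tool_name: embedding}. Returns up to top_k names whose
    # similarity clears the threshold, best match first.
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored = [(score, name) for score, name in scored if score >= threshold]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```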
## Configuration

```yaml
# In armory.yaml
tool_rag:
  enabled: true
  embedding_model: "text-embedding-3-small"
  default_top_k: 10
  similarity_threshold: 0.5  # Filter low-relevance matches
  cache_embeddings: true     # Cache query embeddings
```

## Metrics
```python
class ToolRAGMetrics:
    def recall_at_k(self, query, expected_tools, k):
        """What % of the needed tools were retrieved?"""
        retrieved = self.search(query, top_k=k)
        return len(set(retrieved) & set(expected_tools)) / len(expected_tools)

    def precision_at_k(self, query, expected_tools, k):
        """What % of the retrieved tools were actually needed?"""
        retrieved = self.search(query, top_k=k)
        return len(set(retrieved) & set(expected_tools)) / len(retrieved)

    def mrr(self, query, expected_tool):
        """Reciprocal rank: how early does the right tool appear?
        Averaged over a query set, this gives Mean Reciprocal Rank."""
        retrieved = self.search(query, top_k=20)
        for i, tool in enumerate(retrieved):
            if tool == expected_tool:
                return 1.0 / (i + 1)
        return 0.0
```
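A quick sanity check of these formulas against a fixed ranking, with hypothetical tool names standing in for what `search` would return:

```python
# Hypothetical ranking returned by search, plus the tools a test
# query actually needs; the metric formulas above, applied by hand.
ranked = ["get_weather", "get_forecast", "send_email"]
expected = {"get_weather", "get_forecast"}

recall_at_3 = len(set(ranked[:3]) & expected) / len(expected)       # 2/2 = 1.0
precision_at_3 = len(set(ranked[:3]) & expected) / len(ranked[:3])  # 2/3
reciprocal_rank = next(
    (1.0 / (i + 1) for i, t in enumerate(ranked) if t == "get_forecast"),
    0.0,
)  # found at position 2 -> 0.5
```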