Tool Use / Function Calling
Extend LLM capabilities by giving it access to external tools and functions it can invoke to take actions or fetch real-time data.
Overview
Tool Use (also called Function Calling) lets an LLM go beyond text generation by invoking external tools — APIs, databases, calculators, code interpreters, or any callable function. The LLM decides which tool to use, what arguments to pass, and then incorporates the result into its response.
This pattern transforms LLMs from passive text generators into active agents that can interact with the world.
When to Use
- LLM needs real-time data (weather, stock prices, search results)
- Tasks require precise computation (math, date calculations)
- You want the LLM to take actions (send emails, create records)
- Multi-step workflows where the LLM needs to gather information incrementally
Architecture
```mermaid
sequenceDiagram
    participant User
    participant LLM
    participant Router as Tool Router
    participant T1 as Calculator
    participant T2 as Weather API
    participant T3 as Database
    User->>LLM: "What's the weather in NYC and convert 72°F to °C?"
    LLM->>Router: call: weather_api(location="NYC")
    Router->>T2: GET /weather?city=NYC
    T2-->>Router: {"temp": "72°F", "condition": "sunny"}
    Router-->>LLM: Result: 72°F, sunny
    LLM->>Router: call: calculator(expr="(72-32)*5/9")
    Router->>T1: calculate
    T1-->>Router: 22.22
    Router-->>LLM: Result: 22.22°C
    LLM-->>User: "NYC is 72°F (22.2°C) and sunny!"
```
How It Works
1. Define Tools: Describe available functions with names, descriptions, and parameter schemas
2. LLM Decides: Given a user request, the LLM generates a structured tool call (JSON)
3. Execute: Your code parses the tool call and executes the actual function
4. Return Results: Feed the result back to the LLM for final response generation
5. Iterate: The LLM may chain multiple tool calls before responding
Implementation
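A minimal sketch of the loop described in "How It Works". The model is stubbed out with a scripted `fake_llm` so the example is self-contained; in practice you would swap in your provider's chat-completions call. All names here (`fake_llm`, `TOOLS`, `run_agent`) are illustrative, not a specific provider's API.

```python
import json

def eval_expr(expr):
    # Toy arithmetic evaluator -- never use eval() on model output.
    import ast, operator
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

# Tool registry: name -> callable taking a dict of parsed arguments.
TOOLS = {
    "calculator": lambda args: str(eval_expr(args["expr"])),
    "weather_api": lambda args: json.dumps({"temp": "72°F", "condition": "sunny"}),
}

def fake_llm(messages):
    """Scripted stand-in for a real chat-completions call (illustrative only)."""
    n_tool_results = sum(1 for m in messages if m["role"] == "tool")
    if n_tool_results == 0:
        return {"tool_call": {"name": "weather_api", "args": {"location": "NYC"}}}
    if n_tool_results == 1:
        return {"tool_call": {"name": "calculator", "args": {"expr": "(72-32)*5/9"}}}
    return {"content": "NYC is 72°F (22.2°C) and sunny!"}

def run_agent(user_msg, llm=fake_llm, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):          # cap iterations to avoid infinite loops
        reply = llm(messages)
        if "tool_call" not in reply:    # final answer, no more tools needed
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](call["args"])   # execute the tool
        messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("tool-call loop did not terminate")

print(run_agent("What's the weather in NYC and convert 72°F to °C?"))
```

The loop mirrors steps 2-5: the model either emits a structured tool call (executed and fed back as a `tool` message) or a final answer, with `max_steps` as a safety valve.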
Gotchas & Best Practices
The LLM generates tool arguments as text. Always validate and sanitize before executing.
Never pass LLM output directly to eval(), SQL queries, or shell commands in production.
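A minimal validation sketch, assuming the model returns arguments as a JSON string. The schema format is hand-rolled for illustration; in practice a library such as `jsonschema` or `pydantic` typically does this.

```python
import json

# Hand-rolled parameter schema (illustrative only): argument name -> expected type.
CALCULATOR_SCHEMA = {"expr": str}

def validate_args(raw_json, schema):
    """Parse model-generated arguments and reject anything off-schema."""
    args = json.loads(raw_json)          # may raise json.JSONDecodeError
    unexpected = set(args) - set(schema)
    if unexpected:
        raise ValueError(f"unexpected arguments: {unexpected}")
    for name, typ in schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], typ):
            raise ValueError(f"{name} must be {typ.__name__}")
    return args

validate_args('{"expr": "2 + 2"}', CALCULATOR_SCHEMA)   # ok
# validate_args('{"expr": 2}', CALCULATOR_SCHEMA)       # raises ValueError
```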
Poorly described tools lead to wrong tool selection. Write descriptions like API docs — be precise about what the tool does, what inputs it expects, and what it returns.
More tools ≠ better. With too many tools (>15-20), LLMs struggle to pick the right one. Group related tools, use routing, or implement a tool-selection layer.
Tools fail — APIs time out, inputs are invalid. Return clear error messages to the LLM so it can retry with different parameters or explain the failure to the user.
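One way to surface failures back to the model (names illustrative): wrap every tool execution so the model always receives a structured result, never a raw traceback.

```python
def execute_tool(fn, args):
    """Run a tool and always return a structured result the LLM can act on."""
    try:
        return {"ok": True, "result": fn(**args)}
    except TimeoutError:
        return {"ok": False, "error": "tool timed out; retry or tell the user"}
    except (ValueError, TypeError) as e:
        # Invalid input: describe what went wrong so the model can
        # retry with corrected parameters.
        return {"ok": False, "error": f"invalid input: {e}"}

def divide(a, b):
    return a / b

print(execute_tool(divide, {"a": 1, "b": 2}))    # {'ok': True, 'result': 0.5}
print(execute_tool(divide, {"a": 1, "b": "x"}))  # error dict describing the bad input
```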
Adding example inputs/outputs to tool descriptions dramatically improves tool use accuracy.
Example: "Calculate math expression. Example: calculator('2 + 2') → '4'"
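Putting these practices together, a tool definition in the common JSON-schema style might look like the sketch below. The exact envelope varies by provider; treat this as the general shape rather than a specific API contract.

```python
# Illustrative function definition in the widely used JSON-schema style.
calculator_tool = {
    "name": "calculator",
    "description": (
        "Evaluate an arithmetic expression and return the result as a string. "
        "Supports +, -, *, / and parentheses. "
        "Example: calculator('2 + 2') -> '4'"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "expr": {
                "type": "string",
                "description": "Expression to evaluate, e.g. '(72-32)*5/9'",
            }
        },
        "required": ["expr"],
    },
}
```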
Variations
- Single-turn — One tool call per request
- Multi-turn — LLM chains multiple tool calls iteratively
- Parallel — Multiple independent tool calls executed simultaneously
- Nested — Tool results trigger further tool calls
- Human-in-the-loop — Require approval for sensitive tool calls
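The Parallel variation can be sketched with a thread pool, since independent tool calls are usually I/O-bound. The tool functions and `run_parallel` helper are illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def weather_api(location):
    time.sleep(0.1)               # stand-in for network latency
    return f"{location}: sunny"

def stock_price(symbol):
    time.sleep(0.1)               # stand-in for network latency
    return f"{symbol}: 100.0"

def run_parallel(calls):
    """Execute independent tool calls concurrently; results keep call order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, **args) for fn, args in calls]
        return [f.result() for f in futures]

results = run_parallel([(weather_api, {"location": "NYC"}),
                        (stock_price, {"symbol": "ACME"})])
print(results)   # ['NYC: sunny', 'ACME: 100.0']
```

Both calls run in roughly the time of one, and the results come back in the order the calls were issued, which keeps them easy to pair with the model's tool-call IDs.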
Further Reading
- OpenAI Function Calling Guide
- Anthropic Tool Use Docs
- Gorilla LLM — Fine-tuned for tool use