Function calling (also known as tool calling) provides a powerful mechanism for large language models to interface with external systems, databases, and APIs. When implemented correctly, it extends LLM capabilities beyond their training data, enabling dynamic data retrieval, action execution, and complex workflow automation. However, production implementations frequently encounter reliability issues ranging from hallucinated parameters to schema violations that can crash applications or trigger unintended operations.

What is Function Calling?

Function calling is a multi-step interaction pattern between an LLM and external systems where the model generates structured API calls based on user prompts. According to OpenAI’s documentation, the pattern involves five high-level steps: providing tool definitions to the model, receiving tool call requests, executing the function logic on the application side, returning results to the model, and receiving the final response[1].

A function is defined by its JSON schema, which specifies the function name, description, and parameters. When the model determines that a function should be called, it responds with a JSON object containing the arguments for the function rather than executing the function itself[2]. This architectural separation ensures that the application remains in control of actual execution, providing a critical security boundary.
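As an illustration, a hypothetical get_current_weather tool could be defined as a Python dict in OpenAI’s function format (Anthropic’s format nests the same JSON schema under an input_schema key instead of parameters):

```python
# Hypothetical weather tool definition. The schema itself is standard
# JSON Schema; only the wrapper keys differ between providers.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit to return.",
                },
            },
            "required": ["location", "format"],
        },
    },
}
```

The enum constraint and required list are the levers discussed later under schema design: they narrow what the model can legally generate.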

Anthropic describes two categories of tools: client tools that execute on your systems (requiring your implementation), and server tools that execute on Anthropic’s servers (like web search)[3]. This distinction matters for security planning—client tools can access internal systems but require careful input validation, while server tools operate within Anthropic’s sandboxed environment.

How Does Function Calling Work?

The technical implementation follows a conversational pattern that maintains state across multiple API calls. When a user sends a prompt that might require external data, the application sends both the user message and available tool definitions to the LLM.

The model then assesses whether any tools can help with the query. If so, it constructs a properly formatted tool use request with a stop_reason of tool_use[3]. The application extracts the tool name and input, executes the actual function code, and returns results in a new user message containing a tool_result content block.
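The round trip can be sketched on plain dicts shaped like Anthropic’s Messages API responses. The run_tool dispatcher and the response literal below are illustrative stand-ins, not a real client:

```python
def run_tool(name, tool_input):
    # Dispatch to your real implementations here; this stub is hypothetical.
    if name == "get_current_weather":
        return f"22 degrees C and sunny in {tool_input['location']}"
    raise ValueError(f"Unknown tool: {name}")

def build_tool_results(response):
    """Turn a model response containing tool_use blocks into the
    tool_result user message that continues the conversation."""
    results = []
    for block in response["content"]:
        if block["type"] == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": run_tool(block["name"], block["input"]),
            })
    return {"role": "user", "content": results}

# A response whose stop_reason signals that the model wants a tool call:
response = {
    "stop_reason": "tool_use",
    "content": [{
        "type": "tool_use",
        "id": "toolu_123",
        "name": "get_current_weather",
        "input": {"location": "Paris", "format": "celsius"},
    }],
}
if response["stop_reason"] == "tool_use":
    follow_up = build_tool_results(response)  # append to messages, call API again
```

In production the follow_up message is appended to the conversation history and sent back to the API, which then produces the final natural-language answer.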

Consider this practical example from OpenAI’s cookbook: a weather function might take location and format parameters. When a user asks “What’s the weather in Paris?”, the model generates arguments like {"location": "Paris", "format": "celsius"} rather than making up a temperature value[4]. The application executes the actual API call, receives real data, and returns it to the model for synthesis into a natural language response.

Why Does Function Calling Matter?

The significance of function calling extends beyond simple data retrieval. According to Anthropic’s research with dozens of teams building LLM agents across industries, the most successful implementations use simple, composable patterns rather than complex frameworks[2].

Function calling enables three critical capabilities:

Data freshness: LLMs have knowledge cutoff dates. Function calling allows access to real-time information—stock prices, weather, calendar availability, and database records.

Action execution: Beyond reading data, functions can trigger actions like sending emails, creating calendar events, processing refunds, or updating database records.

Workflow orchestration: Complex multi-step processes can be broken down into discrete functions that the LLM orchestrates dynamically based on context.

Common Failure Modes and Reliability Patterns

Production function calling implementations face several predictable failure modes that require defensive engineering.

Schema Violations and Type Errors

Without structured outputs, LLMs can generate malformed JSON responses or invalid tool inputs. Anthropic’s documentation identifies specific issues: parsing errors from invalid JSON syntax, missing required fields, inconsistent data types, and schema violations requiring error handling and retries[5].

The solution is constrained decoding through structured outputs. Anthropic’s structured outputs feature guarantees schema-compliant responses by enforcing valid JSON syntax, type-safe fields, and required field presence at the API level[5]. OpenAI offers similar capabilities through their structured outputs mode, which ensures responses conform exactly to supplied JSON schemas[6].

Hallucinated Parameters

LLMs may invent parameter values when user input is ambiguous. The OpenAI cookbook demonstrates this with a system prompt instruction: “Don’t make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous”[4]. When asked “What’s the weather like today?” without location context, the model correctly asks for the city and temperature unit preference rather than guessing.

Error Handling Patterns

Microsoft’s Azure documentation outlines a three-step error handling approach: call the API with functions and user input, use the model’s response to call your API or function, then call the API again including the function response[7]. However, this basic pattern needs enhancement for production reliability.

Retry with exponential backoff: Network failures and transient errors require automatic retry mechanisms. The OpenAI cookbook implements this using the tenacity library with @retry(wait=wait_random_exponential(multiplier=1, max=40), stop=stop_after_attempt(3)) decorators[4].
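The cookbook relies on tenacity, but the same policy can be sketched with only the standard library. This dependency-free approximation mirrors the quoted parameters (randomized wait growing up to a 40-second cap, three attempts):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=3, base=1.0, cap=40.0):
    """Retry fn() with randomized exponential backoff.

    Approximates tenacity's wait_random_exponential(multiplier=1, max=40)
    combined with stop_after_attempt(3).
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Sleep a random interval in [0, min(cap, base * 2**attempt)].
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Randomizing the wait (rather than sleeping exactly 1s, 2s, 4s, …) avoids synchronized retry storms when many requests fail at once.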

Graceful degradation: When functions fail, the application should provide informative error messages back to the model rather than crashing. This allows the LLM to explain the limitation to users or attempt alternative approaches.
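A minimal sketch of this pattern, assuming Anthropic-style tool_result blocks (which accept an is_error flag); the helper name is illustrative:

```python
def safe_tool_result(tool_use_id, fn, **kwargs):
    """Execute a tool and return a tool_result block either way, so the
    model can explain a failure or try another approach instead of the
    application crashing."""
    try:
        content = fn(**kwargs)
        is_error = False
    except Exception as exc:
        # Informative for the model, but no stack traces or internals.
        content = f"Tool execution failed: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": str(content),
        "is_error": is_error,
    }
```

The key design choice is that errors travel back through the same channel as successes, keeping the conversation loop intact.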

Validation layers: Implement server-side validation of all parameters before executing functions. Never trust LLM-generated inputs to be safe or correctly typed.
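A bare-bones validation layer might check LLM-generated arguments against the tool’s schema before anything executes. This hand-rolled sketch covers required fields, unexpected keys, string types, and enums; production code would more likely use a full validator such as the jsonschema library:

```python
def validate_args(args, schema):
    """Check LLM-generated arguments against a JSON-schema-like dict.
    Returns a list of error strings; an empty list means the args passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected parameter: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors
```

Returning a list of errors (instead of raising on the first one) lets the application report every problem back to the model in a single tool_result.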

Schema Design Best Practices

Effective function schemas balance expressiveness with reliability. Based on patterns from OpenAI, Anthropic, and LangChain documentation, here are proven approaches:

Function Naming Conventions

Use descriptive, action-oriented names that clearly indicate what the function does. Prefer get_current_weather over weather or fetch_data. The name appears in the model’s context and influences its selection decisions.

Parameter Design

| Approach | Benefits | Trade-offs |
| --- | --- | --- |
| Required parameters | Explicit data requirements | Fails if information is missing |
| Optional with defaults | Graceful degradation | May produce suboptimal results |
| Enum constraints | Prevents invalid values | Limited flexibility |
| Nested objects | Complex data structures | Harder for models to generate correctly |

Based on the function calling patterns observed across implementations, required parameters with clear descriptions yield the most reliable results[1][3].

Description Quality

Parameter descriptions should include:

  • What the parameter represents
  • Expected format with examples
  • Constraints or valid ranges
  • How to infer the value from context

Example from Anthropic’s documentation: "description": "The city and state, e.g. San Francisco, CA"[3]. This pattern—value description followed by concrete example—helps models generate correctly formatted inputs.

Parallel vs. Sequential Function Calls

Modern LLMs support parallel function calling, allowing multiple independent function calls in a single response. This reduces latency for operations that don’t depend on each other—like fetching weather for multiple cities simultaneously[7].

However, not all operations can be parallelized. Sequential calling is required when:

  • One function’s output is another’s input
  • Operations have side effects that must complete in order
  • Rate limits or resource constraints require throttling
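When the model does return several independent calls in one response, the application can execute them concurrently. A sketch using the standard library’s thread pool (the tuple shape and dispatch mapping are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_parallel(tool_calls, dispatch):
    """Run independent tool calls concurrently.

    tool_calls: list of (call_id, name, args) tuples extracted from the
    model response; dispatch: maps a tool name to a callable.
    Returns {call_id: result} for building the tool-result messages.
    """
    def run(call):
        call_id, name, args = call
        return call_id, dispatch[name](**args)

    # Threads suit I/O-bound tool calls (HTTP requests, DB queries).
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(run, tool_calls))
```

For the sequential cases listed above, a plain loop that feeds each result into the next call is the right tool; only genuinely independent calls belong in the pool.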

Comparison: Native APIs vs. Framework Abstractions

Developers face a choice between using LLM APIs directly or adopting frameworks like LangChain that abstract the function calling pattern.

| Factor | Native API | Framework (LangChain) |
| --- | --- | --- |
| Control | Full visibility into prompts and responses | Higher-level abstractions may obscure details |
| Debugging | Direct access to all request/response data | Tracing tools like LangSmith provide visibility |
| Flexibility | Implement any custom logic | Constrained to framework patterns |
| Learning curve | Requires understanding API specifics | Faster initial development |
| Portability | Provider-specific code | Standardized interface across providers |

Anthropic’s guidance is clear: “We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. Incorrect assumptions about what’s under the hood are a common source of customer error”[2].

The Model Context Protocol (MCP) Standard

The Model Context Protocol (MCP) represents an emerging standard for connecting AI applications to external systems. Described as “like a USB-C port for AI applications,” MCP provides a standardized way to connect AI applications to data sources, tools, and workflows[8].

MCP reduces development complexity by providing:

  • Standardized tool definitions that work across compatible applications
  • Growing ecosystem of third-party integrations
  • Simplified client implementation patterns

For organizations building multiple AI applications, MCP offers a path to reusable tool definitions that work across Claude, ChatGPT, and other compatible systems.

Production Checklist

Before deploying function calling to production, verify:

  • All function parameters have comprehensive descriptions with examples
  • Structured outputs or strict mode is enabled for schema validation
  • Input validation layer exists between LLM output and function execution
  • Error handling includes retries with exponential backoff
  • Rate limiting is implemented for external API calls
  • Logging captures full request/response chains for debugging
  • Security review completed for all accessible functions
  • Fallback behavior defined for function failures
  • Testing includes edge cases and malformed inputs
  • Monitoring alerts on error rates and latency spikes

Frequently Asked Questions

Q: What’s the difference between function calling and tool use? A: These terms refer to the same capability. OpenAI and most of the industry use “function calling,” while Anthropic uses “tool use.” Both describe the pattern where LLMs generate structured API calls based on provided schemas. The underlying mechanism is identical—only the terminology differs.

Q: How do I prevent LLMs from hallucinating function parameters? A: Use three strategies: (1) Write detailed parameter descriptions with examples, (2) Enable structured outputs to enforce schema compliance, (3) Include system prompts instructing the model to ask for clarification rather than guess when information is ambiguous. Server-side validation provides a final safety net.

Q: Should I use LangChain or call LLM APIs directly? A: Start with direct API calls to understand the underlying patterns. Anthropic recommends this approach for most teams. Frameworks like LangChain add value when you need standardized interfaces across multiple providers or built-in tracing capabilities. Understand the abstractions before adopting them.

Q: How do I handle function call failures gracefully? A: Implement a three-layer approach: schema validation to catch malformed inputs before execution, try-catch blocks around function execution with exponential backoff for retries, and informative error messages returned to the LLM so it can explain issues to users. Never expose internal error details to end users.

Q: Can I use function calling with local LLMs? A: Yes, but support varies by model. Llama 3.1, Mistral, and other modern open-source models support function calling through various formats. Ollama and LM Studio provide interfaces for using function calling with local models. Verify your specific model’s capabilities, as implementation details differ from cloud APIs.


Function calling represents one of the most powerful capabilities in modern LLMs—when implemented correctly. The difference between a prototype and production system often comes down to defensive engineering: comprehensive schemas, robust error handling, and validation at every boundary. As the ecosystem matures with standards like MCP and improved structured outputs, the reliability gap between experimental demos and production systems continues to narrow.

Footnotes

  1. OpenAI. “Function Calling Guide.” https://platform.openai.com/docs/guides/function-calling

  2. Anthropic. “Building Effective Agents.” https://docs.anthropic.com/en/docs/build-with-claude/agent-patterns

  3. Anthropic. “Tool Use Overview.” https://docs.anthropic.com/en/docs/build-with-claude/tool-use

  4. OpenAI Cookbook. “Function Calling with the Chat Completions API.” https://github.com/openai/openai-cookbook

  5. Anthropic. “Structured Outputs.” https://docs.anthropic.com/en/docs/build-with-claude/structured-outputs

  6. OpenAI. “Structured Outputs Guide.” https://platform.openai.com/docs/guides/structured-outputs

  7. Microsoft Azure. “Function Calling with Azure OpenAI Service.” https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling

  8. Model Context Protocol. “Introduction to MCP.” https://modelcontextprotocol.io
