Stop parsing prose with regex. Learn how to make an LLM return clean JSON that matches a schema you control, and how to let it call your functions safely with the request/execute/return loop.
You ask the model to "extract the customer's name, email, and order total" and it replies: "Sure! The customer is Jane Doe (jane@acme.io) and her order came to $429." Lovely for a human. Useless for your code. Now you are writing a regex to pull the email out, a second one for the dollar amount, and a third to handle the time it decided to answer in Spanish. Every prompt tweak breaks your parser.
The fix is to stop treating the model as a chatbot and start treating it as a function that returns data. Modern LLM APIs give you two tools for this: structured output (the model must answer in a JSON shape you define) and tool calling (the model can ask *your* code to run a function and use the result). Get these right and the LLM becomes a reliable component you can wire into real software.
Who this is for
You have made a basic LLM API call (see [Working with the LLM API](/blog/working-with-the-llm-api)) and now you need machine-readable answers, or you want the model to actually *do* things: query a database, call an API, send an email. No ML background required. We use Python, but the concepts map to any SDK.
Two mental models
Structured output is a form the model must fill in. Tool calling is a set of buttons the model is allowed to press.
Both are about constraining the model. Left unconstrained, it generates the most plausible-sounding text. That is great for an essay and terrible for an integration. A form has labelled boxes and the model has to put something valid in each one. A button does nothing until pressed, and when pressed, *your* code runs, not the model's imagination.
| Analogy | What it maps to |
| --- | --- |
| A paper form with labelled fields and checkboxes | JSON schema / structured output: the model must return the exact shape |
| Required fields you cannot leave blank | Schema 'required' array: the model cannot omit them |
| A row of buttons on a dashboard | Tool definitions: named functions the model may invoke |
| Pressing a button triggers a machine, not the operator | Your backend executes the tool; the model never runs code itself |

Two ways to constrain a free-text model into something your code can trust.
The tool-calling loop
The single most misunderstood thing about tool calling: the model does not run your code. It only emits a request that says "please call get_weather with city=Amsterdam." Your program receives that request, runs the real function, and hands the result back. The model then continues with that fact in hand. It is a conversation that loops until the model has everything it needs to answer.
One full turn of the tool-calling loop. The model and your code take turns until a final answer falls out.
1. **You send the prompt plus tool definitions.** Each API call includes the user message AND a list of tools the model is allowed to use: name, description, and a JSON schema for the arguments.
2. **The model decides.** It either answers directly, or it returns a 'tool_use' block: the tool name and the arguments it wants, as structured JSON. No prose answer yet.
3. **Your code executes the tool.** You match the tool name to a real Python function, validate the arguments, and run it. The model is paused, waiting.
4. **You return the result.** Append the tool result to the conversation (linked to the call's id) and send the whole thing back to the model.
5. **The model continues, or loops again.** With the result in context it may answer, or request another tool. You repeat until it stops asking and produces a final answer.
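
The five steps can be traced without an API key. The sketch below swaps the real model for a hypothetical `fake_model` stand-in that requests one tool call and then answers; `fake_model`, the `call_1` id, and the dictionary shapes are assumptions made for illustration, loosely modelled on the 'tool_use' and tool result blocks described above, not any SDK's actual objects.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real version would call a weather API."""
    return {"city": city, "tempC": 14, "summary": "cloudy"}


TOOL_FNS = {"get_weather": get_weather}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM (an assumption for this sketch, not a real API call).

    If the last message is the plain-text prompt, request a tool.
    If it carries a tool result, turn that result into a final answer.
    """
    last = messages[-1]
    if isinstance(last["content"], str):
        return {
            "stop_reason": "tool_use",
            "content": [{"type": "tool_use", "id": "call_1",
                         "name": "get_weather", "input": {"city": "Amsterdam"}}],
        }
    result = json.loads(last["content"][0]["content"])
    text = f"It is {result['tempC']} C and {result['summary']} in {result['city']}."
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": text}]}


def run(prompt: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": prompt}]          # step 1: send the prompt
    for _ in range(max_turns):
        resp = fake_model(messages)                            # step 2: the model decides
        if resp["stop_reason"] != "tool_use":
            return "".join(b["text"] for b in resp["content"] if b["type"] == "text")
        messages.append({"role": "assistant", "content": resp["content"]})
        results = []
        for block in resp["content"]:
            if block["type"] == "tool_use":
                output = TOOL_FNS[block["name"]](**block["input"])  # step 3: execute
                results.append({"type": "tool_result",
                                "tool_use_id": block["id"],         # link result to call
                                "content": json.dumps(output)})
        messages.append({"role": "user", "content": results})  # step 4: return the result
    return "Stopped: hit max_turns without a final answer."   # step 5 never finished


answer = run("What's the weather in Amsterdam?")
print(answer)
```

Because the stand-in is deterministic, the same harness can serve as a unit test for the loop's bookkeeping before a real model is plugged in.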
Four levels of reliability
There is a ladder from "hope it parses" to "guaranteed valid." Climb only as high as you need: strict schemas and tools cost more tokens and more setup. Match the technique to the job.
| Technique | What you get | Reliability | Use it when |
| --- | --- | --- | --- |
| Free text | Natural-language prose | Low: you parse strings yourself | Human reads the output directly |
| JSON mode | Valid JSON, but any shape | Medium: parses, fields not guaranteed | Quick prototypes, loose shapes |
| Strict JSON schema | JSON matching your exact schema | High: fields, types, enums enforced | Extraction, classification, data into a DB |
| Tool calling | Validated args for named functions | High, plus the model can act, not just answer | The model must DO something (query, send, fetch) |

From least to most constrained. Higher rows are cheaper and looser; lower rows are reliable and machine-ready.
JSON mode is not schema enforcement
Plain JSON mode guarantees the output *parses*, not that it has the keys you expect. If you need `email` to always be present and a string, use a strict **schema**, not just JSON mode. Many a 2am bug is a missing field that "usually" appears.
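
To make the difference concrete, here is a minimal sketch using only the standard library. Both strings parse as JSON, but only one has the fields the code expects. The `REQUIRED` mapping and the `missing_or_wrong` helper are hypothetical names for this illustration, and the "loose" payload is an invented example of the kind of output plain JSON mode could plausibly return.

```python
import json

# Keys the downstream code relies on, with the types it expects.
REQUIRED = {"customer_name": str, "email": str, "total_usd": (int, float)}


def missing_or_wrong(raw: str) -> list[str]:
    """Return a list of problems; an empty list means keys and types look right."""
    data = json.loads(raw)  # JSON mode only promises that this line succeeds
    problems = []
    for key, expected in REQUIRED.items():
        if key not in data:
            problems.append(f"missing: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type: {key}")
    return problems


loose = '{"name": "Jane Doe", "total": "429"}'
strict = '{"customer_name": "Jane Doe", "email": "jane@acme.io", "total_usd": 429}'

print(missing_or_wrong(loose))   # valid JSON, wrong shape
print(missing_or_wrong(strict))  # valid JSON, expected shape
```

A strict schema moves this check to generation time; keeping a check like this in your own code is still cheap insurance.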
Code: a tool definition and the loop
Here is a complete, minimal tool-calling loop. The tool is described with a JSON schema for its arguments. The loop keeps running the model until it stops asking for tools, with a hard cap so a confused model can never spin forever.
tool_loop.py

```python
import json

import anthropic

client = anthropic.Anthropic()

# 1. Describe the tool: name, when to use it, and an argument schema.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current temperature for a city, in Celsius.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Amsterdam'"},
            },
            "required": ["city"],
        },
    }
]


# 2. The real function the model is allowed to trigger.
def get_weather(city: str) -> dict:
    # In real life: call a weather API. Here we fake it.
    return {"city": city, "tempC": 14, "summary": "cloudy"}


TOOL_FNS = {"get_weather": get_weather}


def run(prompt: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):  # bounded loop, never unbounded!
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # No tool requested -> this is the final answer.
            return "".join(b.text for b in resp.content if b.type == "text")

        # 3. Run every tool the model asked for, collect the results.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            fn = TOOL_FNS.get(block.name)
            if fn is None:
                output = {"error": f"unknown tool {block.name}"}
            else:
                try:
                    output = fn(**block.input)  # validate in real code!
                except Exception as e:
                    output = {"error": str(e)}
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(output),
            })

        # 4. Feed the results back and loop.
        messages.append({"role": "user", "content": results})

    return "Stopped: hit max_turns without a final answer."


print(run("What's the weather in Amsterdam? Should I take an umbrella?"))
```
Notice three things: the assistant's tool-call message is appended before the result, each result is keyed to its tool_use_id, and the loop has a max_turns ceiling. Drop any one of those and the loop breaks subtly.
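
The `# validate in real code!` comment in the loop deserves a sketch of its own. One possible shape, assuming a hypothetical allowlist of supported cities: a per-tool validator runs before the function, and a failure is returned to the model as an error payload instead of crashing the loop. `SUPPORTED_CITIES`, `validate_weather_args`, and `safe_call` are illustrative names, not part of any SDK.

```python
SUPPORTED_CITIES = {"Amsterdam", "Berlin", "Lisbon"}  # hypothetical allowlist


def get_weather(city: str) -> dict:
    # Same fake tool as in the loop above.
    return {"city": city, "tempC": 14, "summary": "cloudy"}


def validate_weather_args(args: dict) -> dict:
    """Treat model-supplied arguments as untrusted input and return a cleaned copy."""
    unexpected = set(args) - {"city"}
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    city = args.get("city")
    if not isinstance(city, str) or not city.strip():
        raise ValueError("city must be a non-empty string")
    city = city.strip()
    if city not in SUPPORTED_CITIES:
        raise ValueError(f"unsupported city: {city}")
    return {"city": city}


def safe_call(fn, validator, args: dict) -> dict:
    """Run the tool only if validation passes; otherwise report an error the model can read."""
    try:
        return fn(**validator(args))
    except ValueError as e:
        return {"error": str(e)}


ok = safe_call(get_weather, validate_weather_args, {"city": " Amsterdam "})
bad = safe_call(get_weather, validate_weather_args, {"city": "Atlantis"})
extra = safe_call(get_weather, validate_weather_args, {"city": "Berlin", "units": "F"})
print(ok, bad, extra, sep="\n")
```

Returning the error as a tool result, rather than raising, gives the model a chance to correct its arguments on the next turn while the turn cap still bounds the retries.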
Code: structured output with a schema
When you do not need the model to *act* and just want clean data, use a structured-output schema. Define the shape, ask the model to fill it, then validate before you trust it. Pydantic can make the schema and the validation share one source of truth.
extract.py

```python
from typing import Literal

import anthropic
from pydantic import BaseModel, EmailStr, ValidationError  # EmailStr needs the email-validator extra

client = anthropic.Anthropic()


# The form the model must fill in.
class Order(BaseModel):
    customer_name: str
    email: EmailStr
    total_usd: float
    priority: Literal["low", "normal", "high"]  # enforced on validation, not just documented


# Hand-written JSON Schema mirroring Order. With Pydantic v2 you can generate
# it from the model instead via Order.model_json_schema().
SCHEMA = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "total_usd": {"type": "number"},
        "priority": {"type": "string", "enum": ["low", "normal", "high"]},
    },
    "required": ["customer_name", "email", "total_usd", "priority"],
}


# Trick: expose the schema AS a tool and force the model to call it.
# The 'arguments' it produces are your structured output.
def extract(text: str) -> Order:
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        tools=[{
            "name": "record_order",
            "description": "Record the extracted order details.",
            "input_schema": SCHEMA,
        }],
        tool_choice={"type": "tool", "name": "record_order"},  # force it
        messages=[{"role": "user", "content": f"Extract the order:\n{text}"}],
    )
    raw = next(b.input for b in resp.content if b.type == "tool_use")

    # NEVER skip this step. The schema guides the model; it does not
    # guarantee semantics. Validate before the data touches your DB.
    try:
        return Order(**raw)
    except ValidationError as e:
        raise ValueError(f"Model returned invalid order: {e}") from e


order = extract("Jane Doe (jane@acme.io) placed a rush order totalling $429.")
print(order.model_dump())
```
Common mistakes that cost hours
No validation. A schema *guides* the model; it does not certify the data is sane. A total_usd of -1 or an email of "n/a" can still come back. Always re-validate (Pydantic, zod, JSON-Schema) before the value reaches your database or an irreversible action.
Trusting hallucinated arguments. The model can invent a plausible-looking user_id that does not exist, or a city you do not support. Treat tool arguments like untrusted user input: check ranges, enums, and existence before you act on them.
Unbounded tool loops. Without a max_turns cap, a confused model can call the same tool forever, burning tokens and money. Always bound the loop and log when you hit the ceiling.
Vague tool descriptions. "Gets data" tells the model nothing about *when* to use it. Write the description like a docstring for a junior dev: what it does, what each argument means, and when NOT to call it.
Forgetting the result id. Each tool_result must reference the tool_use_id of the call it answers. Mismatch them and the model loses the thread of which result belongs to which request.
Side effects on retry. If the model calls send_email twice (it happens), you double-send. Make irreversible tools idempotent or require a human confirmation step.
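
One way to make an irreversible tool such as send_email safe against repeated calls is to deduplicate on an idempotency key. The sketch below is an assumption-laden illustration: the key is a hash of the tool name and arguments, the `_sent` store is an in-memory dict, and `outbox` stands in for a real email provider. A production version would persist the keys and probably scope them to a conversation or request id.

```python
import hashlib
import json

_sent: dict[str, dict] = {}  # idempotency key -> first result (in-memory for the sketch)
outbox: list[dict] = []      # stands in for the real email provider


def idempotency_key(tool_name: str, args: dict) -> str:
    """Derive a stable key from the tool name and its arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def send_email(to: str, subject: str, body: str) -> dict:
    """Hypothetical tool: sends at most once per identical request."""
    args = {"to": to, "subject": subject, "body": body}
    key = idempotency_key("send_email", args)
    if key in _sent:
        # Repeat call: report the earlier outcome, do not send again.
        return {**_sent[key], "duplicate": True}
    outbox.append(args)  # the irreversible side effect happens exactly here
    result = {"status": "sent", "to": to}
    _sent[key] = result
    return {**result, "duplicate": False}


first = send_email("jane@acme.io", "Your order", "Thanks for your order.")
second = send_email("jane@acme.io", "Your order", "Thanks for your order.")
print(first, second, len(outbox), sep="\n")
```

Hashing the full argument set also blocks a legitimate resend of an identical message, which is why scoping the key to a single conversation, or asking a human to confirm, is often the safer design.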
Takeaways
The whole article in seven lines
Treat the LLM as a function that returns data, not a chatbot that returns prose.
**Structured output** = a form the model must fill in. **Tool calling** = buttons it may press.
JSON mode guarantees it *parses*; a strict **schema** guarantees the fields and types.
The model never runs your code: it emits a request, your code executes it, and you feed the result back.
The loop is: prompt+tools → tool call → execute → return result → continue → final answer.
Always **validate** model output before it touches a database or an irreversible action.
Always **bound** the tool loop with a max-turns ceiling.
Where to go next
Structured output and tool calling are the two primitives that turn an LLM from a text generator into a software component. The natural next step is to chain many tool calls together with memory and a goal. That is what an agent is.
Came in cold? Start with Working with the LLM API for the request/response basics this article builds on.
Ready to go further? Building AI Agents wires this loop into a goal-driven agent with planning and memory.
Want the full track? Follow the AI Engineer path from foundations to production.