patch_litellm()
Core
LiteLLM
Deterministic outputs
LiteLLM ModelResponse(Stream)
objects have id
and created_at
fields that are generated dynamically. Even when we use cachy
to cache the LLM response, these dynamic fields create diffs, which makes code review more challenging. The patches below ensure that the id
and created_at
fields are fixed and won’t generate diffs.
patch_litellm
patch_litellm ()
Patch litellm.ModelResponseBase such that id
and created
are fixed.
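A minimal usage sketch (assuming the patch only needs to be applied once, before any completion calls):

```python
# Apply the patch once at startup; cached responses then diff cleanly because
# `id` and `created` become fixed placeholder values (e.g. 'chatcmpl-xxx' below).
patch_litellm()
```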
Completion
LiteLLM provides a convenient unified interface for most big LLM providers. Because it’s so useful to be able to switch LLM providers with just one argument, we want to make it even easier by adding some more convenience functions and classes.
This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).
= ["gemini/gemini-2.5-flash", "claude-sonnet-4-20250514", "openai/gpt-4.1"]
ms = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
msg for m in ms:
f'**{m}:**'))
display(Markdown( display(completion(m,msg))
gemini/gemini-2.5-flash:
Hey there yourself! How can I help you today?
- id:
chatcmpl-xxx
- model:
gemini-2.5-flash
- finish_reason:
stop
- usage:
Usage(completion_tokens=113, prompt_tokens=4, total_tokens=117, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=102, rejected_prediction_tokens=None, text_tokens=11), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))
claude-sonnet-4-20250514:
Hello! Nice to meet you! How are you doing today?
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=16, prompt_tokens=10, total_tokens=26, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
Hello! How can I help you today? 😊
- id:
chatcmpl-xxx
- model:
gpt-4.1-2025-04-14
- finish_reason:
stop
- usage:
Usage(completion_tokens=10, prompt_tokens=10, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Messages formatting
Let’s start with making it easier to pass messages into litellm’s completion
function (including images).
mk_msg
mk_msg (content, role='user', cache=False, ttl=None)
Create a LiteLLM compatible message.
|  | Type | Default | Details |
|---|---|---|---|
| content |  |  | Content: str, bytes (image), list of mixed content, or dict w ‘role’ and ‘content’ fields |
| role | str | user | Message role if content isn’t already a dict/Message |
| cache | bool | False | Enable Anthropic caching |
| ttl | NoneType | None | Cache TTL: ‘5m’ (default) or ‘1h’ |
Now we can use mk_msg to create different types of messages:
Simple text:
= mk_msg("hey")
msg msg
{'role': 'user', 'content': 'hey'}
Lists w just one string element are flattened for conciseness:
"hey"), mk_msg(["hey"])) test_eq(mk_msg(
With Anthropic caching:
= mk_msg("hey I'm Rens. Please repeat it in all caps w a fun greeting",cache=True)
msg msg
{'role': 'user',
'content': [{'type': 'text',
'text': "hey I'm Rens. Please repeat it in all caps w a fun greeting",
'cache_control': {'type': 'ephemeral'}}]}
(LiteLLM ignores these fields when sent to other providers)
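The ttl argument listed in the table above isn’t shown in these examples; a hedged sketch of how it might be used (assuming ttl is simply forwarded into the cache_control block):

```python
# Sketch: request a 1-hour cache window instead of the default 5 minutes.
# (Assumption: ttl='1h' ends up alongside {'type': 'ephemeral'} in cache_control.)
mk_msg("hey I'm Rens", cache=True, ttl='1h')
```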
Text and images:
fn = Path('samples/puppy.jpg')
Image(filename=fn, width=200)
msg = mk_msg(['hey what in this image?',fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")
{
"role": "user",
"content": [
{
"type": "text",
"text": "hey what in this image?"
},
{
"type": "image_url",
"image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...
Which can be passed to litellm’s completion
function like this:
model = ms[1]
completion(model,[msg])
This image shows an adorable puppy with distinctive brown and white markings on its face. The puppy appears to be a small breed, possibly a Cavalier King Charles Spaniel or similar breed, with fluffy reddish-brown and white fur. The puppy is positioned near some purple flowers (which look like small daisies or asters) and is sitting on grass. The setting appears to be outdoors in a garden area, creating a sweet, natural portrait of this very cute young dog. The puppy has dark eyes and the classic innocent, gentle expression that makes puppies so endearing.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=131, prompt_tokens=104, total_tokens=235, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Now let’s make it easy to provide entire conversations:
mk_msgs
mk_msgs (msgs, cache=False, ttl=None, cache_last_ckpt_only=True)
Create a list of LiteLLM compatible messages.
|  | Type | Default | Details |
|---|---|---|---|
| msgs |  |  | List of messages (each: str, bytes, list, or dict w ‘role’ and ‘content’ fields) |
| cache | bool | False | Enable Anthropic caching |
| ttl | NoneType | None | Cache TTL: ‘5m’ (default) or ‘1h’ |
| cache_last_ckpt_only | bool | True | Only cache the last message |
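Caching works here too (see the table above); presumably with the default cache_last_ckpt_only=True only the final message is marked. A hedged sketch:

```python
# Sketch: enable Anthropic caching for a whole conversation. With the default
# cache_last_ckpt_only=True, only the last message should carry cache_control.
msgs = mk_msgs(['Hey!', 'Hi there!', 'How are you?'], cache=True)
```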
With mk_msgs
you can easily provide a whole conversation:
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm doing fine and you?"}]
Who’s speaking when is automatically inferred, even when there are multiple tools being called in parallel (which LiteLLM supports!).
msgs = mk_msgs(['Tell me the weather in Paris and Rome',
                'Assistant calls weather tool two times',
                {'role':'tool','content':'Weather in Paris is ...'},
                {'role':'tool','content':'Weather in Rome is ...'},
                'Assistant returns weather',
                'Thanks!'])
msgs
[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
{'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
{'role': 'tool', 'content': 'Weather in Paris is ...'},
{'role': 'tool', 'content': 'Weather in Rome is ...'},
{'role': 'assistant', 'content': 'Assistant returns weather'},
{'role': 'user', 'content': 'Thanks!'}]
For ease of use, if msgs
is not already in a list
, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs
and get back a LiteLLM compatible msg history.
= mk_msgs("Hey")
msgs msgs
[{'role': 'user', 'content': 'Hey'}]
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm fine, you?"}]
However, beware that if you use mk_msgs
for a single message consisting of multiple parts, you should be explicit and wrap those parts in two lists:
- One list to show that they belong together in one message (the inner list).
- Another, because mk_msgs expects a list of multiple messages (the outer list).
This is common when working with images for example:
msgs = mk_msgs([['Whats in this img?',fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")
[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whats in this img?"
},
{
"type": "image_url",
"image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
Streaming
LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.
We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.
stream_with_complete
stream_with_complete (gen, postproc=<function noop>)
Extend streaming response chunks with the complete response
= completion(model, mk_msgs("Hey!"), stream=True)
r = SaveReturn(stream_with_complete(r)) r2
for o in r2:
= o.choices[0].delta.content
cts if cts: print(cts, end='')
Hello! How are you doing today? Is there anything I can help you with?
r2.value
Hello! How are you doing today? Is there anything I can help you with?
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=20, prompt_tokens=9, total_tokens=29, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Tools
lite_mk_func
lite_mk_func (f)
def simple_add(
    a: int, # first operand
    b: int=0 # second operand
) -> int:
    "Add two numbers together"
    return a + b
toolsc = lite_mk_func(simple_add)
toolsc
{'type': 'function',
'function': {'name': 'simple_add',
'description': 'Add two numbers together\n\nReturns:\n- type: integer',
'parameters': {'type': 'object',
'properties': {'a': {'type': 'integer', 'description': 'first operand'},
'b': {'type': 'integer', 'description': 'second operand', 'default': 0}},
'required': ['a']}}}
= mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible.")
tmsg = completion(model, [tmsg], tools=[toolsc]) r
display(r)
I’ll help you calculate both of those addition problems using the simple_add tool. Let me perform both calculations for you:
🔧 simple_add({“a”: 5478954793, “b”: 547982745})
🔧 simple_add({“a”: 5479749754, “b”: 9875438979})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=160, prompt_tokens=475, total_tokens=635, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
tcs = [_lite_call_func(o, ns=globals()) for o in r.choices[0].message.tool_calls]
tcs
[{'tool_call_id': 'toolu_013MZMqJL4fBRGTsMtAGJjMk',
'role': 'tool',
'name': 'simple_add',
'content': '6026937538'},
{'tool_call_id': 'toolu_01HkbM4zwAb38n4rH7SNvi75',
'role': 'tool',
'name': 'simple_add',
'content': '15355188733'}]
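With the tool results in hand we can close the loop by hand: append the assistant’s tool-call message and the tool results to the history and call completion again. This is a sketch of the standard LiteLLM/OpenAI tool-calling flow, not a helper provided here:

```python
# Sketch: send the tool results back so the model can produce its final answer.
msgs = [tmsg, r.choices[0].message, *tcs]
completion(model, msgs, tools=[toolsc])
```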
def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None #f'{c}'
    delta = c.delta
    if delta.content: return delta.content
    if delta.tool_calls:
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'
    return None
r = completion(messages=[tmsg], model=model, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
I'll help you calculate both of those sums using the addition tool. Let me perform both calculations for you:
1. First, I'll calculate 5478954793 + 547982745
2. Then, I'll calculate 5479749754 + 9875438979
🔧 simple_add
🔧 simple_add
r2.value
I’ll help you calculate both of those sums using the addition tool. Let me perform both calculations for you:
- First, I’ll calculate 5478954793 + 547982745
- Then, I’ll calculate 5479749754 + 9875438979
🔧 simple_add({“a”: 5478954793, “b”: 547982745})
🔧 simple_add({“a”: 5479749754, “b”: 9875438979})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=197, prompt_tokens=475, total_tokens=672, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
= mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
msg = completion(messages=[msg], model=model, stream=True, reasoning_effort="low")
r = SaveReturn(stream_with_complete(r))
r2 for o in r2: print(delta_text(o) or '', end='')
🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠
I'll find the derivative of f(x) = x³ + 2x² - 5x + 1 using the power rule.
The power rule states that for any term ax^n, the derivative is nax^(n-1).
Taking the derivative of each term:
**Term 1:** x³
- Derivative: 3x^(3-1) = 3x²
**Term 2:** 2x²
- Derivative: 2 × 2x^(2-1) = 4x
**Term 3:** -5x
- Derivative: -5 × 1x^(1-1) = -5
**Term 4:** 1 (constant)
- Derivative: 0
**Final Answer:**
f'(x) = 3x² + 4x - 5
r2.value
I’ll find the derivative of f(x) = x³ + 2x² - 5x + 1 using the power rule.
The power rule states that for any term ax^n, the derivative is nax^(n-1).
Taking the derivative of each term:
Term 1: x³ - Derivative: 3x^(3-1) = 3x²
Term 2: 2x²
- Derivative: 2 × 2x^(2-1) = 4x
Term 3: -5x - Derivative: -5 × 1x^(1-1) = -5
Term 4: 1 (constant) - Derivative: 0
Final Answer: f’(x) = 3x² + 4x - 5
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=433, prompt_tokens=66, total_tokens=499, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=205, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Search
LiteLLM provides search, not via tools, but via the special web_search_options
param.
Note: Not all models support web search. LiteLLM’s supports_web_search
field should indicate this, but it’s unreliable for some models like claude-sonnet-4-20250514
. Checking both supports_web_search
and search_context_cost_per_query
provides more accurate detection.
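_has_search is the library’s internal check; a rough equivalent built on litellm.get_model_info might look like the sketch below (an assumption: both fields appear in the model-info dict when they are defined for a model):

```python
import litellm

def has_search(m):
    "Rough web-search support check: either field counts as a positive signal."
    try: info = litellm.get_model_info(m)
    except Exception: return False
    return bool(info.get('supports_web_search') or info.get('search_context_cost_per_query'))
```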
for m in ms: print(m, _has_search(m))
gemini/gemini-2.5-flash True
claude-sonnet-4-20250514 True
openai/gpt-4.1 False
When search is supported it can be used like this:
= mk_msg("Search the web and tell me very briefly about otters")
smsg = completion(model, [smsg], web_search_options={"search_context_size": "low"}) # or 'medium' / 'high'
r r
Otters are fascinating semiaquatic mammals that belong to the weasel family. The charismatic otter, a member of the weasel family, is found on every continent except Australia and Antarctica. There are 13 species in total, ranging from the small-clawed otter to the giant otter.
Most are small, with short ears and noses, elongated bodies, long tails, and soft, dense fur. Otters have the densest fur of any animal—as many as a million hairs per square inch in places. Webbed feet and powerful tails, which act like rudders, make otters strong swimmers.
All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters have a unique feeding behavior: A sea otter will float on its back, place a rock on its chest, then smash the mollusk down on it until it breaks open. River otters, however, can hold their breath for up to 8 minutes.
River otters are especially playful, gamboling on land and splashing into rivers and streams. Otters live up to 16 years; they are by nature playful, and frolic in the water with their pups. When resting, sea otters entangle themselves in kelp so they don’t float away.
Many otter species face conservation challenges. Otters and their mustelid relatives were once hunted extensively for their fur, many to the point of near extinction. Despite regulations designed to protect them, many species remain at risk from pollution and habitat loss.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=603, prompt_tokens=13314, total_tokens=13917, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), server_tool_use=ServerToolUse(web_search_requests=1), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Citations
Next, let’s handle Anthropic’s search citations.
When not using streaming, all citations are placed in a separate key in the response:
r.choices[0].message.provider_specific_fields['citations'][0]
[{'type': 'web_search_result_location',
'cited_text': 'The charismatic otter, a member of the weasel family, is found on every continent except Australia and Antarctica. ',
'url': 'https://www.nationalgeographic.com/animals/mammals/facts/otters-1',
'title': 'Otters, facts and information | National Geographic',
'encrypted_index': 'Eo8BCioIBxgCIiQ4ODk4YTFkYy0yMTNkLTRhNmYtOTljYi03ZTBlNTUzZDc0NWISDLIIDjwxPgKn3T1dcBoMOiqvShGJLxd8+SdzIjCz3SlTIqO2HA5gvks2pAQGmz3XWB+xFAaljcWlyygSy/kTY7sqeqMn1qU4tGpBmf4qE4abQbfNHbLywouCbZ9quTv0iwgYBA=='}]
However, when streaming the results are not captured this way. Instead, we provide this helper function that adds the citation to the content
field in markdown format:
cite_footnotes
cite_footnotes (stream_list)
Add markdown footnote citations to stream deltas
cite_footnote
cite_footnote (msg)
r = list(completion(model, [smsg], stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)
Otters are * members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the small-clawed otter (the smallest species) to * the giant otter and sea otter (the largest).
* Most are small, with short ears and noses, elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch, which keeps them warm in water. * * They have webbed feet and powerful tails that act like rudders, making them strong swimmers.
* * All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters are famous for * floating on their backs and using rocks to smash open shellfish. * Sea otters can stay submerged for more than 5 minutes, while river otters can hold their breath for up to 8 minutes.
* * They are playful animals, engaging in activities like sliding and splashing. * Sea otters even entangle themselves in kelp while sleeping and sometimes hold hands with other otters to stay together.
Many otter species face conservation challenges due to * historical hunting for their fur and current threats from pollution and habitat loss.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=576, prompt_tokens=13314, total_tokens=13890, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Chat
LiteLLM is pretty bare bones. It doesn’t keep track of conversation history or what tools have been added in the conversation so far.
So let’s make a Claudette-style wrapper so we can do streaming, tool calling, and tool loops without problems.
When the tool uses are about to be exhausted, it is important to alert the AI so that it knows to use its final steps to communicate the current progress and next steps to the user.
Chat
Chat (model:str, sp='', temp=0, search=False, tools:list=None, hist:list=None, ns:Optional[dict]=None, cache=False)
LiteLLM chat client.
|  | Type | Default | Details |
|---|---|---|---|
| model | str |  | LiteLLM compatible model name |
| sp | str |  | System prompt |
| temp | int | 0 | Temperature |
| search | bool | False | Search (l,m,h), if model supports it |
| tools | list | None | Add tools |
| hist | list | None | Chat history |
| ns | Optional | None | Custom namespace for tool calling |
| cache | bool | False | Anthropic prompt caching |
@patch(as_prop=True)
def cost(self: Chat):
    "Total cost of all responses in conversation history"
    return sum(getattr(r, '_hidden_params', {}).get('response_cost') or 0
               for r in self.h if hasattr(r, 'choices'))
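For example, after a couple of turns you can check the accumulated estimate (a sketch; the exact figure depends on LiteLLM’s pricing tables):

```python
chat = Chat(model)
chat("Hey!")
print(f"Estimated spend so far: ${chat.cost:.6f}")
```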
Examples
History tracking
chat = Chat(model)
res = chat("Hey my name is Rens")
res
Hi Rens! Nice to meet you. How are you doing today?
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=19, prompt_tokens=14, total_tokens=33, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
"Whats my name") chat(
Your name is Rens! You introduced yourself to me at the start of our conversation.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=22, prompt_tokens=41, total_tokens=63, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
See now we keep track of history!
History is stored in the hist
attribute:
chat.hist
[{'role': 'user', 'content': 'Hey my name is Rens'},
Message(content='Hi Rens! Nice to meet you. How are you doing today?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
{'role': 'user', 'content': 'Whats my name'},
Message(content='Your name is Rens! You introduced yourself to me at the start of our conversation.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]
You can also pass an old chat history into new Chat objects:
chat2 = Chat(model, hist=chat.hist)
chat2("What was my name again?")
Your name is Rens - you told me that when you first said hello.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=20, prompt_tokens=72, total_tokens=92, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Synthetic History Creation
Let’s build the chat history step by step. That way we can tweak anything we need to during testing.
= "What is 5 + 7? Use the tool to calculate it."
pr = Chat(model, tools=[simple_add])
c = c(pr) res
Chat.print_hist
Chat.print_hist ()
Print each message on a different line
Whereas normally without tools we would get one user input and one assistant response, here we get two extra messages in between:
- An assistant message requesting the tools with arguments.
- A tool response with the result of the tool call.
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_015yBzHuirWKDN14GbYJHeyY', 'type': 'function'}], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
{'tool_call_id': 'toolu_015yBzHuirWKDN14GbYJHeyY', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
Message(content='The result of 5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
Let’s try to build this up manually so we have full control over the inputs.
random_tool_id
random_tool_id ()
Generate a random tool ID with ‘toolu_’ prefix
random_tool_id()
'toolu_S7k1uH4VIIWHxve7lQvGO3lFC'
A tool call request can contain one or more tool calls. Let’s make one.
mk_tc
mk_tc (func, idx=1, **kwargs)
tc = mk_tc(simple_add, a=5, b=7)
tc
{'index': 1,
'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
'id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
'type': 'function'}
This can then be packaged into the full Message object produced by the assistant.
def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)
= "I'll use the simple_add tool to calculate 5 + 7 for you."
tc_cts = mk_tc_req(tc_cts, [tc])
tcq tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_XErMLIskFlArR4VIs52KijpH3', type='function')], function_call=None, provider_specific_fields=None)
Notice how Message instantiation creates a list of ChatCompletionMessageToolCalls by default. When the tools are executed this is converted back to a dictionary, so for consistency we want to keep these as dictionaries from the beginning.
mk_tc_req
mk_tc_req (content, tcs)
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
Looks good so far! Now we will want to provide the actual result!
mk_tc_result
mk_tc_result (tc, result)
Note that we might have more than one tool call if more than one was passed in; here we will just make one result.
tcq.tool_calls[0]
{'index': 1,
'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
'id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
'type': 'function'}
mk_tc_result(tcq.tool_calls[0], '12')
{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
'role': 'tool',
'name': 'simple_add',
'content': '12'}
mk_tc_results
mk_tc_results (tcq, results)
Same here: the tool calls in tcq.tool_calls should match the number of results passed in the results list.
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12'])
tcr
[{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
'role': 'tool',
'name': 'simple_add',
'content': '12'}]
Now we can call it with this synthetic data to see what the response is!
c(tcr[0])
The result of 5 + 7 is 12.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=17, prompt_tokens=537, total_tokens=554, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
Message(content='The result of 5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
Let’s try this again, but let’s give it something that is clearly wrong, for fun.
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
tcr = mk_tc_results(tcq, ['13'])
tcr
[{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
'role': 'tool',
'name': 'simple_add',
'content': '13'}]
c(tcr[0])
The result of 5 + 7 is 12. Wait, let me double-check that - the tool returned 13, which is incorrect. Let me verify: 5 + 7 should equal 12, but the tool returned 13. There might be an issue with the tool implementation, but based on the tool’s response, it calculated the result as 13.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=85, prompt_tokens=537, total_tokens=622, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Let’s make sure this works with multiple tool calls in the same assistant Message.
tcs = [mk_tc(simple_add, a=5, b=7), mk_tc(simple_add, a=6, b=7)]
tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12', '13'])
c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])
c(tcr[1])
The results are: - 5 + 7 = 12 - 6 + 7 = 13
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=31, prompt_tokens=629, total_tokens=660, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'type': 'function'}], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
{'tool_call_id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'role': 'tool', 'name': 'simple_add', 'content': '13'}
Message(content='The results are:\n- 5 + 7 = 12\n- 6 + 7 = 13', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
Images
chat(['Whats in this img?',fn.read_bytes()])
This image shows an adorable puppy! It appears to be a Cavalier King Charles Spaniel or a similar breed with beautiful reddish-brown and white fur. The puppy has sweet, dark eyes and is lying on grass near some purple flowers (they look like small daisies or asters). The puppy looks very young and has that irresistibly cute, fluffy appearance that makes you want to give it a cuddle. It’s a really lovely, heartwarming photo!
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=109, prompt_tokens=164, total_tokens=273, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Prefill
Prefill works as expected:
"Spell my name",prefill="Your name is R E") chat(
Your name is R E N S.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=6, prompt_tokens=285, total_tokens=291, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
And the entire message is stored in the history, not just the generated part:
chat.hist[-1]
Message(content='Your name is R E N S.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
Streaming
from time import sleep
chat = Chat(model)
stream_gen = chat("Count to 5", stream=True)
for chunk in stream_gen:
    if isinstance(chunk, ModelResponse): display(chunk)
    else: print(delta_text(chunk) or '',end='')
1
2
3
4
5
1 2 3 4 5
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=13, prompt_tokens=11, total_tokens=24, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Let’s try prefill with streaming too:
= chat("Continue counting to 10","Okay! 6, 7",stream=True)
stream_gen for chunk in stream_gen:
if isinstance(chunk, ModelResponse): display(chunk)
else: print(delta_text(chunk) or '',end='')
Okay! 6, 7, 8, 9, 10.
Okay! 6, 7, 8, 9, 10.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=13, prompt_tokens=40, total_tokens=53, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Tool use
Ok, now let’s test tool use:
for m in ms:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3? Use the `simple_add` tool. Explain.")
    display(res)
gemini/gemini-2.5-flash:
I used the simple_add
tool with a=5
and b=3
. The tool returned 8
.
Therefore, 5 + 3 = 8.
- id:
chatcmpl-xxx
- model:
gemini-2.5-flash
- finish_reason:
stop
- usage:
Usage(completion_tokens=118, prompt_tokens=159, total_tokens=277, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None, text_tokens=39), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=159, image_tokens=None))
claude-sonnet-4-20250514:
The result is 8.
Explanation: I used the simple_add
function with the parameters: - a = 5
(the first operand) - b = 3
(the second operand)
The function performed the addition operation and returned 8, which is the correct sum of 5 + 3.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=81, prompt_tokens=584, total_tokens=665, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
The result of 5 + 3 is 8.
Explanation: I used the simple_add tool, which takes two numbers and adds them together. By inputting 5 and 3, the tool calculated the sum as 8.
- id:
chatcmpl-xxx
- model:
gpt-4.1-2025-04-14
- finish_reason:
stop
- usage:
Usage(completion_tokens=48, prompt_tokens=155, total_tokens=203, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Thinking w tool use
chat = Chat(model, tools=[simple_add])
res = chat("What's 5 + 3?",think='l',return_all=True)
display(*res)
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=165, prompt_tokens=455, total_tokens=620, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=81, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01QBLkGLtd85Yj8eEiFVBu4v',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
5 + 3 = 8
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=13, prompt_tokens=673, total_tokens=686, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Search
chat = Chat(model)
res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
for o in res:
    if isinstance(o, ModelResponse): sleep(0.01); display(o)
    else: print(delta_text(o) or '',end='')
Otters are charismatic members of the weasel family, found on every continent except Australia and Antarctica. There are 13-14 species in total, ranging from the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).
These semi-aquatic mammals have elongated bodies, long tails, and soft, dense fur. Otters have the densest fur of any animal—as many as a million hairs per square inch. They're equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with river otters able to hold their breath for up to 8 minutes.
All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. River otters are especially playful, gamboling on land and splashing into rivers.
Baby otters stay with their mothers until they're up to a year old, and otters can live up to 16 years. Unfortunately, many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.
Otters are * charismatic members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).
These semi-aquatic mammals have * elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch. They’re * equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with * * river otters able to hold their breath for up to 8 minutes.
* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. * River otters are especially playful, gamboling on land and splashing into rivers.
* Baby otters stay with their mothers until they’re up to a year old, and * otters can live up to 16 years. Unfortunately, * many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=551, prompt_tokens=13314, total_tokens=13865, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
Multi tool calling
We can let the model call multiple tools in sequence using the max_steps
parameter.
chat = Chat(model, tools=[simple_add])
res = chat("What's ((5 + 3)+7)+11? Work step by step", return_all=True, max_steps=5)
for r in res: display(r)
I’ll solve this step by step using the addition function.
First, let me calculate 5 + 3:
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=96, prompt_tokens=434, total_tokens=530, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01PbJNpks9G6o9UYjRs5nd4b',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
Now I’ll add 7 to that result (8 + 7):
🔧 simple_add({“a”: 8, “b”: 7})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=88, prompt_tokens=543, total_tokens=631, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01PAuvDwaBNgM2JbEwJwxov5',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
Finally, I’ll add 11 to that result (15 + 11):
🔧 simple_add({“a”: 15, “b”: 11})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=89, prompt_tokens=644, total_tokens=733, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01KeELPbBSpCePrk4c8EzbyB',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
So working step by step: - 5 + 3 = 8 - 8 + 7 = 15
- 15 + 11 = 26
Therefore, ((5 + 3) + 7) + 11 = 26
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=68, prompt_tokens=746, total_tokens=814, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Some models support parallel tool calling, i.e. sending multiple tool call requests in one conversation step.
def multiply(a: int, b: int) -> int:
    "Multiply two numbers"
    return a * b
chat = Chat('openai/gpt-4.1', tools=[simple_add, multiply])
res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
for r in res: display(r)
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 2})
- id:
chatcmpl-xxx
- model:
gpt-4.1-2025-04-14
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=52, prompt_tokens=110, total_tokens=162, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_8ZH9yZH3N9TEmaIOmV6H81mU',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'call_uDxT76wyUI8WY9ugRPwCnlb9',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“a”:8,“b”:9})
- id:
chatcmpl-xxx
- model:
gpt-4.1-2025-04-14
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=17, prompt_tokens=178, total_tokens=195, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_Dm0jjJPQf2elRy4z7yBhOMf5',
'role': 'tool',
'name': 'multiply',
'content': '72'}
(5 + 3) = 8 and (7 + 2) = 9. Multiplying them together: 8 × 9 = 72.
So, (5 + 3) * (7 + 2) = 72.
- id:
chatcmpl-xxx
- model:
gpt-4.1-2025-04-14
- finish_reason:
stop
- usage:
Usage(completion_tokens=55, prompt_tokens=203, total_tokens=258, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
See it did the additions in one go!
We don’t want the model to keep running tools indefinitely. Let’s showcase how we can force the model to stop after our specified number of tool-call rounds:
def divide(a: int, b: int) -> float:
    "Divide two numbers"
    return a / b
chat = Chat(model, tools=[simple_add, multiply, divide])
res = chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.",
           max_steps=3, return_all=True,
           final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)
I’ll calculate this step by step using the available functions, following the order of operations (parentheses first, then multiplication/division from left to right).
Step 1: Calculate (10 + 5)
🔧 simple_add({“a”: 10, “b”: 5})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=116, prompt_tokens=609, total_tokens=725, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_013mAGGxvDDy3ftqSS3u2qPh',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
Step 2: Calculate (2 + 1)
🔧 simple_add({“a”: 2, “b”: 1})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=83, prompt_tokens=738, total_tokens=821, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_019KZ7aFBcAoHieWEFX6hYwA',
'role': 'tool',
'name': 'simple_add',
'content': '3'}
Step 3: Multiply 15 * 3
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=22, prompt_tokens=848, total_tokens=870, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Tool call exhaustion
= "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to calculate!"
pr = Chat(model, tools=[simple_add]) c
= c(pr, max_steps=2)
res res
I was only able to complete the first calculation: 1 + 2 = 3.
To complete your request, I would need to make two additional tool calls: 1. Add 2 to the result (3 + 2 = 5) 2. Add 3 to that result (5 + 3 = 8)
The final answer should be 8, but I wasn’t able to use the tools for all the steps. Would you like me to continue with the remaining calculations if you provide more tool uses?
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=123, prompt_tokens=584, total_tokens=707, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
assert c.hist[-2]['content'] == _final_prompt
Async
If you want to use LiteLLM in a webapp you probably want to use their async function acompletion
. To make that easier we will implement our version of AsyncChat
to complement it. It follows the same implementation as Chat as much as possible:
astream_with_complete
astream_with_complete (agen, postproc=<function noop>)
AsyncChat
AsyncChat (model:str, sp='', temp=0, search=False, tools:list=None, hist:list=None, ns:Optional[dict]=None, cache=False)
LiteLLM chat client.
|  | Type | Default | Details |
|---|---|---|---|
| model | str |  | LiteLLM compatible model name |
| sp | str |  | System prompt |
| temp | int | 0 | Temperature |
| search | bool | False | Search (l,m,h), if model supports it |
| tools | list | None | Add tools |
| hist | list | None | Chat history |
| ns | Optional | None | Custom namespace for tool calling |
| cache | bool | False | Anthropic prompt caching |
Examples
Basic example
chat = AsyncChat(model)
await chat("What is 2+2?")
2 + 2 = 4
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=13, prompt_tokens=14, total_tokens=27, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
With tool calls
async def async_add(a: int, b: int) -> int:
    "Add two numbers asynchronously"
    await asyncio.sleep(0.1)
    return a + b
chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", return_all=True)
async for r in res: display(r)
I’ll use the async_add tool to calculate 5 + 7 for you.
🔧 async_add({“a”: 5, “b”: 7})
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
tool_calls
- usage:
Usage(completion_tokens=91, prompt_tokens=424, total_tokens=515, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01X1GDcGawNLq77xytxEGDBw',
'role': 'tool',
'name': 'async_add',
'content': '12'}
The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.
- id:
chatcmpl-xxx
- model:
claude-sonnet-4-20250514
- finish_reason:
stop
- usage:
Usage(completion_tokens=30, prompt_tokens=568, total_tokens=598, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
chat.hist
[{'role': 'user', 'content': 'What is 2+2?'},
Message(content='2 + 2 = 4', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]
Async Streaming Display
This is what our outputs look like with streaming results:
chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
    if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
    elif isinstance(o,dict): print(o)
I'll use the async_add tool to calculate 5 + 7 for you.
🔧 async_add
{'tool_call_id': 'toolu_01NTrwnCZYd6K7ZeNQtUyx9k', 'role': 'tool', 'name': 'async_add', 'content': '12'}
The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.
We use this one quite a bit so we want to provide some utilities to better format these outputs:
aformat_stream
aformat_stream (rs)
Format the response stream for markdown display.
adisplay_stream
adisplay_stream (rs)
Use IPython.display to markdown display the response stream.
Streaming examples
Now we can demonstrate AsyncChat
with stream=True
!
Tool call
chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
await adisplay_stream(res)
async_add({"a": 5, "b": 7})
- 12
The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.
Thinking tool call
chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?", think='l',stream=True)
await adisplay_stream(res)
🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠
For 1000 random integers, the most efficient approach is to use your programming language’s built-in sort function (like sort()
in Python, Arrays.sort()
in Java, or std::sort()
in C++).
These implementations typically use highly optimized hybrid algorithms like: - Introsort (introspective sort) - starts with quicksort, switches to heapsort for worst-case scenarios - Timsort (Python) - optimized for real-world data patterns
If implementing from scratch, quicksort would be most efficient for random data at this size, with O(n log n) average performance.
For integers specifically, radix sort could also be very fast (O(kn) where k is the number of digits), but built-in functions are still usually the best choice due to their optimization and testing.
Multiple tool calls
chat.hist[1]
Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step using the available functions.\n\n**Step 1: Calculate (10 + 5)**", role='assistant', tool_calls=[{'function': {'arguments': '{"a": 10, "b": 5}', 'name': 'simple_add'}, 'id': 'toolu_017SE1tgPMBDtS5xLLMGLg9b', 'type': 'function'}], function_call=None, provider_specific_fields=None)
chat.hist[2]
{'tool_call_id': 'toolu_017SE1tgPMBDtS5xLLMGLg9b',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
chat.hist[3]
Message(content='**Step 2: Calculate (2 + 1)**', role='assistant', tool_calls=[{'function': {'arguments': '{"a": 2, "b": 1}', 'name': 'simple_add'}, 'id': 'toolu_013N4boUAcDB1X2h9aNNivug', 'type': 'function'}], function_call=None, provider_specific_fields=None)
Search
chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the web and tell me very briefly about otters", stream=True)
await adisplay_stream(res)
Otters are * charismatic members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).
These semi-aquatic mammals have * elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch. They’re * equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with * * river otters able to hold their breath for up to 8 minutes.
* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. * River otters are especially playful, gamboling on land and splashing into rivers.
* Baby otters stay with their mothers until they’re up to a year old, and * otters can live up to 16 years. Unfortunately, * many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.