patch_litellm()
Core
LiteLLM
Deterministic outputs
LiteLLM ModelResponse(Stream) objects have id and created_at fields that are generated dynamically. Even when we use cachy to cache the LLM response, these dynamic fields create diffs, which makes code review more challenging. The patches below ensure that the id and created_at fields are fixed and won’t generate diffs.
patch_litellm
def patch_litellm(
    seed:int=0
):
Patch litellm.ModelResponseBase such that id and created are fixed.
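The idea of pinning dynamic fields can be sketched without LiteLLM at all. The example below is a minimal illustration, not the library's implementation: `FakeResponse` and `patch_fixed_fields` are hypothetical names, and the fixed `created` value is an arbitrary assumption. Only the `chatcmpl-xxx` id mirrors the outputs shown on this page.

```python
import time
import uuid


class FakeResponse:
    "Hypothetical stand-in for a response class with dynamically generated fields."
    def __init__(self, text):
        self.id = f"chatcmpl-{uuid.uuid4()}"  # differs on every construction
        self.created = int(time.time())       # differs on every construction
        self.text = text


def patch_fixed_fields(cls, seed=0):
    "Wrap `cls.__init__` so `id` and `created` are overwritten with fixed values."
    orig_init = cls.__init__

    def patched_init(self, *args, **kwargs):
        orig_init(self, *args, **kwargs)
        self.id = "chatcmpl-xxx"          # fixed id, as seen in the outputs on this page
        self.created = 1_000_000_000 + seed  # assumed placeholder timestamp

    cls.__init__ = patched_init


patch_fixed_fields(FakeResponse)
a, b = FakeResponse("hi"), FakeResponse("hi")
print(a.id == b.id, a.created == b.created)
```

Because the patch wraps the constructor, every object created afterwards carries the same values, so cached and fresh responses render identically in a diff.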
Completion
LiteLLM provides a convenient unified interface for most big LLM providers. Because it’s so useful to be able to switch LLM providers with just one argument, we want to make it even easier by adding some more convenience functions and classes.
This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).
# litellm._turn_on_debug()
ms = ["gemini/gemini-3-pro-preview", "gemini/gemini-3-flash-preview", sonn, "openai/gpt-5.4"]
model = ms[2]
We’ll add a little shortcut to make examples and testing easier here:
def c(msgs, m=model, **kw):
    msgs = [msgs] if isinstance(msgs,dict) else listify(msgs)
    return completion(m, msgs, **kw)
def _display(*o):
    for x in o:
        if isinstance(x, (list,tuple)): _display(*x)
        elif isinstance(x, dict): display({k:str(v)[:100] for k,v in x.items()})
        else: display(x)
msg = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
for m in ms:
    display(Markdown(f'**{m}:**'))
    display(completion(m,msg))
gemini/gemini-3-pro-preview:
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=223, prompt_tokens=4, total_tokens=227, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=213, rejected_prediction_tokens=None, text_tokens=10, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
gemini/gemini-3-flash-preview:
Hello! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=50, prompt_tokens=4, total_tokens=54, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=41, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
claude-sonnet-4-6:
Hey there! How’s it going? What can I help you with today? 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=22, prompt_tokens=10, total_tokens=32, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=10, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
openai/gpt-5.4:
Hey! How can I help?
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=10, prompt_tokens=9, total_tokens=19, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
completion(gpt54, msg)
Hey! How can I help?
- id: chatcmpl-xxx
- model: gpt-5.4
- finish_reason: stop
- usage:
Usage(completion_tokens=11, prompt_tokens=9, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
completion(gpt54m, msg)
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gpt-5.4-mini
- finish_reason: stop
- usage:
Usage(completion_tokens=14, prompt_tokens=9, total_tokens=23, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
Generated images are also displayed (not shown here to conserve filesize):
# completion(model='gemini/gemini-2.5-flash-image', messages=[{'role':'user','content':'Draw a simple sketch of a cat'}])
FireworksAIConfig.get_provider_info
def get_provider_info(
    model
):
Default values all models of this provider support.
Messages formatting
Let’s start with making it easier to pass messages into litellm’s completion function (including images, and pdf files).
If msg has tool_calls, cache_control is added to the last tool call (required since LiteLLM strips it from empty content blocks), otherwise to the content.
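The placement rule above can be sketched on plain dict messages. This is a simplified illustration under stated assumptions, not the library's `_add_cache_control`: the name `add_cache_control_sketch` is hypothetical, tool calls are assumed to be dicts, and string content is assumed to be promoted to a single text block (which matches the cached outputs shown later on this page).

```python
def add_cache_control_sketch(msg):
    "Mark `msg` (a plain dict) as a cache checkpoint, in place, and return it."
    cc = {"type": "ephemeral"}
    tool_calls = msg.get("tool_calls")
    if tool_calls:
        # With tool calls present, the marker goes on the last tool call
        tool_calls[-1]["cache_control"] = cc
        return msg
    content = msg.get("content")
    if isinstance(content, str):
        # Promote plain text to a list with one text block so it can carry the marker
        msg["content"] = content = [{"type": "text", "text": content}]
    if isinstance(content, list) and content:
        content[-1]["cache_control"] = cc
    return msg


user_msg = add_cache_control_sketch({"role": "user", "content": "hello"})
tool_msg = add_cache_control_sketch(
    {"role": "assistant", "content": "", "tool_calls": [{"id": "tc1"}, {"id": "tc2"}]}
)
print(user_msg)
print(tool_msg)
```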
stop_reason
def stop_reason(
    r
):
Call self as a function.
contents
def contents(
    r
):
Get message object from response r.
remove_cache_ckpts
def remove_cache_ckpts(
    msg
):
remove cache checkpoints and return msg.
Test with regular content message:
msg_content = {'role': 'user', 'content': [{'type': 'text', 'text': 'hello'}]}
_add_cache_control(msg_content)
test_eq(msg_content['content'][-1].get('cache_control'), {'type': 'ephemeral'})
test_eq(_has_cache(msg_content), True)
remove_cache_ckpts(msg_content)
test_eq(_has_cache(msg_content), False)
Test with assistant message with tool_calls:
tcs = [
    {'id': 'tc1', 'type': 'function', 'function': {'name': 'test', 'arguments': '{}'}},
    {'id': 'tc2', 'type': 'function', 'function': {'name': 'test', 'arguments': '{}'}}
]
msg_tool = {'role': 'assistant', 'content': '', 'tool_calls': tcs}
_add_cache_control(msg_tool)
test_eq(msg_tool['tool_calls'][-1].get('cache_control'), {'type': 'ephemeral'})
test_eq('cache_control' not in msg_tool.get('content', [{}])[-1] if msg_tool.get('content') else True, True) # no cache in content
test_eq(_has_cache(msg_tool), True)
remove_cache_ckpts(msg_tool)
test_eq(_has_cache(msg_tool), False)
Test with ChatCompletionMessageToolCall tool call object:
tcs = [
    ChatCompletionMessageToolCall(id='tc1', type='function', function=Function(name='test', arguments='{}')),
    ChatCompletionMessageToolCall(id='tc2', type='function', function=Function(name='test', arguments='{}'))
]
msg_tc_obj = {'role': 'assistant', 'content': '', 'tool_calls': tcs}
_add_cache_control(msg_tc_obj)
test_eq(getattr(msg_tc_obj['tool_calls'][-1], 'cache_control', None), {'type': 'ephemeral'})
test_eq(_has_cache(msg_tc_obj), True)
remove_cache_ckpts(msg_tc_obj)
test_eq(_has_cache(msg_tc_obj), False)
mk_msg
def mk_msg(
    content, # Content: str, bytes (image), list of mixed content, or dict w 'role' and 'content' fields
    role:str='user', # Message role if content isn't already a dict/Message
    cache:bool=False, # Enable Anthropic caching
    ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):
Create a LiteLLM compatible message.
Now we can use mk_msg to create different types of messages.
Simple text:
msg = mk_msg("hey")
msg
{'role': 'user', 'content': 'hey'}
Which can be passed to litellm’s completion function like this:
res = completion(model, [msg])
res
Hey! How’s it going? What’s on your mind? 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=19, prompt_tokens=8, total_tokens=27, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=19, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=8, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
c(msg)
Hey! How’s it going? What’s on your mind? 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=19, prompt_tokens=8, total_tokens=27, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=19, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=8, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
Lists with just one string element are flattened for conciseness:
test_eq(mk_msg("hey"), mk_msg(["hey"]))
(LiteLLM ignores these fields when sent to other providers)
Text and images:
img_fn = Path('samples/puppy.jpg')
Image(filename=img_fn, width=200)
msg = mk_msg(['hey what in this image?',img_fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")
{
 "role": "user",
 "content": [
  {
   "type": "text",
   "text": "hey what in this image?"
  },
  {
   "type": "image_url",
   "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...
c(msg)
Image Description
The image shows a Cavalier King Charles Spaniel puppy 🐶
Details:
- The puppy has the breed’s characteristic brown and white coloring
- It’s lying in the grass, looking directly at the camera
- There are purple/lavender flowers (likely asters) beside it
- The setting appears to be a garden
- The puppy has the breed’s typical large, soulful dark eyes and floppy ears
It’s an absolutely adorable photo! 🌸🐾
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=136, prompt_tokens=104, total_tokens=240, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=136, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=104, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
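The image message above carries its bytes as a base64 data URL inside an `image_url` block. The sketch below shows one way such a block could be built from raw bytes. It is an illustration only: `sniff_image_mime` and `image_bytes_to_block` are hypothetical helpers, and detecting the type from magic bytes is an assumption rather than a description of how mk_msg works internally.

```python
import base64


def sniff_image_mime(data: bytes) -> str:
    "Guess an image MIME type from the leading magic bytes (JPEG and PNG only)."
    if data.startswith(b"\xff\xd8\xff"): return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"): return "image/png"
    raise ValueError("unsupported image type in this sketch")


def image_bytes_to_block(data: bytes) -> dict:
    "Encode raw image bytes as an `image_url` content block holding a data URL."
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": f"data:{sniff_image_mime(data)};base64,{b64}"}


# The first four bytes of a JPEG file are enough to exercise the helper
block = image_bytes_to_block(b"\xff\xd8\xff\xe0")
print(block)  # {'type': 'image_url', 'image_url': 'data:image/jpeg;base64,/9j/4A=='}
```

The `/9j/` prefix in the printed URL is the same prefix visible in the truncated JSON output above, since every JPEG starts with the same bytes.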
Let’s also demonstrate this for PDFs:
pdf_fn = Path('samples/solveit.pdf')
msg = mk_msg(['Who is the author of this pdf?', pdf_fn.read_bytes()])
c(msg)
The author of this PDF is Jeremy Howard, co-founder of fast.ai. He introduces himself in the document with “Hi, I’m Jeremy Howard, from fast.ai.”
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=43, prompt_tokens=1611, total_tokens=1654, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=43, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1611, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
Some models like Gemini support audio and video:
wav_data = httpx.get("https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav").content
# Audio(wav_data) # uncomment to preview
msg = mk_msg(['What is this audio saying?', wav_data])
completion(ms[1], [msg])
The audio is saying: “The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.”
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=124, prompt_tokens=181, total_tokens=305, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=93, rejected_prediction_tokens=None, text_tokens=31, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=174, cached_tokens=None, text_tokens=7, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
vid_data = httpx.get("https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4").content
msg = mk_msg(['Concisely, what is happening in this video?', vid_data])
completion(ms[1], [msg])
Photographer Saeka Shimada explores the streets of Tokyo at night, demonstrating the Google Pixel 8 Pro’s “Video Boost” and “Night Sight” features to capture high-quality, vibrant footage and photos in low-light environments.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=521, prompt_tokens=5205, total_tokens=5726, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=472, rejected_prediction_tokens=None, text_tokens=49, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=12, image_tokens=None, video_tokens=5193), cache_read_input_tokens=None)
Caching
Some providers such as Anthropic require manually opting into caching. Let’s try it:
def cpr(i): return f'{i} '*1024 + 'This is a caching test. Report back only what number you see repeated above.'
disable_cachy()
# msg = mk_msg(cpr(1), cache=True)
# res = c(msg, ms[2])
# res
Anthropic has a maximum of 4 cache checkpoints, so we remove previous ones as we go:
# res = c([remove_cache_ckpts(msg), mk_msg(res), mk_msg(cpr(2), cache=True)], ms[2])
# res
We see that the first message was cached, and this extra message has been written to cache:
# res.usage.prompt_tokens_details
We can add a bunch of large messages in a loop to see how the number of cached tokens used grows.
We do this 25 times to ensure it still works for more than 20 content blocks, which is a known Anthropic issue.
The code below is commented out by default, because it’s slow. Please uncomment it when working on caching.
# h = []
# msg = mk_msg(cpr(1), cache=True)
# for o in range(2,25):
#     h += [remove_cache_ckpts(msg), mk_msg(res)]
#     msg = mk_msg(cpr(o), cache=True)
#     res = c(h+[msg])
#     detls = res.usage.prompt_tokens_details
#     print(o, detls.cached_tokens, detls.cache_creation_tokens, end='; ')
enable_cachy(debug=cachy_debug)
Reconstructing formatted outputs
Lisette can call multiple tools in a loop. Further down this notebook, we’ll provide convenience functions for formatting such a sequence of tool calls and responses into one formatted output string.
For now, we’ll show an example and show how to transform such a formatted output string back into a valid LiteLLM history.
fmt_outp = '''
I'll solve this step-by-step, using parallel calls where possible.
<details class='tool-usage-details' markdown='1'>
```json
{
"id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",
"call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },
"result": "15"
}
```
</details>
<details class='tool-usage-details' markdown='1'>
```json
{
"id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",
"call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },
"result": "3"
}
```
</details>
Now I need to multiply 15 * 3 before I can do the final division:
<details class='tool-usage-details' markdown='1'>
```json
{
"id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",
"call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },
"result": "45"
}
```
</details>
<details class='token-usage-details' markdown='1'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>
`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`
</details>
'''
We can split into chunks of (text,toolstr,json):
sp = re_tools.split(fmt_outp)
for o in list(chunked(sp, 3, pad=True)): print('- ', o)
- ["\nI'll solve this step-by-step, using parallel calls where possible.\n\n", '<details class=\'tool-usage-details\' markdown=\'1\'>\n\n```json\n{\n "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n "result": "15"\n}\n```\n\n</details>', None]
- ['{\n "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n "result": "15"\n}', '\n\n', '<details class=\'tool-usage-details\' markdown=\'1\'>\n\n```json\n{\n "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n "result": "3"\n}\n```\n\n</details>']
- [None, '{\n "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n "result": "3"\n}', '\n\nNow I need to multiply 15 * 3 before I can do the final division:\n\n']
- ['<details class=\'tool-usage-details\' markdown=\'1\'>\n\n```json\n{\n "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n "result": "45"\n}\n```\n\n</details>', None, '{\n "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n "result": "45"\n}']
- ["\n\n<details class='token-usage-details' markdown='1'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>\n\n`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`\n\n</details>\n", None, None]
fmt2hist
def fmt2hist(
    outp:str
)->list:
Transform a formatted output into a LiteLLM compatible history
split_tools
def split_tools(
    s
):
Split formatted output into (text, summary, tooljson) chunks
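To make the splitting step concrete, here is a self-contained sketch that pulls tool-usage blocks out of a formatted string. The `re_tools` pattern itself is not shown in this section, so `TOOL_BLOCK` and `parse_tool_blocks` below are hypothetical stand-ins written for illustration; they are not the definitions used by split_tools or fmt2hist.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fenced block

# Hypothetical pattern: one tool-usage <details> block wrapping a fenced JSON payload
TOOL_BLOCK = re.compile(
    r"<details class='tool-usage-details'[^>]*>\s*"
    + FENCE + r"json\s*(.*?)\s*" + FENCE + r"\s*</details>",
    re.DOTALL,
)


def parse_tool_blocks(s):
    "Return ([(preceding_text, tool_dict), ...], trailing_text)."
    pairs, pos = [], 0
    for m in TOOL_BLOCK.finditer(s):
        pairs.append((s[pos:m.start()].strip(), json.loads(m.group(1))))
        pos = m.end()
    return pairs, s[pos:].strip()


sample = "\n".join([
    "Adding first.",
    "<details class='tool-usage-details' markdown='1'>",
    FENCE + "json",
    '{"id": "t1", "call": {"function": "simple_add", "arguments": {"a": 10, "b": 5}}, "result": "15"}',
    FENCE,
    "</details>",
    "Done.",
])
pairs, tail = parse_tool_blocks(sample)
print(pairs, tail)
```

Each pair holds the assistant text that preceded a tool call plus the decoded call and result, which is the information needed to rebuild an assistant message followed by a tool message.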
See how we can turn that one formatted output string back into a list of Messages:
from pprint import pprint
h = fmt2hist(fmt_outp)
pprint(h)
[Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_01KjnQH2Nsz2viQ7XYpLW3Ta', type='function')], function_call=None, provider_specific_fields=None),
{'content': '15',
'name': 'simple_add',
'role': 'tool',
'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta'},
Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function')], function_call=None, provider_specific_fields=None),
{'content': '3',
'name': 'simple_add',
'role': 'tool',
'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY'},
Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function')], function_call=None, provider_specific_fields=None),
{'content': '45',
'name': 'multiply',
'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C'},
Message(content='.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]
mk_msgs
We will skip tool use blocks and tool results during caching.
Now let’s make it easy to provide entire conversations:
mk_msgs
def mk_msgs(
    msgs, # List of messages (each: str, bytes, list, or dict w 'role' and 'content' fields)
    cache:bool=False, # Enable Anthropic caching
    cache_idxs:list=[-1], # Cache breakpoint idxs
    ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):
Create a list of LiteLLM compatible messages.
With mk_msgs you can easily provide a whole conversation:
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm doing fine and you?"}]
By default the last message will be cached when cache=True:
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"], cache=True)
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant',
'content': [{'type': 'text',
'text': "I'm doing fine and you?",
'cache_control': {'type': 'ephemeral'}}]}]
test_eq('cache_control' in msgs[-1]['content'][0], True)
Alternatively, users can provide custom cache_idxs. Tool call blocks and results are skipped during caching:
msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-2,-1])
msgs
[{'role': 'user',
'content': [{'type': 'text',
'text': 'Hello!',
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'assistant', 'content': 'Hi! How can I help you?'},
{'role': 'user', 'content': 'Call some functions!'},
Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_01KjnQH2Nsz2viQ7XYpLW3Ta', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
'name': 'simple_add',
'content': '15'},
Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
'name': 'simple_add',
'content': '3'},
{'content': 'Now I need to multiply 15 * 3 before I can do the final division:',
'role': 'assistant',
'tool_calls': [ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function', cache_control={'type': 'ephemeral'})],
'function_call': None,
'provider_specific_fields': None},
{'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
'name': 'multiply',
'content': '45'},
{'content': [{'type': 'text',
'text': '.',
'cache_control': {'type': 'ephemeral'}}],
'role': 'assistant',
'tool_calls': None,
'function_call': None,
'provider_specific_fields': None}]
msgs[-2]
{'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
'name': 'multiply',
'content': '45'}
msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-3,-2])
msgs
[{'role': 'user',
'content': [{'type': 'text',
'text': 'Hello!',
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'assistant', 'content': 'Hi! How can I help you?'},
{'role': 'user', 'content': 'Call some functions!'},
Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_01KjnQH2Nsz2viQ7XYpLW3Ta', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
'name': 'simple_add',
'content': '15'},
{'content': '',
'role': 'assistant',
'tool_calls': [ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function', cache_control={'type': 'ephemeral'})],
'function_call': None,
'provider_specific_fields': None},
{'role': 'tool',
'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
'name': 'simple_add',
'content': '3'},
{'content': 'Now I need to multiply 15 * 3 before I can do the final division:',
'role': 'assistant',
'tool_calls': [ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function', cache_control={'type': 'ephemeral'})],
'function_call': None,
'provider_specific_fields': None},
{'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
'name': 'multiply',
'content': '45'},
Message(content='.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]
msgs[-3]
{'content': 'Now I need to multiply 15 * 3 before I can do the final division:',
'role': 'assistant',
'tool_calls': [ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function', cache_control={'type': 'ephemeral'})],
'function_call': None,
'provider_specific_fields': None}
msgs[-5]
{'content': '',
'role': 'assistant',
'tool_calls': [ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function', cache_control={'type': 'ephemeral'})],
'function_call': None,
'provider_specific_fields': None}
test_eq('cache_control' in msgs[0]['content'][0], True)
Tool result blocks are skipped and cache control is placed into tool calls:
test_eq('cache_control' in msgs[-5]['tool_calls'][0], True)
test_eq('cache_control' in msgs[-3]['tool_calls'][0], True)
L(msgs).map(remove_cache_ckpts)
test_eq(any(L(msgs).map(_has_cache)), False)
Who’s speaking when is automatically inferred, even when multiple tools are called in parallel (which LiteLLM supports!).
msgs = mk_msgs(['Tell me the weather in Paris and Rome',
                'Assistant calls weather tool two times',
                {'role':'tool','content':'Weather in Paris is ...'},
                {'role':'tool','content':'Weather in Rome is ...'},
                'Assistant returns weather',
                'Thanks!'])
msgs
[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
{'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
{'role': 'tool', 'content': 'Weather in Paris is ...'},
{'role': 'tool', 'content': 'Weather in Rome is ...'},
{'role': 'assistant', 'content': 'Assistant returns weather'},
{'role': 'user', 'content': 'Thanks!'}]
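One rule that reproduces the roles in the output above is: keep any explicit role, and otherwise answer a user or tool message as the assistant and anything else as the user. The sketch below implements that rule; `infer_roles` is a hypothetical name, and the rule is inferred from the example output rather than taken from the mk_msgs source.

```python
def infer_roles(items):
    "Assign roles to bare strings based on the previous message's role."
    msgs, prev = [], None
    for it in items:
        if isinstance(it, dict) and "role" in it:
            msg = dict(it)  # explicit roles (e.g. tool results) are kept as-is
        else:
            # After a user or tool message the assistant speaks; otherwise the user does
            role = "assistant" if prev in ("user", "tool") else "user"
            msg = {"role": role, "content": it}
        msgs.append(msg)
        prev = msg["role"]
    return msgs


convo = infer_roles([
    "Tell me the weather in Paris and Rome",
    "Assistant calls weather tool two times",
    {"role": "tool", "content": "Weather in Paris is ..."},
    {"role": "tool", "content": "Weather in Rome is ..."},
    "Assistant returns weather",
    "Thanks!",
])
print([m["role"] for m in convo])
```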
For ease of use, if msgs is not already in a list, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs and get back a LiteLLM compatible msg history.
msgs = mk_msgs("Hey")
msgs
[{'role': 'user', 'content': 'Hey'}]
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm fine, you?"}]
However, beware: if you use mk_msgs for a single message consisting of multiple parts, you should be explicit and wrap those parts in two lists:
- One list to show that they belong together in one message (the inner list).
- Another, because mk_msgs expects a list of multiple messages (the outer list).
This is common when working with images for example:
msgs = mk_msgs([['Whats in this img?',img_fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")
[
 {
  "role": "user",
  "content": [
   {
    "type": "text",
    "text": "Whats in this img?"
   },
   {
    "type": "image_url",
    "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
completion(opus, [mk_msg("What's 27*453? Think step by step. ")],
           thinking = { "type": "adaptive", "display": "summarized", }, output_config={"effort":"low"})
Let me work through 27 × 453 step by step.
Step 1: Break it down using the distributive property 27 × 453 = 27 × (400 + 50 + 3)
Step 2: Multiply each part - 27 × 400 = 10,800 - 27 × 50 = 1,350 - 27 × 3 = 81
Step 3: Add the results - 10,800 + 1,350 = 12,150 - 12,150 + 81 = 12,231
Answer: 27 × 453 = 12,231
- id: chatcmpl-xxx
- model: claude-opus-4-7
- finish_reason: stop
- usage:
Usage(completion_tokens=180, prompt_tokens=25, total_tokens=205, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=180, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=25, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
Streaming
LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.
We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.
stream_with_complete
def stream_with_complete(
    gen, postproc:function=noop
):
Extend streaming response chunks with the complete response
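The pattern of yielding chunks while also handing back a complete result relies on a generator's return value. The sketch below shows that mechanism with plain strings. `stream_and_return` and `CaptureReturn` are hypothetical names used for illustration; they are simplified stand-ins, not the definitions of stream_with_complete or SaveReturn.

```python
def stream_and_return(chunks):
    "Yield each chunk as it arrives, then return the joined text."
    seen = []
    for ch in chunks:
        seen.append(ch)
        yield ch
    return "".join(seen)  # becomes StopIteration.value when the generator finishes


class CaptureReturn:
    "Iterate over a generator and keep its return value in `.value`."
    def __init__(self, gen): self.gen, self.value = gen, None
    def __iter__(self):
        # `yield from` forwards every chunk and evaluates to the generator's return value
        self.value = yield from self.gen


r = CaptureReturn(stream_and_return(["Hey", "!", " Hi"]))
out = list(r)   # consume the stream, as a display loop would
print(out, repr(r.value))
```

The caller can print each chunk as it arrives and still read the assembled result from `.value` once the loop ends.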
r = c(mk_msgs("Hey!"), stream=True)
r2 = SaveReturn(stream_with_complete(r))
for o in r2:
    cts = o.choices[0].delta.content
    if cts: print(cts, end='')
Hey! How's it going? What can I help you with today? 😊
r2.value
Hey! How’s it going? What can I help you with today? 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=21, prompt_tokens=9, total_tokens=30, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=21, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=9, image_tokens=None, video_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Tools
lite_mk_func
def lite_mk_func(
    f
):
Call self as a function.
def simple_add(
    a: int, # first operand
    b: int=0 # second operand
) -> int:
    "Add two numbers together"
    return a + b
toolsc = lite_mk_func(simple_add)
toolsc
{'type': 'function',
'function': {'name': 'simple_add',
'description': 'Add two numbers together\n\nReturns:\n- type: integer',
'parameters': {'type': 'object',
'properties': {'a': {'description': 'first operand', 'type': 'integer'},
'b': {'description': 'second operand', 'default': 0, 'type': 'integer'}},
'required': ['a']}}}
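A schema of this shape can be approximated from a function signature with the standard library alone. This is a rough sketch under stated assumptions, not how `lite_mk_func` works: `sketch_schema` and `JSON_TYPES` are hypothetical names, and the per-parameter descriptions taken from the inline comments are not reproduced.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names
JSON_TYPES = {int: 'integer', float: 'number', str: 'string', bool: 'boolean'}

def sketch_schema(f):
    "Build an OpenAI-style function schema from a signature (no per-parameter descriptions)"
    props, required = {}, []
    for name, p in inspect.signature(f).parameters.items():
        prop = {'type': JSON_TYPES.get(p.annotation, 'string')}
        # Parameters without a default are required; others record their default
        if p.default is inspect.Parameter.empty: required.append(name)
        else: prop['default'] = p.default
        props[name] = prop
    return {'type': 'function',
            'function': {'name': f.__name__, 'description': inspect.getdoc(f) or '',
                         'parameters': {'type': 'object', 'properties': props, 'required': required}}}

def simple_add(a: int, b: int = 0) -> int:
    "Add two numbers together"
    return a + b

schema = sketch_schema(simple_add)
```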
tmsg = mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible. ")
r = c(tmsg, tools=[toolsc])
display(r)
I’ll calculate both sums simultaneously by making two parallel tool calls right away!
- Call 1: Adding 5478954793 and 547982745
- Call 2: Adding 5479749754 and 9875438979
🔧 simple_add({"a": 5478954793, "b": 547982745})
🔧 simple_add({"a": 5479749754, "b": 9875438979})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=190, prompt_tokens=660, total_tokens=850, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=190, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=660, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
A tool response can be a string or a list of tool blocks (e.g., an image URL block). Tool results are stringified by default; to return content blocks unchanged, a tool can wrap its return value in the ToolResponse datatype.
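The distinction between wrapped and unwrapped results can be sketched as follows. `RawResult` and `result_content` are hypothetical stand-ins for `ToolResponse` and the library's internal handling; the only behavior assumed is the one stated in this section, namely that wrapped content passes through and everything else goes through `str`.

```python
from dataclasses import dataclass

@dataclass
class RawResult:
    "Hypothetical stand-in for `ToolResponse`: marks content that must not be stringified"
    content: list

def result_content(res):
    "Return content blocks unchanged for wrapped results, `str(res)` otherwise"
    return res.content if isinstance(res, RawResult) else str(res)

img = [{'type': 'image_url', 'image_url': 'data:image/png;base64,abc'}]
wrapped = result_content(RawResult(img))   # list of blocks, untouched
plain = result_content(['hello'])          # string repr of the list
```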
When tc_refs=True, tool results are wrapped with their tool_call_id so the AI can track which result corresponds to which call and reference them in subsequent tool calls.
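A minimal sketch of that wrapping, written to agree with the `_prep_tool_res` tests in this section. `tag_tool_result` is a hypothetical name, and the real helper may handle more cases than the two shown.

```python
def tag_tool_result(res, tc_id):
    "Prefix a tool result with a text block naming its tool call id"
    # Lists are assumed to already be content blocks; anything else becomes one text block
    blocks = res if isinstance(res, list) else [{'type': 'text', 'text': str(res)}]
    return [{'type': 'text', 'text': f'[tool_call_id: {tc_id}]'}, *blocks]

tagged = tag_tool_result('hello', 'toolu_123')
```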
# Test _prep_tool_res - string result
test_eq(_prep_tool_res('hello', 'toolu_123'), [
{'type': 'text', 'text': '[tool_call_id: toolu_123]'},
{'type': 'text', 'text': 'hello'}
])
# Test _prep_tool_res - list result (e.g. ToolResponse content)
img_block = {'type': 'image_url', 'image_url': {'url': 'data:...'}}
test_eq(_prep_tool_res([img_block], 'toolu_456'), [
{'type': 'text', 'text': '[tool_call_id: toolu_456]'},
img_block
])
During a tool loop, the AI may want to reference the result of a previous tool call. We support the syntax $`tool_call_id` in tool arguments, which is resolved to the actual result value before the function is called.
# Test _resolve_tool_refs
tc_res = {'toolu_abc123': 'hello world', 'toolu_xyz789': 42}
# Basic substitution
test_eq(_resolve_tool_refs('{"content": "$`toolu_abc123`"}', tc_res), {"content": "hello world"})
# Multiple refs
test_eq(_resolve_tool_refs('{"a": "$`toolu_abc123`", "b": "$`toolu_xyz789`"}', tc_res), {"a": "hello world", "b": 42})
# No refs - passthrough
test_eq(_resolve_tool_refs('{"x": 1}', tc_res), {"x": 1})
# Empty tc_res
test_eq(_resolve_tool_refs('{"x": 1}', None), {"x": 1})
# Missing ref - error message
test_eq(_resolve_tool_refs('{"x": "$`toolu_missing`"}', tc_res), {"x": "Tool result 'toolu_missing' not found!"})
# tc_refs=False - syntax passes through unchanged since tc_res is None
test_eq(_resolve_tool_refs('{"x": "$`toolu_abc123`"}', None), {"x": "$`toolu_abc123`"})
When tc_refs=True, tool results are stored in tc_res for later substitution via the $`tool_call_id` syntax. Some callers might return string reprs of Python objects. _try_eval attempts to convert these back to Python objects using ast.literal_eval, falling back to the original value on failure. This ensures substituted values are actual objects, not string reprs.
test_eq(ast.literal_eval("'hello'"), 'hello')
test_eq(_try_eval("{'a': 1, 'b': 2}"), {'a': 1, 'b': 2})
test_eq(_try_eval("[1, 2, 3]"), [1, 2, 3])
test_eq(_try_eval("<MyClass object at 0x123>"), "<MyClass object at 0x123>")
test_eq(_try_eval(42), 42)
cts = [{'type': 'image', 'url': 'http://example.com/img.png'}]
test_eq(_try_eval(ToolResponse(cts)), ToolResponse(cts))
Ensure ToolResponse content (e.g. image blocks) is passed through as a list, not stringified, even when tc_res is None:
fake_tc = ChatCompletionMessageToolCall(index=0, function=Function(name='test_img'), id='_test', type='function')
img_content = [{'type': 'image_url', 'image_url': 'data:image/png;base64,abc'}]
res = _mk_tool_result(fake_tc, ToolResponse(img_content))
test_eq(res['content'], img_content) # ToolResponse should pass through
res_str = _mk_tool_result(fake_tc, ['hello'])
test_eq(res_str['content'], "['hello']") # other tool results are stringified
tcs = [_lite_call_func(o, [toolsc], ns=globals()) for o in r.choices[0].message.tool_calls]
_display(*tcs)
{'tool_call_id': 'toolu_01VJfhNo6RaeayecY8vwNDbp',
'role': 'tool',
'name': 'simple_add',
'content': '6026937538'}
{'tool_call_id': 'toolu_01XzRR94nWVpiZJBpUHfrLaM',
'role': 'tool',
'name': 'simple_add',
'content': '15355188733'}
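The `_try_eval` fallback described earlier in this section relies on `ast.literal_eval`, which only accepts Python literals and raises on anything else. A minimal sketch of that idea, using the hypothetical name `try_literal` and not claiming to match the library's helper in every case:

```python
import ast

def try_literal(o):
    "Parse a string repr back into a Python object, or return the input unchanged"
    if not isinstance(o, str): return o
    try: return ast.literal_eval(o)
    # Non-literal text such as an object repr is left as-is
    except (ValueError, SyntaxError): return o

parsed = try_literal("{'a': 1, 'b': 2}")
kept = try_literal("<MyClass object at 0x123>")
```

Because `literal_eval` never executes code, this is safe to apply to strings returned by tools.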
Test tool calls that were not in tool_schemas are caught:
fake_tc = ChatCompletionMessageToolCall(index=0, function=Function(name='hallucinated_tool'),id='_', type='function')
test_eq(_lite_call_func(fake_tc, ns=globals(), tool_schemas=[toolsc])['content'],"Tool not defined in tool_schemas: hallucinated_tool")
test_fail(_lite_call_func(fake_tc, ns=globals(), tool_schemas=None)['content'],"Tool not defined in tool_schemas: hallucinated_tool")
Test tool calls that were not in tool_choice are caught:
def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None #f'{c}'
    delta = c.delta
    if delta.content: return delta.content
    if delta.tool_calls:
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'
    return None
r = c(tmsg, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
I'll calculate both sums simultaneously by making two parallel tool calls right away!
- **Call 1:** Adding 5478954793 and 547982745
- **Call 2:** Adding 5479749754 and 9875438979
🔧 simple_add
🔧 simple_add
r2.value
I’ll calculate both sums simultaneously by making two parallel tool calls right away!
- Call 1: Adding 5478954793 and 547982745
- Call 2: Adding 5479749754 and 9875438979
🔧 simple_add({"a": 5478954793, "b": 547982745})
🔧 simple_add({"a": 5479749754, "b": 9875438979})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=190, prompt_tokens=660, total_tokens=850, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=190, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=660, image_tokens=None, video_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0)
msg = mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
r = c(msg, stream=True, reasoning_effort="low")
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
🧠
## Finding the Derivative
Using the **Power Rule**: d/dx[xⁿ] = nxⁿ⁻¹
### Step-by-Step Solution:
| Term | Rule Applied | Result |
|------|-------------|--------|
| x³ | bring down 3, reduce power | 3x² |
| 2x² | bring down 2, multiply by coefficient | 4x |
| -5x | bring down 1, reduce power | -5 |
| 1 | derivative of constant = 0 | 0 |
### Answer:
$$f(x) = x^3 + 2x^2 - 5x + 1$$
$$\boxed{f'(x) = 3x^2 + 4x - 5}$$
r2.value
Finding the Derivative
Using the Power Rule: d/dx[xⁿ] = nxⁿ⁻¹
Step-by-Step Solution:
| Term | Rule Applied | Result |
|---|---|---|
| x³ | bring down 3, reduce power | 3x² |
| 2x² | bring down 2, multiply by coefficient | 4x |
| -5x | bring down 1, reduce power | -5 |
| 1 | derivative of constant = 0 | 0 |
Answer:
\[f(x) = x^3 + 2x^2 - 5x + 1\]
\[\boxed{f'(x) = 3x^2 + 4x - 5}\]
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=220, prompt_tokens=38, total_tokens=258, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=220, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=38, image_tokens=None, video_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Structured Outputs
structured
def structured(
m:str, msgs:list, tool:Callable, completefunc:Callable=completion,
messages:List=[], # Optional OpenAI params: see https://platform.openai.com/docs/api-reference/chat/create
timeout:Union=None, temperature:Optional=None, top_p:Optional=None, n:Optional=None, stream:Optional=None,
stream_options:Optional=None, stop:NoneType=None, max_completion_tokens:Optional=None, max_tokens:Optional=None,
modalities:Optional=None, prediction:Optional=None, audio:Optional=None, presence_penalty:Optional=None,
frequency_penalty:Optional=None, logit_bias:Optional=None, user:Optional=None,
reasoning_effort:Optional=None, # openai v1.0+ new params
verbosity:Optional=None, response_format:Union=None, seed:Optional=None, tools:Optional=None,
tool_choice:Union=None, logprobs:Optional=None, top_logprobs:Optional=None, parallel_tool_calls:Optional=None,
web_search_options:Optional=None, deployment_id:NoneType=None, extra_headers:Optional=None,
safety_identifier:Optional=None, service_tier:Optional=None,
functions:Optional=None, # soon to be deprecated params by OpenAI
function_call:Optional=None, base_url:Optional=None, # set api_base, api_version, api_key
api_version:Optional=None, api_key:Optional=None,
model_list:Optional=None, # pass in a list of api_base,keys, etc.
thinking:Optional=None, # Optional liteLLM function params
shared_session:Optional=None, # Session management
enable_json_schema_validation:Optional=None, # Per-request JSON schema validation (overrides litellm.enable_json_schema_validation)
):
Return the value of the tool call (generally used for structured outputs)
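Conceptually, the model is asked to call the given tool, and the JSON arguments it returns are decoded and passed to that tool. Whether `structured` does exactly this internally is an assumption; `Point` and `call_with_args` are hypothetical names used only for illustration.

```python
import json

class Point:
    "Hypothetical structured-output target"
    def __init__(self, x: int, y: int): self.x, self.y = x, y

def call_with_args(tool, arguments: str):
    "Decode a tool call's JSON argument string and pass it to `tool`"
    return tool(**json.loads(arguments))

# Stand-in for the `arguments` string a model returns in a tool call
pt = call_with_args(Point, '{"x": 3, "y": 4}')
```

Since the tool is an ordinary callable, any validation in its body (such as an `assert` on an argument's format) runs when the object is built.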
class President:
    "Information about a president of the United States"
    def __init__(
        self,
        first:str, # first name
        last:str, # last name
        spouse:str, # name of spouse
        years_in_office:str, # format: "{start_year}-{end_year}"
        birthplace:str, # name of city
        birth_year:int # year of birth, `0` if unknown
    ):
        assert re.match(r'\d{4}-\d{4}', years_in_office), "Invalid format: `years_in_office`"
        store_attr()
    __repr__ = basic_repr('first, last, spouse, years_in_office, birthplace, birth_year')
for m in ms[1::-1]:
    r = structured(m, [mk_msg("Tell me something about the third president of the USA.")], President)
    test_eq(r.first, 'Thomas'); test_eq(r.last, 'Jefferson')
Search
# AnthropicConfig().map_web_search_tool({})
LiteLLM provides search, not via tools, but via the special web_search_options param.
Note: Not all models support web search. LiteLLM’s supports_web_search field should indicate this, but it’s unreliable for some models like claude-sonnet-4-20250514. Checking both supports_web_search and search_context_cost_per_query provides more accurate detection.
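The two-field check can be sketched over a plain model-info dict. This assumes that either signal is enough to count as support, which is one reading of the note above; `has_search_sketch` is a hypothetical name and the dicts are made-up metadata, not real LiteLLM entries.

```python
def has_search_sketch(info: dict) -> bool:
    "True if either search-related field in a model-info dict signals support"
    return bool(info.get('supports_web_search') or info.get('search_context_cost_per_query'))

# Hypothetical metadata entries, not real LiteLLM data
flagged = {'supports_web_search': True}
priced_only = {'supports_web_search': False,
               'search_context_cost_per_query': {'search_context_size_low': 0.01}}
unsupported = {'supports_web_search': None}

checks = [has_search_sketch(o) for o in (flagged, priced_only, unsupported)]
```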
for m in ms:
    print(m)
    print(_has_search(m))
gemini/gemini-3-pro-preview
True
gemini/gemini-3-flash-preview
True
claude-sonnet-4-6
True
openai/gpt-5.4
False
When search is supported it can be used like this:
smsg = mk_msg("Search the web and tell me very briefly about otters")
r = c(smsg, m=sonn46, web_search_options={})
r
Here’s a brief overview of otters:
What they are: Otters are carnivorous mammals, and all 14 extant species are semiaquatic, living in both freshwater and marine environments. They are found on every continent except Australia and Antarctica.
Physical features: Otters have long, slim bodies and relatively short limbs. Their most striking anatomical features are their powerful webbed feet used to swim and their seal-like ability to hold their breath underwater. Most have sharp claws, and all except the sea otter have long, muscular tails. Otters have the densest fur of any animal — as many as a million hairs per square inch in places.
Diet: Otters are carnivores that eat mainly fish and invertebrates. Their diet depends on species and habitat — river otters eat mostly fish, frogs, crayfish, crabs, and mollusks, while sea otters mostly consume sea urchins, abalone, crabs, fish, octopuses, mussels, and clams.
Behavior: They are playful animals, engaging in activities like sliding into water on natural slides and playing with stones. Sea otters crack open shellfish with rocks they hold on their stomachs, making them the only otter that uses rocks as tools. When it’s time to nap, sea otters entangle themselves in kelp so they don’t float away, and they sometimes intertwine their feet with another sea otter to stay together.
Conservation: Otters and their relatives were once hunted extensively for their fur, many to the point of near extinction. Despite regulations designed to protect them, many species remain at risk from pollution and habitat loss.
🔧 web_search({"query": "otters facts overview"})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=613, prompt_tokens=17799, total_tokens=18412, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=613, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=17799, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), server_tool_use=ServerToolUse(web_search_requests=1, tool_search_requests=None), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
To use search in Gemini with tool calls, we need an extra option:
c(smsg, m=ms[1], tools=[toolsc], web_search_options={}, include_server_side_tool_invocations=True)
Otters are semiaquatic carnivorous mammals belonging to the Mustelidae family (making them relatives of weasels and badgers). There are 13 extant species found across every continent except Australia and Antarctica.
Key Facts
- Habitat: They live in a variety of environments, including freshwater rivers, lakes, and marine coastlines.
- Diet: They are expert hunters that primarily eat fish, but also consume frogs, birds, and shellfish.
- Unique Adaptations:
- Thickest Fur: Sea otters have the densest fur of any animal (up to 1 million hairs per square inch), which provides insulation since they lack a layer of blubber.
- Tool Use: They are one of the few mammals known to use tools; for example, sea otters use stones to crack open shellfish.
- Social Behavior: Many species are highly social and playful. Sea otters are famous for “holding hands” (rafting) while sleeping to prevent drifting apart in the current.
- Conservation: Most otter species are currently in decline due to habitat loss, pollution, poaching, and climate change.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=490, prompt_tokens=128, total_tokens=618, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=238, rejected_prediction_tokens=None, text_tokens=252, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=128, image_tokens=None, video_tokens=None, web_search_requests=1), cache_read_input_tokens=None)
LiteLLM keeps two related but separate sources of model metadata. register_model updates model_cost and therefore affects get_model_info, but request validation calls get_supported_openai_params, which delegates to the provider config when a known provider is used. For chatgpt/* models, LiteLLM strips the chatgpt/ prefix before validation and routes the model through ChatGPTConfig, which inherits OpenAI’s GPT-5 parameter handling. Therefore Codex/ChatGPT model aliases need to be included in the GPT-5 supported-parameter patch using both their prefixed and stripped names; otherwise get_model_info may show support for a parameter while completion() still rejects it during validation.
OpenAIGPT5Config.get_supported_openai_params
def get_supported_openai_params(
model:str
):
Call self as a function.
We also need to register GPT-5.4(-mini):
inf = dict(get_model_info(gpt54))
'web_search_options' in inf['supported_openai_params']
True
c(smsg, m=gpt54, tools=[toolsc], web_search_options={})
Otters are semiaquatic mammals in the weasel family, with 13 species found across freshwater habitats in much of the world, plus marine species like the sea otter. They have streamlined bodies, webbed feet, and very dense fur, and they mostly eat fish and other aquatic prey such as crabs and frogs. Many species are threatened by habitat loss and pollution. (britannica.com)
- id: chatcmpl-xxx
- model: gpt-5.4
- finish_reason: stop
- usage:
Usage(completion_tokens=160, prompt_tokens=8416, total_tokens=8576, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=48, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
c(smsg, m=gpt54m, tools=[toolsc], web_search_options={})
Otters are playful semiaquatic mammals in the weasel family, found on most continents. They’re expert swimmers with webbed feet, dense waterproof fur, and a diet mostly of fish and other aquatic animals. There are 13 otter species, and many are threatened by pollution, habitat loss, and hunting. (nationalgeographic.com)
- id: chatcmpl-xxx
- model: gpt-5.4-mini
- finish_reason: stop
- usage:
Usage(completion_tokens=149, prompt_tokens=8327, total_tokens=8476, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=59, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=3456, text_tokens=None, image_tokens=None, video_tokens=None))
c(smsg, m=ms[1], web_search_options={})
Otters are semi-aquatic carnivorous mammals belonging to the weasel family (Mustelidae). There are 13 species worldwide, ranging from small river otters to the large sea otter.
Key Facts:
- Physical Traits: They have long, slim bodies, powerful webbed feet for swimming, and incredibly dense, water-repellent fur. Sea otters have the thickest fur of any animal (up to 1 million hairs per square inch) to keep them warm without blubber.
- Habitat: They live in both freshwater (rivers, lakes) and marine environments (coastal oceans). While river otters spend much of their time on land, sea otters live almost their entire lives in the water.
- Behavior: Known for being highly intelligent and playful, otters often slide down banks or play with stones. Sea otters are famous for using rocks as tools to crack open shellfish and for “holding hands” (rafting) while they sleep so they don’t drift away.
- Diet: They primarily eat fish, but also consume frogs, crabs, and shellfish.
- Conservation: Most otter species are currently threatened or endangered due to habitat loss, pollution, and historical hunting for their fur.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=535, prompt_tokens=53, total_tokens=588, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=268, rejected_prediction_tokens=None, text_tokens=267, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=53, image_tokens=None, video_tokens=None, web_search_requests=1), cache_read_input_tokens=None)
Citations
We provide this helper function that adds the citation to the content field in markdown format:
cite_footnotes
def cite_footnotes(
stream_list
):
Add markdown footnote citations to stream deltas
cite_footnote
def cite_footnote(
msg
):
Call self as a function.
r = list(c(smsg, ms[2], stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)
Here’s a brief overview of otters:
What they are: * Otters are carnivorous mammals in the subfamily Lutrinae, and all 14 extant species are semiaquatic, living in both freshwater and marine environments. * They are found on every continent except Australia and Antarctica.
Physical traits: * Otters are distinguished by their long, slim bodies, powerful webbed feet for swimming, and dense fur, which keeps them warm and buoyant in water. * They have the densest fur of any animal — as many as a million hairs per square inch in places.
Diet & tools: * All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters have an ingenious method to open shellfish — a sea otter will float on its back, place a rock on its chest, then smash the mollusk down on it until it breaks open.
Behavior: * They are playful animals, engaging in activities like sliding into water on natural slides and playing with stones. * When it’s time to nap, sea otters entangle themselves in kelp so they don’t float away, and they sometimes intertwine their feet with another sea otter to stay together.
Lifespan & young: * They can live up to 16 years, with their diet mainly consisting of fish and sometimes frogs, birds, or shellfish, depending on the species. * A newborn pup needs constant attention and will stay with its mother for six months until it develops survival skills.
Conservation: * Otters and their relatives were once hunted extensively for their fur, many to the point of near extinction, and despite regulations designed to protect them, many species remain at risk from pollution and habitat loss.
🔧 web_search({"query": "otters facts overview"})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=627, prompt_tokens=17556, total_tokens=18183, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=627, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=17556, image_tokens=None, video_tokens=None, cache_creation_tokens=0), server_tool_use={'web_search_requests': 1, 'tool_search_requests': None}, cache_creation_input_tokens=0, cache_read_input_tokens=0)
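The exact markup that `cite_footnote` writes into the content field is not visible in the rendered output above. As a rough illustration only, the sketch below assumes a citation dict with `title` and `url` keys and renders it as a compact markdown link whose title shows on hover; `footnote` is a hypothetical helper, not the library function.

```python
def footnote(citation: dict) -> str:
    "Render one citation as a compact markdown link with the title as hover text"
    # Escape double quotes so the title stays inside the link's title string
    title = citation['title'].replace('"', '\\"')
    return f'[*]({citation["url"]} "{title}")'

cite = {'title': 'Otter facts', 'url': 'https://example.com/otters'}
text = 'Otters are semiaquatic mammals. ' + footnote(cite)
```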
Chat
LiteLLM is pretty bare-bones. It doesn't keep track of conversation history or of which tools have been added to the conversation so far.
So let's make a Claudette-style wrapper so we can do streaming, tool calling, and tool loops without problems.
mk_stream_chunk
def mk_stream_chunk(
kwargs:VAR_KEYWORD
):
Call self as a function.
When the tool uses are about to be exhausted, it is important to alert the AI so that it knows to use its final steps to communicate current progress and next steps to the user.
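A toy tool loop makes the idea concrete. This is a sketch under stated assumptions rather than the `Chat` implementation: `tool_loop`, `fake_model`, the `allow_tools` flag, and the shortened `FINAL` prompt are all hypothetical, and the model is a stub that keeps requesting a tool while it is allowed to.

```python
FINAL = {'role': 'user',
         'content': 'You have used all your tool calls for this turn. Please summarize your findings.'}

def tool_loop(model, msgs, max_steps=2):
    "Run `model` until it stops calling tools; on the last step, append `FINAL` and withhold tools"
    for step in range(1, max_steps + 1):
        last = step == max_steps
        if last and step > 1: msgs.append(FINAL)   # alert the model before its final reply
        reply = model(msgs, allow_tools=not last)
        msgs.append(reply)
        if not reply.get('tool_call'): break
        msgs.append({'role': 'tool', 'content': f"result of {reply['tool_call']}"})
    return msgs

def fake_model(msgs, allow_tools):
    "Stub that keeps asking for a tool while it is allowed to"
    if allow_tools: return {'role': 'assistant', 'tool_call': 'simple_add'}
    return {'role': 'assistant', 'content': 'Summary of progress and next steps.'}

hist = tool_loop(fake_model, [{'role': 'user', 'content': 'Add some numbers'}], max_steps=3)
```

Without the final prompt, a model that is cut off mid-task would simply stop, leaving the user with tool output but no explanation of what remains to be done.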
print(_trunc_str('𝍁xxxxxxxxxx𝍁', mx=5))
print(_trunc_str(Safe('xxxxxxxxxx'), mx=5))
print(_trunc_str('xxxxxxxxxx', mx=5, skip=0))
print(_trunc_str('xxxxxxxxxx', mx=5, skip=1))
xxxxxxxxxx
<TRUNCATED>……</TRUNCATED>
<TRUNCATED>xxxxx…</TRUNCATED>
<TRUNCATED>…xxx…</TRUNCATED>
When tc_refs=True, the AI can reference previous tool results in subsequent tool calls using the $`tool_call_id` syntax. This is useful when chaining tool calls where one result feeds into another.
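The substitution step can be sketched as a small function over the JSON argument string. It is written to agree with the `_resolve_tool_refs` tests shown earlier on this page, but `resolve_refs` and `REF` are hypothetical names, and the sketch only handles references that make up a whole top-level string value.

```python
import json, re

REF = re.compile(r'^\$`([^`]+)`$')   # matches a whole-string reference like $`toolu_abc123`

def resolve_refs(args_json, results):
    "Decode JSON tool arguments, replacing top-level $`id` string values with stored results"
    args = json.loads(args_json)
    if not results: return args   # references disabled: leave the syntax untouched
    def sub(v):
        m = REF.match(v) if isinstance(v, str) else None
        if not m: return v
        # Unknown ids become an error message the model can read and react to
        return results.get(m[1], f"Tool result '{m[1]}' not found!")
    return {k: sub(v) for k, v in args.items()}

stored = {'toolu_abc123': 'hello world', 'toolu_xyz789': 42}
out = resolve_refs('{"a": "$`toolu_abc123`", "b": "$`toolu_xyz789`", "c": 1}', stored)
```

Replacing the whole value, rather than splicing text into a string, is what lets a non-string result such as `42` arrive at the next tool as an integer.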
Anthropic provides web search request counts directly via usage.server_tool_use.web_search_requests, billed at $10 per 1,000 searches (pricing). Gemini returns queries in groundingMetadata.webSearchQueries—each query counts as a separate billable use—with 5,000 free prompts per month, then $14 per 1,000 search queries (coming soon) (pricing, grounding docs).
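As a worked example of the Anthropic rate quoted above, the search cost is just the request count scaled by the per-1,000 price. `anthropic_search_cost` is a hypothetical helper, and the default rate is the $10 figure from this paragraph, which may change.

```python
def anthropic_search_cost(web_search_requests: int, usd_per_1000: float = 10.0) -> float:
    "Cost in USD of Anthropic web searches at the quoted per-1,000 rate"
    return web_search_requests * usd_per_1000 / 1000

one = anthropic_search_cost(1)      # a single search request
many = anthropic_search_cost(250)   # 250 search requests
```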
search_count
def search_count(
r
):
Call self as a function.
UsageStats
def UsageStats(
prompt_tokens:int=0, completion_tokens:int=0, total_tokens:int=0, cached_tokens:int=0,
cache_creation_tokens:int=0, reasoning_tokens:int=0, web_search_requests:int=0, cost:float=0.0
):
Initialize self. See help(type(self)) for accurate signature.
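A usage record like this is typically accumulated across calls and printed as a one-line summary. The sketch below is an assumption about how such a record could behave, not the `UsageStats` implementation: `UsageSketch` is a hypothetical class with a subset of the fields, and the cached percentage is assumed to be `cached_tokens / prompt_tokens`.

```python
from dataclasses import dataclass

@dataclass
class UsageSketch:
    "Hypothetical running usage total; field names follow `UsageStats`"
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cached_tokens: int = 0
    web_search_requests: int = 0
    cost: float = 0.0

    def __add__(self, o):
        "Sum every field so per-call usage can be accumulated over a conversation"
        return UsageSketch(*(getattr(self, f) + getattr(o, f) for f in self.__dataclass_fields__))

    def __str__(self):
        # Assumption: the cached percentage is cached_tokens / prompt_tokens
        pct = self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0
        return (f'total={self.total_tokens} | in={self.prompt_tokens} | out={self.completion_tokens} | '
                f'cached={pct:.1%} | searches={self.web_search_requests} | ${self.cost:.4f}')

u = UsageSketch(16, 40, 56, cost=0.0006) + UsageSketch()
summary = str(u)
```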
Chat
def Chat(
model:str, # LiteLLM compatible model name
sp:str='', # System prompt
temp:int=0, # Temperature
search:bool=False, # Search (l,m,h), if model supports it
tools:list=None, # Add tools
hist:list=None, # Chat history
ns:Optional=None, # Custom namespace for tool calling
cache:bool=False, # Anthropic prompt caching
cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl:NoneType=None, # Anthropic prompt caching ttl
api_base:NoneType=None, # API base URL for custom providers
api_key:NoneType=None, # API key for custom providers
extra_headers:NoneType=None, # Extra HTTP headers for custom providers
tc_refs:bool=False, # Enable tool call result references
tc_res_eval:bool=False, # literal_eval tool results before storing in tc_res
markup:int=0, # Cost markup multiplier (e.g. 0.5 for 50%)
tool_reminder:NoneType=None, # Prepended as a block to the first trailing tool result (transient)
max_tokens:NoneType=None, # Default max_tokens for completion()
completefunc:Optional=None, # Completion function
stream:bool=False, # Default `stream` for `__call__`
callkw:dict=None, # Extra kwargs passed to completion() on every call
):
LiteLLM chat client.
web_search is now included in tool_calls. The internal LLM translation is handled correctly thanks to the fix here, but server-side tools still need to be filtered out of tool_calls in our own tool loop.
add_warning
def add_warning(
r, msg
):
Call self as a function.
Chat.__call__
def __call__(
msg:NoneType=None, # Message str, or list of multiple message parts
prefill:NoneType=None, # Prefill AI response if model supports it
temp:NoneType=None, # Override temp set on chat initialization
think:NoneType=None, # Thinking (l,m,h)
search:NoneType=None, # Override search set on chat initialization (l,m,h)
stream:NoneType=None, # Stream results (defaults to `self.stream`)
max_steps:int=2, # Maximum number of tool calls
final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have ran out
return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
step:int=1, tool_choice:NoneType=None, max_tokens:NoneType=None
):
Main call method - handles streaming vs non-streaming
Chat.print_hist
def print_hist(
):
Print each message on a different line
Examples
History tracking
for m in ms[1:]:
    chat = Chat(m)
    chat("Hey my name is Rens")
    r = chat("Whats my name")
    test_eq('Rens' in contents(r).content, True)
r
Your name is Rens.
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=9, prompt_tokens=40, total_tokens=49, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
See, now we keep track of history!
History is stored in the hist attribute:
chat.hist
[{'role': 'user', 'content': 'Hey my name is Rens'},
Message(content='Hi Rens! Nice to meet you — how can I help today?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]),
{'role': 'user', 'content': 'Whats my name'},
Message(content='Your name is Rens.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])]
chat.print_hist()
{'role': 'user', 'content': 'Hey my name is Rens'}
Message(content='Hi Rens! Nice to meet you — how can I help today?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
{'role': 'user', 'content': 'Whats my name'}
Message(content='Your name is Rens.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
You can also pass an old chat history into new Chat objects:
for m in ms[1:]:
    chat2 = Chat(m, hist=chat.hist)
    r = chat2("What was my name again?")
    test_eq('Rens' in contents(r).content, True)
r
Your name is Rens.
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=9, prompt_tokens=62, total_tokens=71, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
If the max_tokens limit is reached, a warning message is appended to the end of the model response:
chat_long = Chat(m)
r = chat_long("Write a short story about a robot and a dog", max_tokens=40)
r
Every morning at exactly 7:03, Unit 7 rolled out of the garage and checked the front gate, the mailbox, and the tomato plants. It was very good at routines
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: length
- usage:
Usage(completion_tokens=40, prompt_tokens=16, total_tokens=56, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
print(contents(r).content)
Every morning at exactly 7:03, Unit 7 rolled out of the garage and checked the front gate, the mailbox, and the tomato plants. It was very good at routines
<warning>Response was cut off at token limit.</warning>
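The mechanism can be sketched as a lookup on `finish_reason` followed by an append to the message content. This is an illustration under assumptions: `warn_if_truncated` and `WARNINGS` are hypothetical names, only the `length` case shown above is covered, and the response is a `SimpleNamespace` stand-in rather than a real ModelResponse.

```python
from types import SimpleNamespace as NS

WARNINGS = {'length': 'Response was cut off at token limit.'}  # only the case shown above

def warn_if_truncated(r):
    "Append a <warning> tag to the message content when `finish_reason` has a known warning"
    choice = r.choices[0]
    msg = WARNINGS.get(choice.finish_reason)
    if msg: choice.message.content = f'{choice.message.content or ""}\n<warning>{msg}</warning>'
    return r

# Stand-in for a ModelResponse; real responses come from `completion`
fake = NS(choices=[NS(finish_reason='length', message=NS(content='It was very good at routines'))])
text = warn_if_truncated(fake).choices[0].message.content
```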
chat_long.use
total=56 | in=16 | out=40 | cached=0.0% | searches=0 | $0.0006
fmt = chat_long.use.fmt()
print(fmt)
<details class='token-usage-details' markdown='1'><summary>$0.0006</summary>
`total=56 | in=16 | out=40 | cached=0.0% | searches=0 | $0.0006`
</details>
assert re_token.search(fmt)
Same goes for refused requests:
chat_refused = Chat('claude-opus-4-5')
r = chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r
- id: chatcmpl-xxx
- model: claude-opus-4-5-20251101
- finish_reason: content_filter
- usage:
Usage(completion_tokens=4, prompt_tokens=30, total_tokens=34, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=4, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=30, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
litellm.llms.fireworks_ai.cost_calculator.cost_per_token
def cost_per_token(
model, usage
):
Call self as a function.
mdl = "fireworks_ai/accounts/fireworks/models/kimi-k2p6"
r = c(mk_msg("Hi!"), mdl, reasoning_effort='low')
r
Hi there! How can I help you today?
- id: chatcmpl-xxx
- model: fireworks_ai/accounts/fireworks/models/kimi-k2p6
- finish_reason: stop
- usage:
Usage(completion_tokens=54, prompt_tokens=10, total_tokens=64, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
info = get_model_info(mdl)
info.get('supports_vision'), info.get('supports_reasoning')
(True, True)
c(mk_msg("Hi! What model are you?"), qwen3p6p)
Hi! I’m Qwen, a large language model independently developed by Alibaba Group’s Tongyi Lab. How can I help you today?
- id: chatcmpl-xxx
- model: fireworks_ai/accounts/fireworks/models/qwen3p6-plus
- finish_reason: stop
- usage:
Usage(completion_tokens=386, prompt_tokens=17, total_tokens=403, completion_tokens_details=None, prompt_tokens_details=None)
pr = "What's 27*453? Think step by step, then output only the number."
27*453
12231
c(mk_msg(pr), qwen3p6p, reasoning_effort='low')
To calculate 27 × 453, we can break it down using the distributive property:
27 × 453 = (20 + 7) × 453
First, multiply 20 × 453 = 9,060
Next, multiply 7 × 453 = 3,171
Finally, add the two results together: 9,060 + 3,171 = 12,231
12231
- id: chatcmpl-xxx
- model: fireworks_ai/accounts/fireworks/models/qwen3p6-plus
- finish_reason: stop
- usage:
Usage(completion_tokens=2229, prompt_tokens=31, total_tokens=2260, completion_tokens_details=None, prompt_tokens_details=None)
msg = mk_msg(['In brief, what in this image?',img_fn.read_bytes()])
r = c(msg, qwen3p6p)
r
This image features a cute brown and white puppy, likely a Cavalier King Charles Spaniel, lying in the grass. It is positioned next to a bush of small purple flowers and is looking directly at the camera with one paw extended forward.
- id: chatcmpl-xxx
- model: fireworks_ai/accounts/fireworks/models/qwen3p6-plus
- finish_reason: stop
- usage:
Usage(completion_tokens=437, prompt_tokens=91, total_tokens=528, completion_tokens_details=None, prompt_tokens_details=None)
UsageStats.from_response(r)
total=528 | in=91 | out=437 | cached=0.0% | searches=0 | $0.0014
To use a model that is not registered with litellm, prefix an OpenAI-compatible model name with ‘openai/’ and pass the api_base and api_key arguments.
import os, litellm
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
c = Chat("openai/gpt-oss-20b", api_key=OPENROUTER_API_KEY, api_base=OPENROUTER_BASE_URL)
c("hi")
Synthetic History Creation
Let’s build chat history step by step. That way we can tweak anything we need to during testing.
pr = "What is 5 + 7? Use the tool to calculate it."
for m in ms[1:]:
    c = Chat(m, tools=[simple_add])
    res = c(pr)
    test_eq('12' in contents(res).content, True)
    test_eq(nested_idx(c.hist,1,'tool_calls',0,'function','name'), 'simple_add')
Normally, without tools, we would get one user input and one assistant response. Here we get two extra messages in between:
- An assistant message requesting the tool calls, with their arguments.
- A tool response with the result of the tool call.
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content=None, role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":5,"b":7}', name='simple_add'), id='call_cdIWMv8xiGmV4BZcsWsiP5z1', type='function')], function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
{'tool_call_id': 'call_cdIWMv8xiGmV4BZcsWsiP5z1', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
Message(content='12', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
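The four-message shape above can be checked mechanically. Here is a minimal sketch, assuming the history is held as plain dicts (the real history mixes dicts and Message objects); `unanswered_tool_calls` is a hypothetical helper, not part of the library:

```python
def unanswered_tool_calls(hist):
    "Return ids of tool calls in `hist` that have no matching `tool` message."
    requested, answered = [], set()
    for msg in hist:
        if msg.get('role') == 'assistant':
            # an assistant message may request zero or more tool calls
            requested += [tc['id'] for tc in msg.get('tool_calls') or []]
        elif msg.get('role') == 'tool':
            answered.add(msg['tool_call_id'])
    return [i for i in requested if i not in answered]

# Same shape as the history printed above, with a made-up call id
hist = [
    {'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'},
    {'role': 'assistant', 'content': None, 'tool_calls': [
        {'id': 'call_1', 'type': 'function',
         'function': {'name': 'simple_add', 'arguments': '{"a":5,"b":7}'}}]},
    {'role': 'tool', 'tool_call_id': 'call_1', 'name': 'simple_add', 'content': '12'},
    {'role': 'assistant', 'content': '12'},
]
print(unanswered_tool_calls(hist))      # full exchange: nothing pending
print(unanswered_tool_calls(hist[:2]))  # request without a result yet
```

A check like this is handy when building history by hand, since a tool call without a matching result is an easy mistake to make.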
Let’s try to build this up manually so we have full control over the inputs.
random_tool_id
def random_tool_id(
):
Generate a random tool ID with ‘toolu_’ prefix
random_tool_id()
'toolu_0UAqFzWsDK4FrUMp48Y3tT3QD'
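For readers who want to see the idea without the library, here is one way such an ID could be generated. This is a sketch only: `random_tool_id_sketch` is a hypothetical name, and the 25-character suffix is an assumption read off the sample IDs shown on this page, not a confirmed detail of the real function.

```python
import random, string

def random_tool_id_sketch(n=25):
    "Hypothetical stand-in: 'toolu_' followed by `n` random letters and digits."
    chars = string.ascii_letters + string.digits
    return 'toolu_' + ''.join(random.choices(chars, k=n))

tid = random_tool_id_sketch()
print(tid)
```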
A tool call request can contain one or more tool calls. Let’s make one.
mk_tc
def mk_tc(
func, tcid:NoneType=None, idx:int=1, kw:VAR_KEYWORD
):
Call self as a function.
tc = mk_tc(simple_add, a=5, b=7)
tc
{'index': 1,
'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
'type': 'function'}
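The dictionary above suggests how a tool call is put together: the keyword arguments are JSON-encoded and the function's name is recorded alongside an ID. A rough sketch under that assumption follows; `mk_tc_sketch`, the stand-in `simple_add`, and the placeholder ID are all hypothetical and only mirror the output shown here.

```python
import json

def simple_add(a: int, b: int = 0) -> int:
    "Add two numbers (stand-in for the tool used on this page)"
    return a + b

def mk_tc_sketch(func, tcid=None, idx=1, **kw):
    "Hypothetical stand-in for `mk_tc`: describe a call to `func` with keyword arguments `kw`."
    return {'index': idx,
            'function': {'arguments': json.dumps(kw), 'name': func.__name__},
            'id': tcid or 'toolu_example',  # placeholder; the real helper generates a random ID
            'type': 'function'}

tc = mk_tc_sketch(simple_add, a=5, b=7)
print(tc['function'])
```

Note that the arguments are stored as a JSON string rather than a dict, which matches what providers send back in a real tool call.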
This can then be packaged into the full Message object produced by the assistant.
def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)
tc_cts = "I'll use the simple_add tool to calculate 5 + 7 for you."
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)
Notice how Message instantiation creates a list of ChatCompletionMessageToolCall objects by default. When the tools are executed, these are converted back to dictionaries. For consistency, we want to keep them as dictionaries from the beginning.
mk_tc_req
def mk_tc_req(
content, tcs
):
Call self as a function.
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)
Looks good so far! Now we will want to provide the actual result!
mk_tc_result
def mk_tc_result(
tc, result
):
Call self as a function.
Note that there may be more than one tool call if more than one was passed in. Here we make just one result.
tcq.tool_calls[0]
ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')
mk_tc_result(tcq.tool_calls[0], '12')
{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
'role': 'tool',
'name': 'simple_add',
'content': '12'}
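Judging by the output above, a tool result copies the call's ID and function name and attaches the result as content. A minimal sketch of that mapping, assuming the tool call is a plain dict rather than a ChatCompletionMessageToolCall (`mk_tc_result_sketch` and the example ID are hypothetical):

```python
def mk_tc_result_sketch(tc, result):
    "Hypothetical stand-in for `mk_tc_result`, taking the tool call as a plain dict."
    return {'tool_call_id': tc['id'], 'role': 'tool',
            'name': tc['function']['name'], 'content': result}

tc = {'index': 1, 'id': 'toolu_example', 'type': 'function',
      'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}}
res = mk_tc_result_sketch(tc, '12')
print(res)
```

The `tool_call_id` is what ties the result back to the request, so it has to be copied exactly.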
mk_tc_results
def mk_tc_results(
tcq, results
):
Call self as a function.
The same applies here: the number of entries in tcq.tool_calls should match the number of results passed in the results list.
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12'])
tcr
[{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
'role': 'tool',
'name': 'simple_add',
'content': '12'}]
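Pairing several results with several tool calls is a matter of walking both lists in order. The sketch below assumes plain-dict messages and adds an explicit length check, which is our own addition and not something the real `mk_tc_results` is known to do; `mk_tc_results_sketch` and the IDs are hypothetical.

```python
def mk_tc_results_sketch(tcq, results):
    "Hypothetical stand-in for `mk_tc_results`: one tool message per tool call, in order."
    tcs = tcq['tool_calls']
    if len(tcs) != len(results):
        # assumption: fail loudly rather than silently dropping a call or a result
        raise ValueError(f'{len(tcs)} tool calls but {len(results)} results')
    return [{'tool_call_id': tc['id'], 'role': 'tool',
             'name': tc['function']['name'], 'content': res}
            for tc, res in zip(tcs, results)]

tcq = {'role': 'assistant', 'content': 'I will calculate these for you!', 'tool_calls': [
    {'id': 'toolu_a', 'type': 'function',
     'function': {'name': 'simple_add', 'arguments': '{"a": 5, "b": 7}'}},
    {'id': 'toolu_b', 'type': 'function',
     'function': {'name': 'simple_add', 'arguments': '{"a": 6, "b": 7}'}}]}
tcr = mk_tc_results_sketch(tcq, ['12', '13'])
print([(r['tool_call_id'], r['content']) for r in tcr])
```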
Now we can call it with this synthetic data to see what the response is!
c(tcr[0])
The result of 5 + 7 = 12! 🎉
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=22, prompt_tokens=721, total_tokens=743, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=721, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
Message(content='The result of **5 + 7 = 12**! 🎉', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
Let’s try this again, but this time give it something that is clearly wrong, for fun.
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
tcr = mk_tc_results(tcq, ['13'])
tcr
[{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
'role': 'tool',
'name': 'simple_add',
'content': '13'}]
c(tcr[0])
Hmm, it appears the tool returned 13, but that doesn’t seem right! The correct answer to 5 + 7 is actually 12. There may be a bug in the tool’s calculation. However, to directly answer your question: 5 + 7 = 12.
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=69, prompt_tokens=721, total_tokens=790, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=69, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=721, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
Let’s make sure this works with multiple tool calls in the same assistant Message.
tcs = [mk_tc(simple_add, a=5, b=7), mk_tc(simple_add, a=6, b=7)]
tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq
Message(content='I will calculate these for you!', role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_XBetF5gIRHYH7LKBKxJsllLOD', type='function'), ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 6, "b": 7}', name='simple_add'), id='toolu_fU25035HyRrY03K6JBO94XfLE', type='function')], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12', '13'])
c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])
c(tcr[1])
The result of 5 + 7 = 12! The tool confirmed the calculation for you. 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=29, prompt_tokens=813, total_tokens=842, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=29, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=813, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content='I will calculate these for you!', role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_XBetF5gIRHYH7LKBKxJsllLOD', type='function'), ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 6, "b": 7}', name='simple_add'), id='toolu_fU25035HyRrY03K6JBO94XfLE', type='function')], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu_XBetF5gIRHYH7LKBKxJsllLOD', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
{'tool_call_id': 'toolu_fU25035HyRrY03K6JBO94XfLE', 'role': 'tool', 'name': 'simple_add', 'content': '13'}
Message(content='The result of **5 + 7 = 12**! The tool confirmed the calculation for you. 😊', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})
chat = Chat(ms[1], tools=[simple_add])
res = chat("What's 5 + 3? Use the `simple_add` tool.")
res
5 + 3 is 8.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=27, prompt_tokens=171, total_tokens=198, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=19, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=171, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
res = chat("Now, tell me a joke based on that result.")
res
Why did the two 4s skip lunch?
Because they already 8!
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=308, prompt_tokens=147, total_tokens=455, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=290, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=147, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
Images
for m in ms[1:]:
    chat = Chat(m)
    r = chat(['Whats in this img?',img_fn.read_bytes()])
    test_eq('puppy' in contents(r).content, True)
r
A small brown-and-white puppy lying on grass next to purple flowers.
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=17, prompt_tokens=97, total_tokens=114, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
Prefill
Prefill works as expected:
# for m in ms[1:]:
#     if not get_model_info(m)['supports_assistant_prefill']: continue
#     chat = Chat(m)
#     chat('Hi this is Rens!')
#     r = chat("Spell my name",prefill="Your name is R E")
#     test_eq(contents(r).content.startswith('Your name is R E N S'), True)
And the entire message is stored in the history, not just the generated part:
# chat.hist[-1]
Streaming
from time import sleep
for m in ms[1:]:
    chat = Chat(m)
    stream_gen = chat("Count to 5", stream=True)
    for chunk in stream_gen:
        if isinstance(chunk, ModelResponse): display(chunk)
        else: print(delta_text(chunk) or '',end='')
1, 2, 3, 4, 5.
1, 2, 3, 4, 5.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=42, prompt_tokens=5, total_tokens=47, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=28, rejected_prediction_tokens=None, text_tokens=14, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=5, image_tokens=None, video_tokens=None))
1, 2, 3, 4, 5!
1, 2, 3, 4, 5!
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=18, prompt_tokens=11, total_tokens=29, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=11, image_tokens=None, video_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=0)
1, 2, 3, 4, 5
1, 2, 3, 4, 5
- id: chatcmpl-xxx
- model: gpt-5.4
- finish_reason: stop
- usage:
Usage(completion_tokens=16, prompt_tokens=10, total_tokens=26, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
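The streaming loop above relies on one pattern: text deltas arrive first, and the assembled response object comes last. The sketch below imitates that with a stand-in generator so it runs without any API call. `fake_stream` and its dict-shaped final response are assumptions for illustration, not the library's actual chunk types.

```python
def fake_stream():
    "Hypothetical stand-in for a streaming call: text deltas, then one final response object."
    yield from ['1, 2, ', '3, 4, ', '5']
    yield {'content': '1, 2, 3, 4, 5', 'finish_reason': 'stop'}

pieces, final = [], None
for chunk in fake_stream():
    if isinstance(chunk, dict): final = chunk   # the assembled response arrives last
    else: pieces.append(chunk)                  # everything before it is a text delta

streamed = ''.join(pieces)
print(streamed)
print(final['finish_reason'])
```

Separating the two cases this way lets a UI print text as it arrives while still keeping the final object for usage and history.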
Let’s try prefill with streaming too:
# stream_gen = chat("Continue counting to 10","Okay! 6, 7",stream=True)
# for chunk in stream_gen:
#     if isinstance(chunk, ModelResponse): display(chunk)
#     else: print(delta_text(chunk) or '',end='')
Tool use
OK, now let’s test tool use.
m = ms[2]
chat = Chat(m, tools=[simple_add])
chat("Calculate 5+3 and 4+5 with parallel tool calls using `simple_add`.")
Here are the results from both parallel calculations:
| Expression | Result |
|---|---|
| 5 + 3 | 8 |
| 4 + 5 | 9 |
Both additions were performed simultaneously using parallel tool calls, making the process efficient! 🚀
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=72, prompt_tokens=825, total_tokens=897, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=72, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=825, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
def simple_div(
    a: int, # first operand
    b: int=0 # second operand
) -> int:
    "Divide two numbers"
    return a/b
m = ms[2]
chat = Chat(m, tools=[simple_div])
chat("Calculate 2/0 using `simple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)")
Here is exactly what the tool returned — a Python traceback error:
Traceback (most recent call last):
File "/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py", line 242, in call_func
try: return func(**inps)
^^^^^^^^^^^^
File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_13098/2058224461.py", line 6, in simple_div
return a/b
~^~
ZeroDivisionError: division by zero
What it tells us:
- The tool did not return a numeric result.
- It raised a ZeroDivisionError: division by zero exception, which is the standard Python error when attempting to divide by zero.
- The error originated in simple_div at the line return a/b, confirming there is no special handling for a zero denominator in this tool’s implementation.
This is expected behavior mathematically — division by zero is undefined, and Python enforces that strictly.
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=278, prompt_tokens=879, total_tokens=1157, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=278, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=879, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
m = ms[2]
chat = Chat(m, tools=[simple_div])
chat("Calculate 5/3 and 3/0 with parallel tool calls using `simple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)")
Here’s exactly what I saw as the tool results:
✅ 5 / 3
- Result: 1.6666666666666667
- No issues. Standard integer (well, float) division result.
❌ 3 / 0
Result: A Python traceback error:
Traceback (most recent call last):
File "/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py", line 242, in call_func
try: return func(**inps)
^^^^^^^^^^^^
File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_13098/2058224461.py", line 6, in simple_div
return a/b
~^~
ZeroDivisionError: division by zero
The tool did not crash the system — it returned the error as a string result, which is great for error handling! The ZeroDivisionError was caught and surfaced gracefully as a tool output rather than propagating as an uncaught exception.
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=293, prompt_tokens=991, total_tokens=1284, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=293, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=991, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
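The tracebacks above suggest that tool exceptions are caught and handed back to the model as text instead of aborting the chat. A minimal sketch of that idea, using only the standard library, is shown here. `call_tool_safely` is a hypothetical name and this is not the toolslm implementation; the stand-in `simple_div` is annotated as returning a float.

```python
import traceback

def call_tool_safely(func, **inps):
    "Hypothetical wrapper: return the tool result, or the traceback text if it raises."
    try: return func(**inps)
    except Exception: return traceback.format_exc()

def simple_div(a: int, b: int = 0) -> float:
    "Divide two numbers (stand-in for the tool used above)"
    return a / b

print(call_tool_safely(simple_div, a=5, b=3))
out = call_tool_safely(simple_div, a=3, b=0)
print(out.splitlines()[-1])
```

Returning the traceback as the tool result gives the model enough detail to explain the failure or retry with different arguments.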
for m in ms[1:]:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3? Use the `simple_add` tool. Explain.")
    display(res)
gemini/gemini-3-flash-preview:
To find the sum of 5 and 3, I used the simple_add tool with the arguments a=5 and b=3. The tool performed the addition and returned the result, which is 8.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=65, prompt_tokens=239, total_tokens=304, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=16, rejected_prediction_tokens=None, text_tokens=49, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=239, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
claude-sonnet-4-6:
Here’s a breakdown of what happened:
- Tool Called: simple_add
- Inputs Provided:
  - a = 5 (the first operand)
  - b = 3 (the second operand)
- What the Tool Did: The simple_add tool took the two integer inputs and added them together: 5 + 3.
- Result Returned: 8
✅ Answer: 5 + 3 = 8
This is straightforward arithmetic — adding 5 and 3 gives you 8. The tool simply performs the addition and returns the integer result.
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=158, prompt_tokens=731, total_tokens=889, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=158, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=731, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
openai/gpt-5.4:
Using the simple_add tool, 5 + 3 = 8.
Explanation: addition combines the two numbers. Starting from 5 and adding 3 more gives 8.
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=42, prompt_tokens=199, total_tokens=241, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
for m in ms[1:]:
    _sparams = litellm.get_model_info(m)['supported_openai_params']
    if 'reasoning_effort' not in _sparams: continue
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3?",think='l',return_all=True)
    _display(*res)
gemini/gemini-3-flash-preview:
🔧 simple_add({“b”: 3, “a”: 5})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=84, prompt_tokens=85, total_tokens=169, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=66, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=85, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_9e5e5d57af534acdb1f6cc90ea9a__thought__Ev0BCvoBAQw51sfwHaoJJ2zAYvG3OjGxnzXwweSDUEiPQ8Fctb+Yq9H/',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
5 + 3 is 8.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=8, prompt_tokens=289, total_tokens=297, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=289, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
claude-sonnet-4-6:
Sure! Let me calculate that for you!
🔧 simple_add({“a”: 5, “b”: 3})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=98, prompt_tokens=610, total_tokens=708, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=3, rejected_prediction_tokens=None, text_tokens=95, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=610, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_01PRYuuvarMbffFgYND4aeGu',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
5 + 3 = 8! Let me know if you have any other calculations! 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=27, prompt_tokens=721, total_tokens=748, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=27, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=721, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
openai/gpt-5.4:
8
- id: chatcmpl-xxx
- model: gpt-5.4
- finish_reason: stop
- usage:
Usage(completion_tokens=5, prompt_tokens=73, total_tokens=78, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
Search
StandardBuiltInToolCostTracking.get_cost_for_built_in_tools
def get_cost_for_built_in_tools(
model, response_object:NoneType=None, usage:NoneType=None, custom_llm_provider:NoneType=None,
standard_built_in_tools_params:NoneType=None
):
Call self as a function.
for m in ms[:-1]:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m)
    res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
    for o in res:
        if isinstance(o, ModelResponse): sleep(0.01); display(o)
        else: pass
gemini/gemini-3-pro-preview:
Otters are semiaquatic, carnivorous mammals belonging to the weasel family. There are 13 species worldwide, inhabiting both freshwater rivers and marine environments.
Here is a brief overview of their key characteristics: * Appearance: They have long, sleek bodies, webbed feet for swimming, and incredibly dense fur—sea otters actually have the thickest fur of any animal on Earth to insulate them in cold waters. * Diet: They primarily eat fish, crabs, clams, and amphibians. * Behavior: Otters are highly intelligent and famously playful. Sea otters, for example, sleep floating on their backs and are one of the few mammals that use tools, often using rocks to smash open shellfish on their bellies.
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=462, prompt_tokens=192, total_tokens=654, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=253, rejected_prediction_tokens=None, text_tokens=209, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=192, image_tokens=None, video_tokens=None, web_search_requests=1))
gemini/gemini-3-flash-preview:
Otters are highly intelligent, playful, semi-aquatic mammals belonging to the weasel family (Mustelidae). There are 13 extant species found on every continent except Australia and Antarctica.
Key Facts
- Physical Traits: They have long, streamlined bodies, powerful webbed feet for swimming, and extremely dense, water-repellent fur that keeps them warm. They are the only members of the weasel family that are serious swimmers.
- Habitats: Otters live in both freshwater (rivers, lakes, and wetlands) and saltwater (coastal marine environments). While river otters spend significant time on land, sea otters live almost exclusively in the water.
- Behavior: They are famous for their “playful” nature—often seen sliding down mud banks or playing with stones. They are carnivorous, primarily eating fish, crustaceans, and mollusks. Some species, like sea otters, use rocks as tools to crack open shells.
- Size Range: They vary greatly in size, from the Asian small-clawed otter (roughly 3–10 lbs) to the Giant Otter of South America, which can reach up to 6 feet in length, and the Sea Otter, which can weigh up to 100 lbs.
- Conservation: Many species are currently threatened or endangered due to habitat loss, pollution, and historical hunting for their thick pelts.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=683, prompt_tokens=76, total_tokens=759, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=380, rejected_prediction_tokens=None, text_tokens=303, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=76, image_tokens=None, video_tokens=None, web_search_requests=1))
claude-sonnet-4-6:
Here’s a brief overview of otters:
What they are:
- Otters are carnivorous mammals in the subfamily Lutrinae, and all 14 extant species are semiaquatic, living in both freshwater and marine environments.
- They are found on every continent except Australia and Antarctica.
Physical traits:
- Otters are distinguished by their long, slim bodies, powerful webbed feet for swimming, and dense fur, which keeps them warm and buoyant in water.
- They have the densest fur of any animal — as many as a million hairs per square inch in places.
Diet & tools:
- All otters are expert hunters that eat fish, crustaceans, and other critters.
- Sea otters have an ingenious method to open shellfish — a sea otter will float on its back, place a rock on its chest, then smash the mollusk down on it until it breaks open.
Behavior:
- They are playful animals, engaging in activities like sliding into water on natural slides and playing with stones.
- When it’s time to nap, sea otters entangle themselves in kelp so they don’t float away, and they sometimes intertwine their feet with another sea otter to stay together.
Lifespan & young:
- They can live up to 16 years, with their diet mainly consisting of fish and sometimes frogs, birds, or shellfish, depending on the species.
- A newborn pup needs constant attention and will stay with its mother for six months until it develops survival skills.
Conservation:
- Otters and their relatives were once hunted extensively for their fur, many to the point of near extinction, and despite regulations designed to protect them, many species remain at risk from pollution and habitat loss.
🔧 web_search({“query”: “otters facts overview”})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=627, prompt_tokens=17556, total_tokens=18183, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=627, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=17556, image_tokens=None, video_tokens=None, cache_creation_tokens=0), server_tool_use=ServerToolUse(web_search_requests=1, tool_search_requests=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
m = 'claude-sonnet-4-6'
def mk_pause_web_search():
    srv_tc = mk_tc("web_search", query="Solveit Answer.AI", tcid=random_tool_id().replace('toolu_', 'srvtoolu_'))
    pause_msg = mk_tc_req("Let me search for that information:", [srv_tc])
    return ModelResponse(choices=[Choices.model_construct(finish_reason="pause_turn", index=0, message=pause_msg)])
mk_pause_web_search()
Let me search for that information:
🔧 web_search({“query”: “Solveit Answer.AI”})
- id: chatcmpl-xxx
- model: None
- finish_reason: pause_turn
We mock completion so that the first two API calls return pause_turn:
orig_completion = completion
call_count = 0
def patched_completion(*args, **kwargs):
    global call_count
    call_count += 1
    print(f"Mock Call {call_count}")
    if call_count < 3: return mk_pause_web_search()
    return orig_completion(*args, **kwargs)
completion = patched_completion
chat_pause = Chat('claude-sonnet-4-5', search='l')
res = chat_pause("Search the web and tell me about Solveit in a paragraph")
print(f"Total calls: {call_count}")
display(res)
completion = orig_completion
Mock Call 1
Mock Call 2
Mock Call 3
Total calls: 3
Based on the search results, I found information about Solveit (solve.it.com), which appears to be the most prominent result. Here’s a paragraph about it:
Solveit is both a course and platform designed to help people solve problems using fast short iterations, covering areas like coding, writing, sysadmin, and research. The “solveit method” is a modern approach to building software, writing, solving problems, and learning, inspired by George Pólya’s “How to Solve It” and developed by Jeremy Howard and team at Answer.AI. The method is founded on building in small steps with quick iterations and immediate feedback, and for coding specifically, involves writing 1-2 lines of code at a time and immediately showing the results. The platform provides users with personal instances—full virtual private servers where they can install software, store files, and host applications. It serves as an antidote to AI fatigue, helping users avoid being overwhelmed by AI-generated code they don’t understand, and has been tested with over 1000 preview users.
🔧 web_search({“query”: “Solveit”})
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=386, prompt_tokens=12333, total_tokens=12719, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=386, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=12333, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), server_tool_use=ServerToolUse(web_search_requests=1, tool_search_requests=None), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
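The mock above shows three calls for a single user turn: two responses ending in pause_turn, then one ending in stop. A minimal sketch of a loop with that behaviour follows. It is not the library's actual implementation: `make_fake_completion` and `complete_with_resume` are hypothetical names, and feeding the partial assistant turn back into the request is an assumption about how a paused turn gets resumed.

```python
from types import SimpleNamespace

def make_fake_completion(n_pauses=2):
    "Hypothetical stand-in for an LLM call: pauses `n_pauses` times, then stops."
    state = {'calls': 0}
    def fake_completion(msgs):
        state['calls'] += 1
        paused = state['calls'] <= n_pauses
        return SimpleNamespace(finish_reason='pause_turn' if paused else 'stop',
                               content=f"partial {state['calls']}" if paused else "final answer")
    return fake_completion, state

def complete_with_resume(call, msgs, max_resumes=5):
    "Re-issue the request while the response reports `pause_turn`."
    for _ in range(max_resumes + 1):
        r = call(msgs)
        if r.finish_reason != 'pause_turn': return r
        # Assumption: the partial assistant turn is sent back so the model can continue it.
        msgs = msgs + [{'role': 'assistant', 'content': r.content}]
    raise RuntimeError("still paused after max_resumes attempts")

fake, state = make_fake_completion()
res = complete_with_resume(fake, [{'role': 'user', 'content': 'Tell me about Solveit'}])
print(state['calls'], res.finish_reason)
```

The `max_resumes` cap is a design choice for the sketch, so that a provider which never leaves pause_turn cannot loop forever.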
Test next turn:
test_eq(len(chat_pause.hist), 4)
chat_pause('What did I just ask you about?')
Mock Call 4
You asked me to search the web and tell you about Solveit in a paragraph.
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=22, prompt_tokens=10334, total_tokens=10356, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=10334, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
Workaround for https://github.com/BerriAI/litellm/issues/23047:
m = 'claude-sonnet-4-6'
msgs = [{'role':'user','content':"Search web for latest news about fast.ai and answer.ai."}]
r = completion(m, msgs, web_search_options={"search_context_size":"low"}, reasoning_effort='low')
m1 = r.choices[0].message
print(f"Turn 1: thinking={bool(m1.thinking_blocks)}, tcs={m1.tool_calls}")
msgs.append(m1)
msgs.append({'role':'user','content':'And search for news about solveit.'})
r2 = completion(m, msgs, web_search_options={"search_context_size":"low"}, reasoning_effort='low')
print("Turn 2 OK")
Turn 1: thinking=True, tcs=[ChatCompletionMessageToolCall(index=2, function=Function(arguments='{"query": "fast.ai latest news 2026"}', name='web_search'), id='srvtoolu_01JZtcwb1b8xxxBMsfDE6tBt', type='function'), ChatCompletionMessageToolCall(index=3, function=Function(arguments='{"query": "answer.ai latest news 2026"}', name='web_search'), id='srvtoolu_01RgBPcesVTnc25Yc16XibJ5', type='function')]
Turn 2 OK
chat = Chat(gpt54, tools=[simple_add])
chat("What's 123+456? Then search the web for today's news about fast.ai.", think='l', search='l')
123 + 456 = 579. (fast.ai)
I searched for today’s fast.ai news and didn’t find any clearly dated Monday, April 27, 2026 news item from fast.ai itself. The fastest/most reliable signal I found was the official fast.ai homepage, which was crawled today and shows the most recent published post as “I Don’t Want a Learning Dashboard for My Child” from February 17, 2026. It also lists “Breaking the Spell of Vibe Coding” from January 28, 2026 and “How To Use AI for the Ancient Art of Close Reading” from January 21, 2026. (fast.ai)
So the short answer is: no obvious same-day fast.ai news showed up in web results today; the latest visible official updates I found are those early-2026 blog posts. (fast.ai)
If you want, I can do a second pass focused specifically on:
1. official fast.ai/Answer.AI announcements,
2. Jeremy Howard news/interviews, or
3. GitHub/forum activity from the last 7 days.
- id: chatcmpl-xxx
- model: gpt-5.4
- finish_reason: stop
- usage:
Usage(completion_tokens=741, prompt_tokens=18567, total_tokens=19308, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=409, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=4608, text_tokens=None, image_tokens=None, video_tokens=None))
Multi tool calling
We can let the model call multiple tools in sequence using the max_steps parameter.
for m in ms:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's ((5 + 3)+7)+11? Use tools.", return_all=True, max_steps=5)
    for r in res: display(r['content'] if isinstance(r,dict) else r)
gemini/gemini-3-pro-preview:
🔧 simple_add({“a”: 5, “b”: 3})
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=174, prompt_tokens=93, total_tokens=267, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=156, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=93, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'8'
🔧 simple_add({“b”: 7, “a”: 8})
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=42, prompt_tokens=281, total_tokens=323, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=24, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=281, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'15'
🔧 simple_add({“a”: 15, “b”: 11})
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=46, prompt_tokens=338, total_tokens=384, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=26, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=338, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'26'
The result of ((5 + 3) + 7) + 11 is 26.
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=46, prompt_tokens=399, total_tokens=445, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=24, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=399, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
gemini/gemini-3-flash-preview:
🔧 simple_add({“b”: 3, “a”: 5})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=101, prompt_tokens=93, total_tokens=194, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=83, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=93, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'8'
🔧 simple_add({“a”: 8, “b”: 7})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=29, prompt_tokens=208, total_tokens=237, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=11, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=208, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'15'
🔧 simple_add({“a”: 15, “b”: 11})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=33, prompt_tokens=252, total_tokens=285, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=13, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=252, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
'26'
The result of ((5 + 3) + 7) + 11 is 26.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=22, prompt_tokens=300, total_tokens=322, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=300, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
claude-sonnet-4-6:
I need to compute ((5 + 3) + 7) + 11 step by step, where each step depends on the previous result.
Step 1: Compute 5 + 3:
🔧 simple_add({“a”: 5, “b”: 3})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=118, prompt_tokens=617, total_tokens=735, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=118, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=617, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
'8'
Step 2: Add 7 to the result (8 + 7):
🔧 simple_add({“a”: 8, “b”: 7})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=90, prompt_tokens=748, total_tokens=838, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=90, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=748, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
'15'
Step 3: Add 11 to the result (15 + 11):
🔧 simple_add({“a”: 15, “b”: 11})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=90, prompt_tokens=851, total_tokens=941, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=90, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=851, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
'26'
Here’s the breakdown:
1. 5 + 3 = 8
2. 8 + 7 = 15
3. 15 + 11 = 26
Therefore, ((5 + 3) + 7) + 11 = 26!
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=74, prompt_tokens=954, total_tokens=1028, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=74, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=954, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
openai/gpt-5.4:
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 11})
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=55, prompt_tokens=162, total_tokens=217, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
'8'
'18'
🔧 simple_add({“a”:8,“b”:18})
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=21, prompt_tokens=236, total_tokens=257, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
'26'
26
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=4, prompt_tokens=269, total_tokens=273, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
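The traces above all follow the same pattern: the model requests a tool, the tool result is appended to the history, and the model is called again until it answers without tool calls or the step budget runs out. A minimal sketch of such a loop is shown here, with a scripted `fake_model` standing in for a real LLM. Both `fake_model` and `tool_loop` are hypothetical names used for illustration and do not reflect how Chat is implemented.

```python
def simple_add(a: int, b: int) -> int:
    "Add two numbers"
    return a + b

def fake_model(msgs):
    "Hypothetical scripted model for ((5 + 3) + 7) + 11: one tool call per step."
    results = [int(m['content']) for m in msgs if m['role'] == 'tool']
    todo = [7, 11]  # operands still to add after the first 5 + 3
    if not results: args = {'a': 5, 'b': 3}
    elif len(results) <= len(todo): args = {'a': results[-1], 'b': todo[len(results) - 1]}
    else: return {'role': 'assistant', 'content': f"The result is {results[-1]}.", 'tool_calls': []}
    return {'role': 'assistant', 'content': None, 'tool_calls': [('simple_add', args)]}

def tool_loop(model, msgs, tools, max_steps=5):
    "Alternate model calls and tool execution for at most `max_steps` rounds."
    ns = {f.__name__: f for f in tools}
    for _ in range(max_steps):
        reply = model(msgs)
        msgs.append(reply)
        if not reply['tool_calls']: break  # no more tool requests: done
        for name, args in reply['tool_calls']:
            msgs.append({'role': 'tool', 'name': name, 'content': str(ns[name](**args))})
    return msgs

hist = tool_loop(fake_model, [{'role': 'user', 'content': "What's ((5 + 3)+7)+11?"}], [simple_add])
print([m['content'] for m in hist if m['role'] == 'tool'])
print(hist[-1]['content'])
```

The scripted model needs four calls here (three tool requests plus the final answer), which fits within the max_steps=5 budget used in the example above.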
Some models support parallel tool calling, i.e. sending multiple tool call requests in a single conversation step.
def multiply(a: int, b: int) -> int:
    "Multiply two numbers"
    return a * b
for m in ms[1:]:
    _sparams = litellm.get_model_info(m)['supported_openai_params']
    if 'parallel_tool_calls' not in _sparams: continue
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add, multiply])
    res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
    for r in res: _display(r)
gemini/gemini-3-flash-preview:
🔧 simple_add({“b”: 3, “a”: 5})
🔧 simple_add({“b”: 2, “a”: 7})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=140, prompt_tokens=148, total_tokens=288, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=104, rejected_prediction_tokens=None, text_tokens=36, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=148, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_336f8792c76743ee93b476f5f1a7__thought__EukCCuYCAQw51sdxwHpd5drdFftmAdpAMnl4hy5jlhzys8Mjy1DJTHzy',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'call_0940ef03289f4297a235c7658df0',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“a”: 8, “b”: 9})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=37, prompt_tokens=314, total_tokens=351, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=21, rejected_prediction_tokens=None, text_tokens=16, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=314, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_214a5b0f1b5545168fa8d117497c__thought__En4KfAEMOdbH9GJeYXSOTySH6fpIeHcprIOzRJgzxcouFAanv2M9uzOm',
'role': 'tool',
'name': 'multiply',
'content': '72'}
(5 + 3) * (7 + 2) = 8 * 9 = 72
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=23, prompt_tokens=364, total_tokens=387, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=23, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=364, image_tokens=None, video_tokens=None), cache_read_input_tokens=None)
claude-sonnet-4-6:
I need to calculate (5 + 3) * (7 + 2). I’ll start by performing both additions simultaneously, then multiply the results.
Step 1: Compute both additions in parallel.
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 2})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=173, prompt_tokens=701, total_tokens=874, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=173, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=701, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_01NhhRVainrzD11Jc356s8sE',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'toolu_01Bnir83S53xzFTRftmrr3J6',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
5 + 3 = 8 and 7 + 2 = 9. Now I’ll multiply the two results.
Step 2: Multiply the results.
🔧 multiply({“a”: 8, “b”: 9})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=111, prompt_tokens=939, total_tokens=1050, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=111, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=939, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_011dP6xwp1VHWRMYRLnHQH2v',
'role': 'tool',
'name': 'multiply',
'content': '72'}
The final answer is:
(5 + 3) * (7 + 2) = 8 * 9 = 72 🎉
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=43, prompt_tokens=1063, total_tokens=1106, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=43, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1063, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
openai/gpt-5.4:
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 2})
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=55, prompt_tokens=191, total_tokens=246, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
{'tool_call_id': 'call_u6oGMb87uCalMPK4iTC7vT9Z',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'call_nFaWjzfbkg4duFMX7ilOOSwS',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“a”:8,“b”:9})
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=20, prompt_tokens=265, total_tokens=285, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
{'tool_call_id': 'call_o3THg6FgBprFMQdqXrAovOG8',
'role': 'tool',
'name': 'multiply',
'content': '72'}
72
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: stop
- usage:
Usage(completion_tokens=4, prompt_tokens=296, total_tokens=300, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
See how the additions are calculated in one go!
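When one assistant message carries several tool calls, each call needs its own tool message carrying the matching tool_call_id, as in the outputs above. A minimal sketch of handling one such step is shown here. Running the calls on a thread pool is an assumption made for illustration (the source does not say whether the calls are executed concurrently), and `TOOLS`, `run_tool_call` and `run_parallel` are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def simple_add(a: int, b: int) -> int: return a + b
def multiply(a: int, b: int) -> int: return a * b

TOOLS = {'simple_add': simple_add, 'multiply': multiply}

def run_tool_call(tc):
    "Execute one tool call dict and wrap the result as a `tool` message."
    res = TOOLS[tc['name']](**json.loads(tc['arguments']))
    return {'tool_call_id': tc['id'], 'role': 'tool', 'name': tc['name'], 'content': str(res)}

def run_parallel(tcs):
    "Run all tool calls of one step; `map` keeps results in request order."
    with ThreadPoolExecutor() as ex: return list(ex.map(run_tool_call, tcs))

# Step 1: both additions arrive in the same assistant message.
step1 = [{'id': 'call_a', 'name': 'simple_add', 'arguments': '{"a": 5, "b": 3}'},
         {'id': 'call_b', 'name': 'simple_add', 'arguments': '{"a": 7, "b": 2}'}]
msgs = run_parallel(step1)

# Step 2: the multiplication depends on both results, so it needs its own step.
a, b = (int(m['content']) for m in msgs)
step2 = [{'id': 'call_c', 'name': 'multiply', 'arguments': json.dumps({'a': a, 'b': b})}]
final = run_parallel(step2)
print([m['content'] for m in msgs], final[0]['content'])
```

Only independent calls can share a step. The multiplication has to wait for both sums, which is why every model above needed at least two tool-call rounds.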
We don’t want the model to keep running tools indefinitely. Let’s show how to force the model to stop after a specified number of tool-call rounds:
def divide(a: int, b: int) -> float:
    "Divide two numbers"
    return a/b
chat = Chat(ms[2], tools=[simple_add, multiply, divide])
res = chat("Tell me what tools you have available. Then calculate ((10+5)*3)/(2+1). ALWAYS use tools for math ops where available, and do tool calls in parallel where possible",
           max_steps=2, return_all=True,
           final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)
Available Tools
I have the following tools available:
1. simple_add – Adds two numbers together. Returns an integer.
2. multiply – Multiplies two numbers together. Returns an integer.
3. divide – Divides two numbers. Returns a number.
Calculating ((10+5)*3)/(2+1)
I’ll break this down into steps:
- Step 1: 10 + 5 and 2 + 1 (these are independent, so I’ll run them in parallel!)
Step 1: Parallel additions
🔧 simple_add({“a”: 10, “b”: 5})
🔧 simple_add({“a”: 2, “b”: 1})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=267, prompt_tokens=809, total_tokens=1076, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=267, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=809, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_01G2P4J6NtF364CwAJpb75K1',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
{'tool_call_id': 'toolu_01JDE9vTS1kNK2QbA47aUfac',
'role': 'tool',
'name': 'simple_add',
'content': '3'}
10 + 5 = 15 and 2 + 1 = 3. Now I’ll multiply 15 × 3.
Step 2: Multiply
🔧 multiply({“a”: 15, “b”: 3})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=112, prompt_tokens=1141, total_tokens=1253, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=112, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1141, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_01XcWch985Y55eUaZXohNBwb',
'role': 'tool',
'name': 'multiply',
'content': '45'}
Summary
We had a brief but productive session! Here’s what we covered:
- Listed the available tools – We identified three math tools: simple_add, multiply, and divide.
- Worked through a math expression – We calculated ((10+5)*3)/(2+1) step by step using the tools:
  - 10 + 5 = 15 and 2 + 1 = 3 (run in parallel)
  - 15 × 3 = 45
  - ⚠️ We didn’t quite finish! The final step — dividing 45 ÷ 3 — was never completed. The final answer would be 15, but the divide tool was never actually called to confirm it.
So we got about 90% of the way through the calculation before wrapping up. Feel free to pick up where we left off next time! 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=224, prompt_tokens=1285, total_tokens=1509, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=224, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1285, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
chat.hist[:5]
[{'role': 'user',
'content': 'Tell me what tools you have available. Then calculate ((10+5)*3)/(2+1). ALWAYS use tools for math ops where available, and do tool calls in parallel where possible'},
Message(content="## Available Tools\n\nI have the following tools available:\n\n1. **`simple_add`** – Adds two numbers together. Returns an integer.\n2. **`multiply`** – Multiplies two numbers together. Returns an integer.\n3. **`divide`** – Divides two numbers. Returns a number.\n\n---\n\n## Calculating ((10+5)*3)/(2+1)\n\nI'll break this down into steps:\n- **Step 1:** `10 + 5` and `2 + 1` (these are independent, so I'll run them in parallel!)\n\n**Step 1: Parallel additions**", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_01G2P4J6NtF364CwAJpb75K1', type='function'), ChatCompletionMessageToolCall(index=2, caller={'type': 'direct'}, function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01JDE9vTS1kNK2QbA47aUfac', type='function')], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
{'tool_call_id': 'toolu_01G2P4J6NtF364CwAJpb75K1',
'role': 'tool',
'name': 'simple_add',
'content': '15'},
{'tool_call_id': 'toolu_01JDE9vTS1kNK2QbA47aUfac',
'role': 'tool',
'name': 'simple_add',
'content': '3'},
Message(content="`10 + 5 = 15` and `2 + 1 = 3`. Now I'll multiply 15 × 3.\n\n**Step 2: Multiply**", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_01XcWch985Y55eUaZXohNBwb', type='function')], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]
Tool call exhaustion
pr = "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to make the calculations"
c = Chat(model, tools=[simple_add])
res = c(pr, max_steps=2)
res
Here’s a summary of what was calculated so far:
- 1 + 2 = 3
- 3 + 2 = 5
However, the goal is not yet complete! We still need to perform one more calculation: 5 + 3. Please send a follow-up message and I’ll finish the calculation right away!
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=87, prompt_tokens=918, total_tokens=1005, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=87, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=918, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
assert c.hist[-2] == _final_prompt
Tool Call Referencing
With tc_refs=True, the AI can see and report tool call IDs:
chat = Chat('claude-sonnet-4-5', tools=[simple_add], tc_refs=True)
chat("Call add(1,2) and tell me the tool_call_id you used")
The result of adding 1 + 2 is 3.
The tool_call_id I used was: toolu_011YFz3D9hELtfe2faXFCru1
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=53, prompt_tokens=831, total_tokens=884, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=53, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=831, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat.tc_res
{'toolu_011YFz3D9hELtfe2faXFCru1': 3}
Example of chained tool calls where the AI references a previous result:
@dataclass
class Person:
    name: str
    age: int
def get_person():
    "Get a person's data"
    return {"name": "Alice", "age": 30}
def greet_person(person: Person):
    "Greet a person"
    return f"Hello {person.name}, you are {person.age} years old!"
chat = Chat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
chat("First call get_person, then pass the result to greet_person", max_steps=10)
Perfect! I successfully:
1. Called get_person which returned Alice’s information (name: Alice, age: 30)
2. Passed that result to greet_person which greeted her with: “Hello Alice, you are 30 years old!”
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=64, prompt_tokens=1024, total_tokens=1088, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=64, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1024, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
We can inspect chat.tc_res to see all stored tool results:
chat.sp
'You can reference previous tool call results using $`tool_call_id` syntax.\nFor example, if a tool call returns result with id \'toolu_abc123\', you can use it in a subsequent call:\n{"content": "$`toolu_abc123`"}\nThis is useful when chaining tools, e.g., reading data with one tool and passing it to another.'
chat.tc_res
{'toolu_01VrvqXwSeiAqi9KXEkt2svo': {'name': 'Alice', 'age': 30},
'toolu_019msmJ6FxNCUeH3DsJdhd4e': 'Hello Alice, you are 30 years old!'}
list(L(chat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{}', name='get_person'), id='toolu_01VrvqXwSeiAqi9KXEkt2svo', type='function')],
[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"person": "$`toolu_01VrvqXwSeiAqi9KXEkt2svo`"}', name='greet_person'), id='toolu_019msmJ6FxNCUeH3DsJdhd4e', type='function')]]
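The substitution step behind these references can be pictured with a small standalone sketch. This is not the library's implementation: `resolve_refs` and the regular expression are hypothetical, and only the reference syntax and the shape of `tc_res` are taken from the output above.

```python
import json, re

# Matches a whole-string reference such as "$`toolu_abc123`"
REF = re.compile(r"^\$`([^`]+)`$")

def resolve_refs(args, tc_res):
    "Recursively replace reference strings with stored tool results (hypothetical helper)"
    if isinstance(args, dict): return {k: resolve_refs(v, tc_res) for k, v in args.items()}
    if isinstance(args, list): return [resolve_refs(v, tc_res) for v in args]
    if isinstance(args, str):
        m = REF.match(args)
        if m and m.group(1) in tc_res: return tc_res[m.group(1)]
    return args

# Stored result of an earlier tool call, keyed by its id
tc_res = {'toolu_abc123': {'name': 'Alice', 'age': 30}}
# Arguments as the model would send them for the next tool call
raw_args = json.loads('{"person": "$`toolu_abc123`"}')
resolved = resolve_refs(raw_args, tc_res)
print(resolved)
```

In this sketch, references to unknown ids are left as plain strings.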
This also works with ToolResponse results:
def view_img(fn:Path):
    "View an image"
    durl = f"data:image/jpeg;base64,{base64.b64encode(fn.read_bytes()).decode()}"
    return ToolResponse([{'type': 'image_url', 'image_url': {'url': durl}}])
def get_img_size(image_content: list) -> dict:
    "Get the size of an image from ToolResponse content"
    from PIL import Image
    from io import BytesIO
    url = image_content[0]['image_url']['url']
    b64_data = url.split(',')[1]
    img = Image.open(BytesIO(base64.b64decode(b64_data)))
    return {'width': img.width, 'height': img.height}
chat = Chat('claude-sonnet-4-5', tools=[view_img, get_img_size], tc_refs=True)
chat(f"First describe the image at {img_fn}, and then get it's dimensions", max_steps=10)
Image Description: This is an adorable photograph of a Cavalier King Charles Spaniel puppy. The puppy has the breed’s characteristic coloring with a white face and chest, and rich brown/chestnut colored ears. The puppy is lying on grass and looking directly at the camera with large, expressive dark eyes and a sweet expression. In the background, there are purple/lavender colored flowers (possibly asters or similar blooms) that create a beautiful natural setting. The image has a warm, soft quality that perfectly captures the puppy’s gentle and endearing nature.
Image Dimensions:
- Width: 300 pixels
- Height: 200 pixels
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=150, prompt_tokens=1119, total_tokens=1269, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=150, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1119, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
# chat.tc_res
list(L(chat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"fn": "samples/puppy.jpg"}', name='view_img'), id='toolu_01CPaD5edwnqgEDaNviehPCk', type='function')],
[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"image_content": "$`toolu_01CPaD5edwnqgEDaNviehPCk`"}', name='get_img_size'), id='toolu_01G1Myx2wtXYQcST8RDRi7Z9', type='function')]]
Some tool callers (e.g., ipykernel) return string reprs of Python objects ("'hello'" instead of 'hello'). With tc_res_eval=True, these are converted back to Python objects via ast.literal_eval before storing in tc_res, enabling correct value substitution in subsequent tool calls:
def get_config():
    "Returns a dict repr (simulating kernel output)"
    return "{'host': 'localhost', 'port': 8080}"
def use_config(config: dict):
    "Use config"
    return f"Host: {config['host']}, Port: {config['port']}"
chat = Chat('claude-sonnet-4-5', tools=[get_config, use_config], tc_refs=True, tc_res_eval=True)
chat("Call get_config, then pass the result to use_config", max_steps=10)
Perfect! I’ve successfully:
1. Called get_config which returned a configuration with host: 'localhost' and port: 8080
2. Passed that configuration to use_config, which processed it and returned Host: localhost, Port: 8080
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=68, prompt_tokens=948, total_tokens=1016, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=68, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=948, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat.tc_res
{'toolu_01Hi1sf6U5U7dupR2h67trjg': {'host': 'localhost', 'port': 8080},
'toolu_01KNSsNFotmMwRVhFUJyyUVe': 'Host: localhost, Port: 8080'}
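The conversion itself is easy to try in isolation. The sketch below uses `ast.literal_eval` as described above; the helper name `maybe_eval` and the fallback to the original string when parsing fails are assumptions, not confirmed library behavior.

```python
import ast

def maybe_eval(res):
    "Turn a string repr back into a Python object, else return it unchanged (hypothetical helper)"
    if not isinstance(res, str): return res
    try: return ast.literal_eval(res)
    # Plain text that is not a Python literal is kept as-is
    except (ValueError, SyntaxError): return res

as_dict = maybe_eval("{'host': 'localhost', 'port': 8080}")
as_text = maybe_eval("Host: localhost, Port: 8080")
print(type(as_dict).__name__, type(as_text).__name__)
```

Because `literal_eval` only accepts literals, a tool result cannot trigger arbitrary code execution this way.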
test_eq(type(first(chat.tc_res.values())), dict)
Caching
Test that cache checkpoints are reapplied during the tool loop (when msg=None):
c = Chat('claude', cache=True, cache_idxs=[-2,-1])
c.hist = [{'role': 'user', 'content': 'Hello'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'Use a tool'},
{'role': 'assistant', 'content': '', 'tool_calls': [{'id': '1', 'function': {'name': 'foo', 'arguments': '{}'}}]},
{'role': 'tool', 'tool_call_id': '1', 'content': 'result'}]
c._prep_msg(None) # Simulate tool loop iteration with no new message
[{'role': 'user', 'content': 'Hello'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user',
'content': [{'type': 'text',
'text': 'Use a tool',
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'assistant',
'content': '',
'tool_calls': [{'id': '1',
'function': {'name': 'foo', 'arguments': '{}'},
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'tool', 'tool_call_id': '1', 'content': 'result'}]
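One way to reproduce the placement shown above is sketched here. It assumes that `cache_idxs` are counted over non-tool messages only, which is an inference from this output rather than something the source states; `apply_cache_idxs` is a hypothetical helper, and unlike the real method it works on a copy.

```python
from copy import deepcopy

def apply_cache_idxs(hist, idxs):
    "Return a copy of `hist` with cache markers at `idxs`, counted over non-tool messages (assumption)"
    msgs = deepcopy(hist)
    targets = [m for m in msgs if m['role'] != 'tool']
    for i in idxs:
        m, marker = targets[i], {'type': 'ephemeral'}
        # Assistant tool-call messages carry the marker on their last tool call
        if m.get('tool_calls'): m['tool_calls'][-1]['cache_control'] = marker
        # Plain string content is wrapped in a text block that holds the marker
        elif isinstance(m['content'], str):
            m['content'] = [{'type': 'text', 'text': m['content'], 'cache_control': marker}]
        else: m['content'][-1]['cache_control'] = marker
    return msgs

hist = [{'role': 'user', 'content': 'Hello'},
        {'role': 'assistant', 'content': 'Hi there!'},
        {'role': 'user', 'content': 'Use a tool'},
        {'role': 'assistant', 'content': '', 'tool_calls': [{'id': '1', 'function': {'name': 'foo', 'arguments': '{}'}}]},
        {'role': 'tool', 'tool_call_id': '1', 'content': 'result'}]
out = apply_cache_idxs(hist, [-2, -1])
print(out[2]['content'][0]['cache_control'], 'cache_control' in out[3]['tool_calls'][-1])
```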
test_eq('cache_control' in c.hist[-3]['content'][0], True) # user msg
test_eq('cache_control' in c.hist[-2]['tool_calls'][-1], True) # tool call msg
Async
AsyncChat
If you want to use LiteLLM in a web app, you will probably want its async function, acompletion. To make that easier, we implement AsyncChat as a counterpart to Chat. It follows the same implementation as Chat as closely as possible.
Testing the scenarios where the tool call was not in schemas:
result = await _alite_call_func(fake_tc, [toolsc], globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")
or schemas was missing…:
result = await _alite_call_func(fake_tc, None, globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")
astream_with_complete
def astream_with_complete(
agen, postproc:function=noop
):
Call self as a function.
Parallel tool execution in AsyncChat works with async tool functions. Async tools run concurrently via asyncio.gather.
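As a rough illustration of what concurrent execution buys, here is a minimal standalone sketch using only `asyncio`. The names `slow_add` and `run_parallel` are hypothetical and do not appear in the library.

```python
import asyncio, time

async def slow_add(a: int, b: int) -> int:
    "Stand-in async tool that waits before answering"
    await asyncio.sleep(0.1)
    return a + b

async def run_parallel(calls):
    "Run every (func, kwargs) pair concurrently and return results in call order"
    return await asyncio.gather(*(f(**kw) for f, kw in calls))

calls = [(slow_add, {'a': 10, 'b': 5}), (slow_add, {'a': 2, 'b': 1})]
start = time.perf_counter()
# In a notebook with a running event loop, use `await run_parallel(calls)` instead
results = asyncio.run(run_parallel(calls))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Both calls sleep at the same time, so the total wait should be close to one sleep rather than two.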
AsyncChat
def AsyncChat(
model:str, # LiteLLM compatible model name
sp:str='', # System prompt
temp:int=0, # Temperature
search:bool=False, # Search (l,m,h), if model supports it
tools:list=None, # Add tools
hist:list=None, # Chat history
ns:Optional=None, # Custom namespace for tool calling
cache:bool=False, # Anthropic prompt caching
cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl:NoneType=None, # Anthropic prompt caching ttl
api_base:NoneType=None, # API base URL for custom providers
api_key:NoneType=None, # API key for custom providers
extra_headers:NoneType=None, # Extra HTTP headers for custom providers
tc_refs:bool=False, # Enable tool call result references
tc_res_eval:bool=False, # literal_eval tool results before storing in tc_res
markup:int=0, # Cost markup multiplier (e.g. 0.5 for 50%)
tool_reminder:NoneType=None, # Prepended as a block to the first trailing tool result (transient)
max_tokens:NoneType=None, # Default max_tokens for completion()
completefunc:Optional=None, # Completion function
stream:bool=False, # Default `stream` for `__call__`
callkw:dict=None, # Extra kwargs passed to completion() on every call
):
LiteLLM chat client.
AsyncChat.__call__
async def __call__(
msg:NoneType=None, # Message str, or list of multiple message parts
prefill:NoneType=None, # Prefill AI response if model supports it
temp:NoneType=None, # Override temp set on chat initialization
think:NoneType=None, # Thinking (l,m,h)
search:NoneType=None, # Override search set on chat initialization (l,m,h)
stream:NoneType=None, # Stream results (defaults to `self.stream`)
max_steps:int=2, # Maximum number of tool calls
final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have ran out
return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
step:int=1, tool_choice:NoneType=None, max_tokens:NoneType=None
):
Main call method - handles streaming vs non-streaming
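The interaction between `max_steps` and `final_prompt` can be pictured with a toy loop. Everything here is a simplified assumption: `fake_model` stands in for the LLM, the prompt text is shortened, and the real method also handles streaming, parallel tool calls and `tool_choice`.

```python
# Shortened, hypothetical stand-in for the default final prompt
FINAL_PROMPT = {'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings.'}

def fake_model(msgs, allow_tools):
    "Hypothetical stand-in for an LLM: keeps asking for `add` until tools are disallowed"
    if allow_tools: return {'role': 'assistant', 'content': '', 'tool_call': {'name': 'add', 'args': {'a': 1, 'b': 2}}}
    return {'role': 'assistant', 'content': 'Summary of work so far.'}

def tool_loop(msgs, tools, max_steps=2):
    "Allow up to `max_steps` tool rounds, then add the final prompt and ask once more without tools"
    msgs = list(msgs)
    for _ in range(max_steps):
        res = fake_model(msgs, allow_tools=True)
        msgs.append(res)
        tc = res.get('tool_call')
        if tc is None: return msgs  # The model answered without needing a tool
        out = tools[tc['name']](**tc['args'])
        msgs.append({'role': 'tool', 'name': tc['name'], 'content': str(out)})
    msgs.append(FINAL_PROMPT)
    msgs.append(fake_model(msgs, allow_tools=False))
    return msgs

hist = tool_loop([{'role': 'user', 'content': 'Add some numbers'}], {'add': lambda a, b: a + b})
print([m['role'] for m in hist])
```

In this sketch the final prompt always ends up second to last in the history once the step budget is used up.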
Examples
Basic example
for m in ms[1:]:
    chat = AsyncChat(m)
    test_eq('4' in contents(await chat("What is 2+2?")).content, True)
With tool calls
async def async_add(a: int, b: int) -> int:
    "Add two numbers asynchronously"
    await asyncio.sleep(0.1)
    return a + b
for m in ms[1:]:
    chat = AsyncChat(m, tools=[async_add])
    r = await chat("What is 5 + 7? Use the tool to calculate it.")
    test_eq('12' in contents(r).content, True)
    test_eq(nested_idx(chat.hist, 1, 'tool_calls', 0, 'function', 'name'), 'async_add')
If the max tokens limit is reached, a custom warning message is added to the end of the model response:
chat_long = AsyncChat(m)
r = await chat_long("Write a short story about a robot and a dog", max_tokens=40)
r
Every morning at exactly 7:03, Unit 7 rolled out of the garage and checked the front gate, the mailbox, and the tomato plants. It was very good at routines
- id: chatcmpl-xxx
- model: gpt-5.4-2026-03-05
- finish_reason: length
- usage:
Usage(completion_tokens=40, prompt_tokens=16, total_tokens=56, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None))
print(contents(r).content)
Every morning at exactly 7:03, Unit 7 rolled out of the garage and checked the front gate, the mailbox, and the tomato plants. It was very good at routines
<warning>Response was cut off at token limit.</warning>
Same goes for refused requests:
chat_refused = AsyncChat('claude-opus-4-5')
r = await chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r
- id: chatcmpl-xxx
- model: claude-opus-4-5-20251101
- finish_reason: content_filter
- usage:
Usage(completion_tokens=4, prompt_tokens=30, total_tokens=34, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=4, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=30, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
print(contents(r).content)
<warning>AI server provider content filter was applied to this request.</warning>
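Both cases follow the same pattern: the `finish_reason` decides whether a warning line is appended. A minimal sketch of that mapping follows. The warning texts are copied from the outputs above, while `WARNINGS` and `with_warning` are hypothetical names rather than library API.

```python
# Warning text per finish_reason, copied from the outputs shown above
WARNINGS = {
    'length': 'Response was cut off at token limit.',
    'content_filter': 'AI server provider content filter was applied to this request.',
}

def with_warning(content, finish_reason):
    "Append a <warning> line for finish reasons that need one (hypothetical helper)"
    msg = WARNINGS.get(finish_reason)
    if msg is None: return content
    warn = f"<warning>{msg}</warning>"
    # An empty response, as in the refusal case, becomes just the warning
    return f"{content}\n{warn}" if content else warn

print(with_warning("It was very good at routines", 'length'))
print(with_warning("", 'content_filter'))
print(with_warning("All done.", 'stop'))
```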
Async Streaming Display
This is what our outputs look like with streaming results:
chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
    if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
    elif isinstance(o,dict): _display(o)
Sure! Let me calculate that for you right away.
🔧 async_add
{'tool_call_id': 'toolu_01KTpKieSog8ChHBbYNFd6Ce',
'role': 'tool',
'name': 'async_add',
'content': '12'}
The result of **5 + 7 = 12**! 🎉
Here’s a complete ModelResponse taken from the response stream:
resp = ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, prompt_tokens_details=None))
print(repr(resp))
ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, completion_tokens_details=None, prompt_tokens_details=None))
tc=resp.choices[0].message.tool_calls[0]
tc
ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function')
tr={'tool_call_id': 'toolu_018BGyenjiRkDQFU1jWP6qRo', 'role': 'tool','name': 'simple_add',
'content': '15 is the answer! ' +'.'*2000}
mk_tr_details
def mk_tr_details(
tr, tc, mx:int=2000
):
Create <details> block for tool call as JSON
trunc_param
def trunc_param(
v, mx:int=40
):
Truncate and escape param value for display
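A possible shape for such a helper is sketched below. It is an approximation inferred from the rendered outputs on this page (quoted strings, an ellipsis after 40 characters) and not the library's code; `trunc_param_sketch` is a hypothetical name.

```python
import html

def trunc_param_sketch(v, mx=40):
    "Shorten long strings with an ellipsis and escape HTML (an approximation, not the library code)"
    if isinstance(v, str):
        # Keep at most `mx` characters and mark the cut with an ellipsis
        short = v[:mx] + '…' if len(v) > mx else v
        return html.escape(f"'{short}'", quote=False)
    return html.escape(str(v), quote=False)

print(trunc_param_sketch(5))
print(trunc_param_sketch('<b>bold</b>'))
print(trunc_param_sketch('x' * 50))
```

Escaping matters here because the value ends up inside HTML markup such as a summary line.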
StreamFormatter
def StreamFormatter(
mx:int=2000, debug:bool=False, showthink:bool=False
):
Initialize self. See help(type(self)) for accurate signature.
stream_msg = ModelResponseStream([StreamingChoices(delta=Delta(content="Hello world!"))])
sf = StreamFormatter().format_item(stream_msg)
reasoning_msg = ModelResponseStream([StreamingChoices(delta=Delta(reasoning_content="thinking..."))])
StreamFormatter().format_item(reasoning_msg)
'🧠'
chat = AsyncChat(model)
res = await chat("Hi.", stream=True)
sf = StreamFormatter()
async for chunk in res: print(sf.format_item(chunk), end='')
Hi there! How are you doing? Is there something I can help you with today? 😊
Tools can return StopResponse to force the tool loop to stop immediately.
def stop_tool(msg: str) -> str:
    "A tool that stops the loop"
    return StopResponse(f"Can not continue: {msg}")
chat = Chat(model, tools=[simple_add, stop_tool])
res = chat("First call stop_tool with 'halt', then call simple_add(1,2). Use both tools, one after the other (not at the same time).", max_steps=10, return_all=True)
# Should only have 1 round of tool calls + final response, never reaching simple_add in a second round
for r in res: _display(r)
Sure! I’ll start by calling stop_tool with 'halt' first, and then call simple_add(1, 2) afterward.
Step 1: Calling stop_tool with 'halt'
🔧 stop_tool({“msg”: “halt”})
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=110, prompt_tokens=707, total_tokens=817, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=110, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=707, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
{'tool_call_id': 'toolu_01EmeuvV8WeA9KqArDvEAz48',
'role': 'tool',
'name': 'stop_tool',
'content': 'Can not continue: halt'}
Here’s a summary of what happened:
- ✅ stop_tool('halt') was called successfully and returned: "Can not continue: halt"
- ⏳ simple_add(1, 2) was not yet called — I ran out of tool calls for this turn.
Further work needed: On the next message, I will call simple_add(1, 2) to complete the second step. Please prompt me to continue!
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=114, prompt_tokens=884, total_tokens=998, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=114, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=884, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
AsyncStreamFormatter
def AsyncStreamFormatter(
mx:int=2000, debug:bool=False, showthink:bool=False
):
Initialize self. See help(type(self)) for accurate signature.
mock_tool_call = ChatCompletionMessageToolCall(
id="toolu_123abc456def", type="function",
function=Function( name="simple_add", arguments='{"a": 5, "b": 3}' )
)
mock_response = ModelResponse(usage=Usage(prompt_tokens=0, completion_tokens=0, total_tokens=0), model=haik45)
mock_response.choices = [type('Choice', (), {
'message': type('Message', (), {
'tool_calls': [mock_tool_call]
})()
})()]
mock_tool_result = {
'tool_call_id': mock_tool_call.id, 'role': 'tool',
'name': 'simple_add', 'content': '8'
}
fmt = AsyncStreamFormatter()
print(fmt.format_item(mock_response))
print('---')
print(fmt.format_item(mock_tool_result))
- ⏳ <code>simple_add(a=5, b=3)</code> ⏳
---
<details class='tool-usage-details' markdown='1'>
<summary><code>simple_add(a=5, b=3)→8</code></summary>
```json
{
"id": "toolu_123abc456def",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "3"
}
},
"result": "8"
}
```
</details>
In Jupyter, it’s nice to use this StreamFormatter in combination with the Markdown display:
display_stream
def display_stream(
rs, mx:int=2000, debug:bool=False, showthink:bool=False
):
Use IPython.display to markdown display the response stream.
rs = completion(model=haik45, stream=True, messages=[{'role':'user','content':'What is the definition of a circle, concisely?'}])
fmt = display_stream(rs)
A circle is a closed curve where all points are equidistant from a fixed center point.
Generated images can be displayed in streaming too (not shown here to conserve filesize):
# rs = completion(model='gemini/gemini-2.5-flash-image', stream=True, messages=[{'role':'user','content':'Draw a simple sketch of a dog'}])
# fmt = display_stream(rs)
adisplay_stream
async def adisplay_stream(
rs, mx:int=2000, debug:bool=False, showthink:bool=False
):
Use IPython.display to markdown display the response stream.
Test of workaround/fix for https://github.com/BerriAI/litellm/issues/25869:
chat = AsyncChat(ms[1], tools=[simple_add], search='l')
# This prompt forces Turn 1: Search + simple_add, then Turn 2: Summary
pr = "As a test of tool calling, at the same time search the web for Brisbane's population and also use simple_add to add 1+1; after you have the results for both, tell me a joke!"
res = await chat(pr, stream=True, max_steps=5)
fmt = await adisplay_stream(res)
- ⏳ simple_add(b=1, a=1) ⏳
simple_add(b=1, a=1)→2
{
"id": "call_520a6e708f59476b8cf268b5cf2f",
"call": {
"function": "simple_add",
"arguments": {
"b": "1",
"a": "1"
}
},
"result": "2"
}
# Works fine without streaming:
chat = AsyncChat(ms[1], tools=[simple_add], search='l')
await chat(pr, max_steps=5)
The population of Brisbane is currently estimated to be approximately 2,599,740 (around 2.6 million) for the metro area in 2026. Meanwhile, adding 1 + 1 using my internal tools gives a result of 2.
And as promised, here is a joke for you:
Why don’t scientists trust atoms? Because they make up everything!
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=256, prompt_tokens=225, total_tokens=625, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=167, rejected_prediction_tokens=None, text_tokens=89, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=225, image_tokens=None, video_tokens=None, web_search_requests=1), cache_read_input_tokens=None)
Streaming examples
Now we can demonstrate AsyncChat with stream=True!
Tool call
chat = Chat(model, tools=[simple_add])
res = chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = display_stream(res)
Sure! Let me calculate that for you right away.
- ⏳ simple_add(a=5, b=7) ⏳
simple_add(a=5, b=7)→12
{
"id": "toolu_016d4zS8Gz82ZkEJSWVqjzkU",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "7"
}
},
"result": "12"
}
The result of 5 + 7 = 12. 🎉
chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)
Sure! Let me calculate that for you right away.
- ⏳ async_add(a=5, b=7) ⏳
async_add(a=5, b=7)→12
{
"id": "toolu_01KTpKieSog8ChHBbYNFd6Ce",
"call": {
"function": "async_add",
"arguments": {
"a": "5",
"b": "7"
}
},
"result": "12"
}
The result of 5 + 7 = 12! 🎉
chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 3? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)
Sure! Let me calculate that for you using the tool right away!
- ⏳ async_add(a=5, b=3) ⏳
async_add(a=5, b=3)→8
{
"id": "toolu_01Qn39WYMj33EojdV6biTmPD",
"call": {
"function": "async_add",
"arguments": {
"a": "5",
"b": "3"
}
},
"result": "8"
}
The result of 5 + 3 = 8! 🎉
async def asimple_div(
    a: int, # first operand
    b: int=0 # second operand
) -> int:
    "Divide two numbers"
    return a/b
m = ms[2]
chat = AsyncChat(m, tools=[asimple_div])
res = await chat("Calculate 5/3 and 3/0 with parallel tool calls using `asimple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)", stream=True)
fmt = await adisplay_stream(res)
Sure! I’ll make both division calls simultaneously right now.
- ⏳ asimple_div(a=5, b=3) ⏳
- ⏳ asimple_div(a=3, b=0) ⏳
asimple_div(a=5, b=3)→1.6666666666666667
{
"id": "toolu_01N63zPyRL2Zj4YEnhSDpec3",
"call": {
"function": "asimple_div",
"arguments": {
"a": "5",
"b": "3"
}
},
"result": "1.6666666666666667"
}
asimple_div(a=3, b=0)→'Traceback (most recent call last):Fil…'
{
"id": "toolu_011bkJRD9Cct3hCPCsxaxXZP",
"call": {
"function": "asimple_div",
"arguments": {
"a": "3",
"b": "0"
}
},
"result": "Traceback (most recent call last):\n File \"/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py\", line 274, in call_func_async\n res = await maybe_await(res)\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/jhoward/aai-ws/fastcore/fastcore/xtras.py\", line 1061, in maybe_await\n return await o if isawaitable(o) else o\n ^^^^^^^\n File \"/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_4328/466431256.py\", line 6, in asimple_div\n return a/b\n ~^~\nZeroDivisionError: division by zero"
}
Here’s exactly what I saw as the tool results:
- 5 / 3 ✅
  - Result: 1.6666666666666667
  - The division completed successfully and returned a floating-point result.
- 3 / 0 ❌
  - Result: A Python ZeroDivisionError traceback: ZeroDivisionError: division by zero
  - The full traceback shows the error originated in the asimple_div function at the line return a/b, propagated through maybe_await in fastcore, and was caught by call_func_async in toolslm. No numeric result was returned — just the raw exception traceback as a string.
This is a classic demonstration of division by zero error handling — the tool didn’t crash the whole system; instead, it returned the error traceback as the tool’s output, allowing us to report it gracefully.
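The behavior seen here, where a failing tool yields its traceback as the result instead of aborting the whole batch, can be imitated with the standard library alone. This is a sketch under that assumption; `safe_call` and `div` are hypothetical and this is not the `toolslm` implementation.

```python
import asyncio, traceback

async def safe_call(func, **kwargs):
    "Await `func` and return its result, or the formatted traceback if it raises"
    try: return await func(**kwargs)
    except Exception: return traceback.format_exc()

async def div(a: int, b: int) -> float:
    "Divide two numbers"
    return a / b

async def main():
    # Both calls run concurrently; the failure of one does not cancel the other
    return await asyncio.gather(safe_call(div, a=5, b=3), safe_call(div, a=3, b=0))

# In a notebook with a running event loop, use `await main()` instead
ok, err = asyncio.run(main())
print(ok)
print(err.splitlines()[-1])
```

Returning the traceback as a string lets the model see what went wrong and report it.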
chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?", think='l',stream=True)
_ = await adisplay_stream(res)
🧠
Sorting 1000 Integers Efficiently
For 1000 integers, any O(n log n) algorithm works well — the dataset is small enough that differences are negligible in practice.
Best practical choices:
- Use your language’s built-in sort (Timsort in Python/Java, introsort in C++) — optimized, tested, and hard to beat
- These are typically O(n log n) average and worst case
Example (Python):
nums = [...] # 1000 random ints
nums.sort() # Done
If you want theoretical maximum speed:
- Radix Sort — O(n·k) linear time, great for bounded integers
- But for only 1000 elements, overhead likely outweighs benefits
Bottom line: Just use the built-in sort. At 1000 elements, it completes in microseconds and is almost certainly faster than a hand-rolled alternative.
Multiple tool calls
str(chat.hist[1])[:200]
'Message(content="I\'ll break down the expression **((10 + 5) * 3) / (2 + 1)** into steps, identifying which calculations can be done in parallel first!\\n\\n**Step 1:** Calculate `10 + 5` and `2 + 1` in '
_display(chat.hist[2])
{'tool_call_id': 'toolu_01Sntwst7uRvDnHNahaZQjZA',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
chat.hist[3]
{'tool_call_id': 'toolu_01HYJTgJcKMsH1rofcGC9TeY',
'role': 'tool',
'name': 'simple_add',
'content': '3'}
chat.hist[4]
Message(content='- `10 + 5 = 15`\n- `2 + 1 = 3`\n\n**Step 2:** Now calculate `15 * 3` using the result from the first addition. *(This must be done before the final division.)*', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_01PdKMWwmv9MtSpcRDfvCYum', type='function')], function_call=None, provider_specific_fields=None)
Now to demonstrate that we can load the formatted output back into a new Chat object:
chat5 = Chat(model,hist=fmt2hist(fmt.outp),tools=[simple_add, multiply, divide])
chat5('what did we just do?')
We evaluated the mathematical expression ((10 + 5) * 3) / (2 + 1) step by step using tools, but didn’t quite finish! Here’s a simple recap:
- ✅ Added 10 + 5 = 15
- ✅ Added 2 + 1 = 3
- ✅ Multiplied 15 × 3 = 45
- ❌ Division 45 ÷ 3 = never completed!
We have everything we need to finish it though — want me to go ahead and run that final division to get the answer? 😊
- id: chatcmpl-xxx
- model: claude-sonnet-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=167, prompt_tokens=1502, total_tokens=1669, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=167, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1502, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the weather in NYC", stream=True)
_=await adisplay_stream(res)
Here is the current weather for New York City, NY on Monday, April 27, 2026:
🌤️ Current Conditions
📅 Today’s Forecast
🌡️ Hourly Outlook
- 4 PM: 67°F
- 5 PM: 64°F
- 6 PM: 62°F
- 7 PM: 60°F
- 8 PM: 58°F
- 9 PM: 55°F
- 10 PM: 54°F
⚠️ Air Quality & Pollen
- Air Quality: Poor
- The air has reached a high level of pollution and is unhealthy for sensitive groups. It is advised to reduce time spent outside if you are feeling symptoms such as difficulty breathing or throat irritation.
- Tree Pollen: Moderate | Grass Pollen: High | Mold: Moderate | Dust & Dander: High
🌦️ Looking Ahead
- Warmer and dry conditions are expected early this week before cooler air returns for the end of April and early May.
- ⏳ web_search(query='weather in NYC today') ⏳
chat = AsyncChat(haik45)
await chat("Hi.")
Hello! How can I help you today?
- id: chatcmpl-xxx
- model: claude-haiku-4-5-20251001
- finish_reason: stop
- usage:
Usage(completion_tokens=12, prompt_tokens=9, total_tokens=21, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=12, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=9, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat = AsyncChat(haik45)
res = await chat("Hi.", stream=True)
await adisplay_stream(res);
Hi! How’s it going? How can I help you today?
chat = AsyncChat(qwen3p6p)
res = await chat("Hi.", stream=True)
async for o in res: print(delta_text(o) or '', end='')🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠Hi there! How can I help you today?
chat = AsyncChat(qwen3p6p, tools=[simple_add])
res = await chat("What's 5478954793+547982745? Use the tool.", stream=True)
async for o in res:
if isinstance(o,ModelResponseStream): print(delta_text(o) or '', end='')🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠
🔧 simple_add
🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠5478954793 + 547982745 = 6026937538
completion(dsf, [mk_msg("Respond in English.", role='system'), mk_msg("Hi!")])
Hello! How can I assist you today?
- id: chatcmpl-xxx
- model: deepseek-v4-flash
- finish_reason: stop
- usage:
Usage(completion_tokens=94, prompt_tokens=10, total_tokens=104, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=84, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, video_tokens=None), prompt_cache_hit_tokens=0, prompt_cache_miss_tokens=10)
Tool Call Referencing
achat = AsyncChat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
await achat("First call get_person, then pass the result to greet_person", max_steps=3)
Perfect! I successfully:
1. Called get_person which returned Alice’s information (name: Alice, age: 30)
2. Passed that result to greet_person which greeted her with: “Hello Alice, you are 30 years old!”
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=64, prompt_tokens=1024, total_tokens=1088, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=64, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=1024, image_tokens=None, video_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
achat.tc_res
{'toolu_01VrvqXwSeiAqi9KXEkt2svo': {'name': 'Alice', 'age': 30},
'toolu_019msmJ6FxNCUeH3DsJdhd4e': 'Hello Alice, you are 30 years old!'}
list(L(achat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{}', name='get_person'), id='toolu_01VrvqXwSeiAqi9KXEkt2svo', type='function')],
[ChatCompletionMessageToolCall(index=1, caller={'type': 'direct'}, function=Function(arguments='{"person": "$`toolu_01VrvqXwSeiAqi9KXEkt2svo`"}', name='greet_person'), id='toolu_019msmJ6FxNCUeH3DsJdhd4e', type='function')]]
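In the second tool call above, the `person` argument is not a literal value but a reference to the first call's id, written as a dollar sign followed by the id in backticks. As a rough sketch of how such references could be swapped for stored results before a tool runs (the helper `resolve_refs`, the regular expression, and the id `toolu_A` are assumptions for illustration; the library's actual resolution logic may differ):

```python
import json, re

# Assumed reference syntax, inferred from the arguments shown above: $`<tool_call_id>`
_ref = re.compile(r'^\$`([^`]+)`$')

def resolve_refs(args, tc_res):
    "Hypothetical helper: replace whole-string references with stored tool results."
    if isinstance(args, dict): return {k: resolve_refs(v, tc_res) for k, v in args.items()}
    if isinstance(args, list): return [resolve_refs(v, tc_res) for v in args]
    if isinstance(args, str):
        m = _ref.match(args)
        # Only substitute when the id is known; otherwise leave the string as-is
        if m and m.group(1) in tc_res: return tc_res[m.group(1)]
    return args

tc_res = {'toolu_A': {'name': 'Alice', 'age': 30}}  # made-up id and result
raw = json.loads('{"person": "$`toolu_A`"}')
resolved = resolve_refs(raw, tc_res)
```

Passing results by reference like this means the model does not have to copy a structured value back into its next call, so the original Python object can reach the second tool unchanged.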
Codex
auth = json.loads(Path('~/.codex/auth.json').expanduser().read_text())
tok = auth['tokens']['access_token']
r = httpx.get("https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
headers={"Authorization": f"Bearer {tok}"}, timeout=10).json()
' '.join(r['models'][0])
'prefer_websockets support_verbosity default_verbosity apply_patch_tool_type web_search_tool_type input_modalities supports_image_detail_original truncation_policy supports_parallel_tool_calls context_window max_context_window auto_compact_token_limit reasoning_summary_format default_reasoning_summary slug display_name description default_reasoning_level supported_reasoning_levels shell_type visibility minimal_client_version supported_in_api availability_nux upgrade priority base_instructions model_messages experimental_supported_tools available_in_plans supports_search_tool additional_speed_tiers supports_reasoning_summaries'
cc = Chat(codex55, stream=True)
display_stream(cc("I'm Jeremy."));
Hi Jeremy — nice to meet you. How can I help?
display_stream(cc("What's my name?"));
Your name is Jeremy.
cc = Chat(codex53spark, stream=True)
display_stream(cc("I'm Jeremy."));
Nice to meet you, Jeremy! 👋
How can I help you today?
OpenAiResponsesToChatCompletionStreamIterator.chunk_parser
def chunk_parser(
chunk:dict
):
Call self as a function.
msgs = mk_msgs("Use the `simple_add` tool to calculate 5+7.")
rs = list(completion(codex53spark, msgs, tools=[toolsc], stream=True))
r = stream_chunk_builder(rs)
test_eq(r.choices[0].finish_reason, 'tool_calls')
test_eq(json.loads(r.choices[0].message.tool_calls[0].function.arguments), dict(a=5,b=7))
tsp = 'Use tools for all ops - this is a test.'
cc = Chat(codex53spark, tools=[simple_add, multiply], sp=tsp, stream=True)
display_stream(cc("What's (5+7)? Use the tools.", max_steps=5));
- ⏳ simple_add(a=5, b=7) ⏳
simple_add(a=5, b=7)→12
{
"id": "call_N7SohXuzgMGHLEXZBPXaNkwX",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "7"
}
},
"result": "12"
}
(5 + 7 = 12).
tsp = 'Call tools in parallel (i.e in a single response) where possible.'
cc = Chat(codex55, tools=[simple_add, multiply], sp=tsp, stream=True)
display_stream(cc("What's (5+7)*(5+6)? Use the tools.", max_steps=5));
- ⏳ simple_add(a=5, b=7) ⏳
- ⏳ simple_add(a=5, b=6) ⏳
simple_add(a=5, b=7)→12
{
"id": "call_vJtw0mTMy2AhjbA36LcyaXNF",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "7"
}
},
"result": "12"
}
simple_add(a=5, b=6)→11
{
"id": "call_DUwGqBY80min0Po9fMLDAWmM",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "6"
}
},
"result": "11"
}
- ⏳ multiply(a=12, b=11) ⏳
multiply(a=12, b=11)→132
{
"id": "call_p80UAoYsrQAolFJ8DT5UPUKv",
"call": {
"function": "multiply",
"arguments": {
"a": "12",
"b": "11"
}
},
"result": "132"
}
(5+7)*(5+6) = 12*11 = 132
cc = Chat(codex55, stream=True)
display_stream(cc(['What species is in this image?', img_fn.read_bytes()]));
This is a domestic dog — likely a Cavalier King Charles Spaniel puppy.
acc = AsyncChat(codex55, stream=True)
await adisplay_stream(await acc("Hi, I'm Jeremy!"));
Hi Jeremy! Nice to meet you. How can I help today?
await adisplay_stream(await acc("What's my name?"));
Your name is Jeremy.
cc = Chat(codex55, search='l', stream=True, max_tokens=10)
r = cc("Search: what's the weather in Brisbane today?", think='h')
display_stream(r);🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠
Today in Brisbane, Queensland: Sunny now, about 71°F / 21°C.
Forecast for the rest of today: mostly cloudy, warming to around 82°F / 28°C early afternoon, then cooling to about 68°F / 20°C tonight.
cc = Chat(codex53spark, search='l', stream=True, max_tokens=10)
r = cc("Search: what's the weather in Brisbane today?", think='l')
display_stream(r);
Today in Brisbane (Tuesday, May 5, 2026), it’s currently partly sunny at about 25 °C (77°F).
Today’s forecast is:
- High: 28 °C (82°F)
- Low: 18 °C (64°F)
- Condition: Humid with considerable cloudiness
cc.use
total=3,454 | in=3,185 | out=269 | cached=84.4% | reasoning=184 | searches=0 | $0.0059
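To make the usage summary above easier to read, here is a small sketch of how the first few fields could be derived from raw token counts. It assumes the cached percentage is the share of input tokens served from cache; the function `fmt_usage` is hypothetical, and the cached token count of 2,688 is an illustrative value chosen to reproduce the 84.4% shown, not a figure reported by the library.

```python
def fmt_usage(prompt_tokens, completion_tokens, cached_tokens=0):
    "Hypothetical formatter producing a one-line summary similar to the one above."
    total = prompt_tokens + completion_tokens
    # Assumption: cached % = cached input tokens / all input tokens
    pct = 100 * cached_tokens / prompt_tokens if prompt_tokens else 0.0
    return f'total={total:,} | in={prompt_tokens:,} | out={completion_tokens:,} | cached={pct:.1f}%'

# cached_tokens=2688 is an illustrative value, not taken from the output above
line = fmt_usage(3185, 269, cached_tokens=2688)
```

The reasoning, search, and cost fields are left out of the sketch because they depend on provider-specific details that are not shown here.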
The following patch is required to include reasoning in Codex responses:
ChatGPTResponsesAPIConfig.transform_responses_api_request
def transform_responses_api_request(
model, input, response_api_optional_request_params, litellm_params, headers
):
No transform applied since inputs are in OpenAI spec already
cc = Chat(codex55, stream=True)
chunks = list(cc("What's 27271*453313? Think hard then output only the answer.", think='h'))
non_content = [c for c in chunks if isinstance(c, ModelResponseStream) and not c.choices[0].delta.content]
non_content[:3]
[ModelResponseStream(id='chatcmpl-xxx', created=1000000000, model='gpt-5.5', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(reasoning_content='**Calculating product**\n\nI', provider_specific_fields=None, content=None, role='assistant', function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, citations=None),
ModelResponseStream(id='chatcmpl-xxx', created=1000000000, model='gpt-5.5', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(reasoning_content=' need', provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, citations=None),
ModelResponseStream(id='chatcmpl-xxx', created=1000000000, model='gpt-5.5', object='chat.completion.chunk', system_fingerprint=None, choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(reasoning_content=' to', provider_specific_fields=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)], provider_specific_fields=None, citations=None)]
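The chunks above carry the model's reasoning in `delta.reasoning_content` while `delta.content` is `None`. As a minimal sketch of how reasoning and answer text could be separated from such a stream, the example below uses `SimpleNamespace` stand-ins rather than real `ModelResponseStream` objects; `mk_chunk`, `split_stream`, and the `'(answer)'` placeholder are illustrative names and values, not part of the library.

```python
from types import SimpleNamespace as NS

def mk_chunk(reasoning=None, content=None):
    "Stand-in for a streamed chunk, with only the fields used below."
    return NS(choices=[NS(delta=NS(reasoning_content=reasoning, content=content))])

def split_stream(chunks):
    "Join reasoning deltas and content deltas into two separate strings."
    reasoning, content = [], []
    for c in chunks:
        d = c.choices[0].delta
        if getattr(d, 'reasoning_content', None): reasoning.append(d.reasoning_content)
        if getattr(d, 'content', None): content.append(d.content)
    return ''.join(reasoning), ''.join(content)

# The first three deltas mirror the output above; the last is a placeholder
chunks = [mk_chunk(reasoning='**Calculating product**\n\nI'), mk_chunk(reasoning=' need'),
          mk_chunk(reasoning=' to'), mk_chunk(content='(answer)')]
thoughts, answer = split_stream(chunks)
```

Using `getattr` with a default keeps the sketch tolerant of chunks that lack a `reasoning_content` attribute altogether.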
cc = AsyncChat(dsf, stream=True)
r = await cc("Hi. What's 5432*5763? Think and then answer with one word.", think='l')
async for o in r:
if isinstance(o,ModelResponseStream): print(delta_text(o) or '', end='')
🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠31304616
Test that DeepSeek synthetic tool calls work (this requires injecting empty reasoning blocks):
tc = mk_tc(simple_add, a=5, b=7)
tcq = mk_tc_req("I'll calculate 5+7.", [tc])
tcr = mk_tc_results(tcq, ['12'])
hist = ["What's 5+7?", tcq, tcr[0], "5+7 is 12."]
chat = Chat(dsf, tools=[simple_add], hist=hist)
chat("And what's 8+9?")
8+9 is 17.
- id: chatcmpl-xxx
- model: deepseek-v4-flash
- finish_reason: stop
- usage:
Usage(completion_tokens=14, prompt_tokens=496, total_tokens=510, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=6, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None, video_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=384, text_tokens=None, image_tokens=None, video_tokens=None), prompt_cache_hit_tokens=384, prompt_cache_miss_tokens=112)
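The test above relies on empty reasoning blocks being added to hand-built tool-call messages. One way this might look, sketched on plain message dicts: the helper `add_empty_reasoning` is hypothetical, and both the field name `reasoning_content` and the exact shape the provider expects are assumptions here rather than a description of the actual patch.

```python
def add_empty_reasoning(msgs):
    "Hypothetical helper: give assistant tool-call messages an empty reasoning field if missing."
    out = []
    for m in msgs:
        m = dict(m)  # shallow copy so the caller's history is untouched
        if m.get('role') == 'assistant' and m.get('tool_calls') and 'reasoning_content' not in m:
            m['reasoning_content'] = ''  # assumed field name
        out.append(m)
    return out

# A synthetic history like the one in the test above; the id is made up
hist = [{'role': 'user', 'content': "What's 5+7?"},
        {'role': 'assistant', 'content': "I'll calculate 5+7.",
         'tool_calls': [{'id': 'call_1', 'type': 'function',
                         'function': {'name': 'simple_add', 'arguments': '{"a": 5, "b": 7}'}}]},
        {'role': 'tool', 'tool_call_id': 'call_1', 'name': 'simple_add', 'content': '12'},
        {'role': 'assistant', 'content': '5+7 is 12.'}]
patched = add_empty_reasoning(hist)
```

Only assistant messages that actually contain tool calls are touched, so ordinary replies and tool results pass through unchanged.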
Other patches
AnthropicConfig.transform_request
def transform_request(
model, messages, optional_params, litellm_params, headers
):
Call self as a function.