Core

Lisette Core

LiteLLM

Deterministic outputs

LiteLLM ModelResponse(Stream) objects have id and created fields that are generated dynamically. Even when we use cachy to cache the LLM response, these dynamic fields create diffs, which makes code review more challenging. The patch below ensures that the id and created fields are fixed and won’t generate diffs.


source

patch_litellm


def patch_litellm(
    seed:int=0
):

Patch litellm.ModelResponseBase such that id and created are fixed.

patch_litellm()
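For example (a quick sketch, assuming the patch fixes these fields at construction time), two freshly constructed responses now share the same placeholder id and created values:

import litellm
r1, r2 = litellm.ModelResponse(), litellm.ModelResponse()
# both fields are fixed by the patch, so cached responses diff cleanly
assert r1.id == r2.id and r1.created == r2.created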

Completion

LiteLLM provides a convenient unified interface for most big LLM providers, making it possible to switch providers with just one argument. We want to make it even easier by adding some more convenience functions and classes.

This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).

# litellm._turn_on_debug()
ms = ["gemini/gemini-3-pro-preview", "gemini/gemini-3-flash-preview", "claude-opus-4-6", "openai/gpt-4.1"]
msg = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
for m in ms:
    display(Markdown(f'**{m}:**'))
    display(completion(m,msg))

gemini/gemini-3-pro-preview:

Hello! How can I help you today? Whether you have a question, need some writing done, or just want to chat, I’m here.

  • id: chatcmpl-xxx
  • model: gemini-3-pro-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=148, prompt_tokens=4, total_tokens=152, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=117, rejected_prediction_tokens=None, text_tokens=31, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None), cache_read_input_tokens=None)

gemini/gemini-3-flash-preview:

Hello! How can I help you today?

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=4, total_tokens=13, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None), cache_read_input_tokens=None)

claude-opus-4-6:

Hey there! 👋 How’s it going? What can I help you with today?

  • id: chatcmpl-xxx
  • model: claude-opus-4-6
  • finish_reason: stop
  • usage: Usage(completion_tokens=23, prompt_tokens=10, total_tokens=33, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=23, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)

openai/gpt-4.1:

Hey! How can I help you today? 😊

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=10, prompt_tokens=10, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Generated images are also displayed (not shown here to conserve filesize):

# completion(model='gemini/gemini-2.5-flash-image', messages=[{'role':'user','content':'Draw a simple sketch of a cat'}])

Messages formatting

Let’s start with making it easier to pass messages into litellm’s completion function (including images and PDF files).

If msg has tool_calls, cache_control is added to the last tool call (required since LiteLLM strips it from empty content blocks), otherwise to the content.


source

stop_reason


def stop_reason(
    r
):

source

contents


def contents(
    r
):

Get message object from response r.


source

remove_cache_ckpts


def remove_cache_ckpts(
    msg
):

Remove cache checkpoints and return msg.

Test with regular content message:

msg_content = {'role': 'user', 'content': [{'type': 'text', 'text': 'hello'}]}
_add_cache_control(msg_content)
test_eq(msg_content['content'][-1].get('cache_control'), {'type': 'ephemeral'})
test_eq(_has_cache(msg_content), True)
remove_cache_ckpts(msg_content)
test_eq(_has_cache(msg_content), False)

Test with assistant message with tool_calls:

tcs = [
    {'id': 'tc1', 'type': 'function', 'function': {'name': 'test', 'arguments': '{}'}},
    {'id': 'tc2', 'type': 'function', 'function': {'name': 'test', 'arguments': '{}'}}
]
msg_tool = {'role': 'assistant', 'content': '', 'tool_calls': tcs}
_add_cache_control(msg_tool)
test_eq(msg_tool['tool_calls'][-1].get('cache_control'), {'type': 'ephemeral'})
test_eq('cache_control' not in msg_tool.get('content', [{}])[-1] if msg_tool.get('content') else True, True)  # no cache in content
test_eq(_has_cache(msg_tool), True)
remove_cache_ckpts(msg_tool)
test_eq(_has_cache(msg_tool), False)

Test with ChatCompletionMessageToolCall tool call object:

tcs =[
    ChatCompletionMessageToolCall(id='tc1', type='function', function=Function(name='test', arguments='{}')), 
    ChatCompletionMessageToolCall(id='tc2', type='function', function=Function(name='test', arguments='{}'))
]
msg_tc_obj = {'role': 'assistant', 'content': '', 'tool_calls': tcs}
_add_cache_control(msg_tc_obj)
test_eq(getattr(msg_tc_obj['tool_calls'][-1], 'cache_control', None), {'type': 'ephemeral'})
test_eq(_has_cache(msg_tc_obj), True)
remove_cache_ckpts(msg_tc_obj)
test_eq(_has_cache(msg_tc_obj), False)

source

mk_msg


def mk_msg(
    content, # Content: str, bytes (image), list of mixed content, or dict w 'role' and 'content' fields
    role:str='user', # Message role if content isn't already a dict/Message
    cache:bool=False, # Enable Anthropic caching
    ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):

Create a LiteLLM compatible message.

Now we can use mk_msg to create different types of messages.

Simple text:

msg = mk_msg("hey")
msg
{'role': 'user', 'content': 'hey'}

Which can be passed to litellm’s completion function like this:

model = ms[1] # use 3-flash; 3-pro is very slow, even just for running tests, as of writing
res = completion(model, [msg])
res

Hello! How can I help you today?

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=2, total_tokens=11, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=2, image_tokens=None), cache_read_input_tokens=None)

We’ll add a little shortcut to make examples and testing easier here:

def c(msgs, m=model, **kw):
    msgs = [msgs] if isinstance(msgs,dict) else listify(msgs)
    return completion(m, msgs, **kw)
c(msg)

Hello! How can I help you today?

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=2, total_tokens=11, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=2, image_tokens=None), cache_read_input_tokens=None)

Lists with just one string element are flattened for conciseness:

test_eq(mk_msg("hey"), mk_msg(["hey"]))

(LiteLLM ignores these fields when sent to other providers)

Text and images:

img_fn = Path('samples/puppy.jpg')
Image(filename=img_fn, width=200)

msg = mk_msg(['hey what in this image?',img_fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")
{
 "role": "user",
 "content": [
  {
   "type": "text",
   "text": "hey what in this image?"
  },
  {
   "type": "image_url",
   "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...
c(msg)

In this image, a small brown and white puppy, possibly a Cavalier King Charles Spaniel, sits in the grass next to a green plant with purple flowers. The puppy’s head is tilted to the right, and its ears are perked up. Its eyes are large and brown, and its nose is black. The background is a dark brown wooden wall. The lighting is soft and natural.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=81, prompt_tokens=1087, total_tokens=1168, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=81, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=7, image_tokens=1080), cache_read_input_tokens=None)

Let’s also demonstrate this for PDFs:

pdf_fn = Path('samples/solveit.pdf')
msg = mk_msg(['Who is the author of this pdf?', pdf_fn.read_bytes()])
c(msg)

The author of the provided document is Jeremy Howard, who is the co-founder of fast.ai.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=22, prompt_tokens=541, total_tokens=563, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=9, image_tokens=532), cache_read_input_tokens=None)

Some models like Gemini support audio and video:

wav_data = httpx.get("https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav").content
# Audio(wav_data)  # uncomment to preview
msg = mk_msg(['What is this audio saying?', wav_data])
completion(ms[1], [msg])

The audio says: “The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.”

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=30, prompt_tokens=181, total_tokens=211, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=30, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=174, cached_tokens=None, text_tokens=7, image_tokens=None), cache_read_input_tokens=None)
vid_data = httpx.get("https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4").content
msg = mk_msg(['Concisely, what is happening in this video?', vid_data])
completion(ms[1], [msg])

In this video, a photographer shows off the night video capabilities of her Google Pixel phone. She shows off its “Video Boost” and “Night Sight” capabilities as she takes night video of Tokyo, Japan.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=42, prompt_tokens=5205, total_tokens=5247, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=42, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=12, image_tokens=None), cache_read_input_tokens=None)

Caching

Some providers such as Anthropic require manually opting into caching. Let’s try it:

def cpr(i): return f'{i} '*1024 + 'This is a caching test. Report back only what number you see repeated above.'
disable_cachy()
# msg = mk_msg(cpr(1), cache=True)
# res = c(msg, ms[2])
# res

Anthropic has a maximum of 4 cache checkpoints, so we remove previous ones as we go:

# res = c([remove_cache_ckpts(msg), mk_msg(res), mk_msg(cpr(2), cache=True)], ms[2])
# res

We see that the first message was cached, and this extra message has been written to cache:

# res.usage.prompt_tokens_details

We can add a bunch of large messages in a loop to see how the number of cached tokens used grows.

We do this 25 times to ensure it still works for more than 20 content blocks, which is a known Anthropic issue.

The code below is commented out by default because it’s slow. Please uncomment it when working on caching.

# h = []
# msg = mk_msg(cpr(1), cache=True)

# for o in range(2,25):
#     h += [remove_cache_ckpts(msg), mk_msg(res)]
#     msg = mk_msg(cpr(o), cache=True)
#     res = c(h+[msg])
#     detls = res.usage.prompt_tokens_details
#     print(o, detls.cached_tokens, detls.cache_creation_tokens, end='; ')
enable_cachy()

Reconstructing formatted outputs

Lisette can call multiple tools in a loop. Further down in this notebook, we’ll provide convenience functions for formatting such a sequence of tool calls and responses into one formatted output string.

For now, we’ll show an example and demonstrate how to transform such a formatted output string back into a valid LiteLLM history.

fmt_outp = '''
I'll solve this step-by-step, using parallel calls where possible.

<details class='tool-usage-details'>

```json
{
  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",
  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },
  "result": "15"
}
```

</details>

<details class='tool-usage-details'>

```json
{
  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",
  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },
  "result": "3"
}
```

</details>

Now I need to multiply 15 * 3 before I can do the final division:

<details class='tool-usage-details'>

```json
{
  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",
  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },
  "result": "45"
}
```

</details>

<details class='token-usage-details'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>

`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`

</details>
'''

We can split into chunks of (text,toolstr,json):

sp = re_tools.split(fmt_outp)
for o in list(chunked(sp, 3, pad=True)): print('- ', o)
-  ["\nI'll solve this step-by-step, using parallel calls where possible.\n\n", '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n  "result": "15"\n}\n```\n\n</details>', '{\n  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n  "result": "15"\n}']
-  ['\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n  "result": "3"\n}\n```\n\n</details>', '{\n  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n  "result": "3"\n}']
-  ['\n\nNow I need to multiply 15 * 3 before I can do the final division:\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n  "result": "45"\n}\n```\n\n</details>', '{\n  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n  "result": "45"\n}']
-  ["\n\n<details class='token-usage-details'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>\n\n`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`\n\n</details>\n", None, None]

source

fmt2hist


def fmt2hist(
    outp:str
)->list:

Transform a formatted output into a LiteLLM compatible history

See how we can turn that one formatted output string back into a list of Messages:

from pprint import pprint
h = fmt2hist(fmt_outp)
pprint(h)
[Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_4_cGgsIJTKyin2__2CwHzQ', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '15',
  'name': 'simple_add',
  'role': 'tool',
  'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta'},
 Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_9yi0_kJITjqKXS80a6qUVQ', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '3',
  'name': 'simple_add',
  'role': 'tool',
  'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY'},
 Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_6xFns2epQ3i8ZcHlguLmYg', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '45',
  'name': 'multiply',
  'role': 'tool',
  'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C'},
 Message(content='.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]

mk_msgs

We will skip tool use blocks and tool results during caching.

Now let’s make it easy to provide entire conversations:


source

mk_msgs


def mk_msgs(
    msgs, # List of messages (each: str, bytes, list, or dict w 'role' and 'content' fields)
    cache:bool=False, # Enable Anthropic caching
    cache_idxs:list=[-1], # Cache breakpoint idxs
    ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):

Create a list of LiteLLM compatible messages.

With mk_msgs you can easily provide a whole conversation:

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm doing fine and you?"}]

By default the last message will be cached when cache=True:

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"], cache=True)
msgs
[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant',
  'content': [{'type': 'text',
    'text': "I'm doing fine and you?",
    'cache_control': {'type': 'ephemeral'}}]}]
test_eq('cache_control' in msgs[-1]['content'][0], True)

Alternatively, users can provide custom cache_idxs. Tool call blocks and results are skipped during caching:

msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-2,-1])
msgs
[{'role': 'user',
  'content': [{'type': 'text',
    'text': 'Hello!',
    'cache_control': {'type': 'ephemeral'}}]},
 {'role': 'assistant', 'content': 'Hi! How can I help you?'},
 {'role': 'user', 'content': 'Call some functions!'},
 Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_98G9h02lRwmUcT1gyKcGOQ', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
  'name': 'simple_add',
  'content': '15'},
 Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_5EPfeJVYRn_bqR_vegJCBA', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
  'name': 'simple_add',
  'content': '3'},
 Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_I6dxGoEzSHa369zZ6HoWEw', type='function', cache_control={'type': 'ephemeral'})], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
  'name': 'multiply',
  'content': '45'},
 Message(content=[{'type': 'text', 'text': '.', 'cache_control': {'type': 'ephemeral'}}], role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]
msgs[-2]
{'role': 'tool',
 'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
 'name': 'multiply',
 'content': '45'}
msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-3,-2])
msgs
[{'role': 'user',
  'content': [{'type': 'text',
    'text': 'Hello!',
    'cache_control': {'type': 'ephemeral'}}]},
 {'role': 'assistant', 'content': 'Hi! How can I help you?'},
 {'role': 'user', 'content': 'Call some functions!'},
 Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_GEbUJMF8QnmjxmEvSCaGcw', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
  'name': 'simple_add',
  'content': '15'},
 Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu__L0Ew0AhTveMpaWhnk1uPA', type='function', cache_control={'type': 'ephemeral'})], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
  'name': 'simple_add',
  'content': '3'},
 Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_tIYrIfuXRDWIVhcS6OUhag', type='function', cache_control={'type': 'ephemeral'})], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
  'name': 'multiply',
  'content': '45'},
 Message(content='.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]
msgs[-3]
Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 15, "b": 3}', name='multiply'), id='toolu_tIYrIfuXRDWIVhcS6OUhag', type='function', cache_control={'type': 'ephemeral'})], function_call=None, provider_specific_fields=None)
msgs[-5]
Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu__L0Ew0AhTveMpaWhnk1uPA', type='function', cache_control={'type': 'ephemeral'})], function_call=None, provider_specific_fields=None)
test_eq('cache_control' in msgs[0]['content'][0], True)

Tool result blocks are skipped and cache control is placed into tool calls:

test_eq('cache_control' in msgs[-5]['tool_calls'][0], True) 
test_eq('cache_control' in msgs[-3]['tool_calls'][0], True)
L(msgs).map(remove_cache_ckpts)
test_eq(any(L(msgs).map(_has_cache)), False)

Who’s speaking when is automatically inferred, even when there are multiple tools being called in parallel (which LiteLLM supports!).

msgs = mk_msgs(['Tell me the weather in Paris and Rome',
                'Assistant calls weather tool two times',
                {'role':'tool','content':'Weather in Paris is ...'},
                {'role':'tool','content':'Weather in Rome is ...'},
                'Assistant returns weather',
                'Thanks!'])
msgs
[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
 {'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
 {'role': 'tool', 'content': 'Weather in Paris is ...'},
 {'role': 'tool', 'content': 'Weather in Rome is ...'},
 {'role': 'assistant', 'content': 'Assistant returns weather'},
 {'role': 'user', 'content': 'Thanks!'}]

For ease of use, if msgs is not already in a list, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs and get back a LiteLLM compatible msg history.

msgs = mk_msgs("Hey")
msgs
[{'role': 'user', 'content': 'Hey'}]
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm fine, you?"}]

However, beware that if you use mk_msgs for a single message consisting of multiple parts, you should be explicit and make sure to wrap those parts in two lists:

  1. One list to show that they belong together in one message (the inner list).
  2. Another, because mk_msgs expects a list of multiple messages (the outer list).

This is common when working with images, for example:

msgs = mk_msgs([['Whats in this img?',img_fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")
[
 {
  "role": "user",
  "content": [
   {
    "type": "text",
    "text": "Whats in this img?"
   },
   {
    "type": "image_url",
    "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...

Streaming

LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.

We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.


source

stream_with_complete


def stream_with_complete(
    gen, postproc:function=noop
):

Extend streaming response chunks with the complete response

r = c(mk_msgs("Hey!"), stream=True)
r2 = SaveReturn(stream_with_complete(r))
for o in r2:
    cts = o.choices[0].delta.content
    if cts: print(cts, end='')
Hello! How can I help you today?
r2.value

Hello! How can I help you today?

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=3, total_tokens=12, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)

Tools


source

lite_mk_func


def lite_mk_func(
    f
):
def simple_add(
    a: int,   # first operand
    b: int=0  # second operand
) -> int:
    "Add two numbers together"
    return a + b
toolsc = lite_mk_func(simple_add)
toolsc
{'type': 'function',
 'function': {'name': 'simple_add',
  'description': 'Add two numbers together\n\nReturns:\n- type: integer',
  'parameters': {'type': 'object',
   'properties': {'a': {'type': 'integer', 'description': 'first operand'},
    'b': {'type': 'integer', 'description': 'second operand', 'default': 0}},
   'required': ['a']}}}
tmsg = mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible. ")
r = c(tmsg, tools=[toolsc])
display(r)

I will use the simple_add tool to calculate the sums of the two pairs of numbers provided.

  1. First, I will add 5,478,954,793 and 547,982,745.
  2. Second, I will add 5,479,749,754 and 9,875,438,979.

🔧 simple_add({“a”: 5478954793, “b”: 547982745})

🔧 simple_add({“a”: 5479749754, “b”: 9875438979})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=168, prompt_tokens=160, total_tokens=328, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=168, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=160, image_tokens=None), cache_read_input_tokens=None)

A tool response can be a string or a list of tool blocks (e.g., an image url block). To allow users to specify that a response should not be immediately stringified, we provide the ToolResponse datatype that users can wrap their return value in.


source

ToolResponse


def ToolResponse(
    content:list
)->None:
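For instance, a tool that returns an image block might wrap it like this (a minimal sketch; the function and the data URL are hypothetical):

def get_chart():
    "Return a chart image block instead of a string (hypothetical example)"
    return ToolResponse([{'type': 'image_url',
                          'image_url': 'data:image/png;base64,iVBOR...'}])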

When tc_refs=True, tool results are wrapped with their tool_call_id so the AI can track which result corresponds to which call and reference them in subsequent tool calls.

# Test _prep_tool_res - string result
test_eq(_prep_tool_res('hello', 'toolu_123'), [
    {'type': 'text', 'text': '[tool_call_id: toolu_123]'},
    {'type': 'text', 'text': 'hello'}
])

# Test _prep_tool_res - list result (e.g. ToolResponse content)
img_block = {'type': 'image_url', 'image_url': {'url': 'data:...'}}
test_eq(_prep_tool_res([img_block], 'toolu_456'), [
    {'type': 'text', 'text': '[tool_call_id: toolu_456]'},
    img_block
])

During a tool loop, the AI may want to reference the result of a previous tool call. We support the syntax $`tool_call_id` in tool arguments, which gets resolved to the actual result value before calling the function.

# Test _resolve_tool_refs
tc_res = {'toolu_abc123': 'hello world', 'toolu_xyz789': 42}

# Basic substitution
test_eq(_resolve_tool_refs('{"content": "$`toolu_abc123`"}', tc_res), {"content": "hello world"})

# Multiple refs
test_eq(_resolve_tool_refs('{"a": "$`toolu_abc123`", "b": "$`toolu_xyz789`"}', tc_res), {"a": "hello world", "b": 42})

# No refs - passthrough
test_eq(_resolve_tool_refs('{"x": 1}', tc_res), {"x": 1})

# Empty tc_res
test_eq(_resolve_tool_refs('{"x": 1}', None), {"x": 1})

# Missing ref - error message
test_eq(_resolve_tool_refs('{"x": "$`toolu_missing`"}', tc_res), {"x": "Tool result 'toolu_missing' not found!"})

# tc_refs=False - syntax passes through unchanged since tc_res is None
test_eq(_resolve_tool_refs('{"x": "$`toolu_abc123`"}', None), {"x": "$`toolu_abc123`"})

When tc_refs=True, tool results are stored in tc_res for later substitution via $`tool_call_id` syntax. Some callers might return string reprs of Python objects. _try_eval attempts to convert these back to Python objects using ast.literal_eval, falling back to the original value on failure. This ensures substituted values are actual objects, not string reprs.

test_eq(ast.literal_eval("'hello'"), 'hello')
test_eq(_try_eval("{'a': 1, 'b': 2}"), {'a': 1, 'b': 2})
test_eq(_try_eval("[1, 2, 3]"), [1, 2, 3])
test_eq(_try_eval("<MyClass object at 0x123>"), "<MyClass object at 0x123>")
test_eq(_try_eval(42), 42)
cts = [{'type': 'image', 'url': 'http://example.com/img.png'}]
test_eq(_try_eval(ToolResponse(cts)), ToolResponse(cts))

Ensure ToolResponse content (e.g. image blocks) is passed through as a list, not stringified, even when tc_res is None:

fake_tc = ChatCompletionMessageToolCall(index=0, function=Function(name='test_img'), id='_test', type='function')
img_content = [{'type': 'image_url', 'image_url': 'data:image/png;base64,abc'}]
res = _mk_tool_result(fake_tc, ToolResponse(img_content))
test_eq(res['content'], img_content)  # ToolResponse should pass through

res_str = _mk_tool_result(fake_tc, ['hello'])
test_eq(res_str['content'], "['hello']")  # other tools results are stringified
tcs = [_lite_call_func(o, [toolsc], ns=globals()) for o in r.choices[0].message.tool_calls]
tcs
[{'tool_call_id': 'call_JZ9DKeb0SQuaFkEGz2plng',
  'role': 'tool',
  'name': 'simple_add',
  'content': '6026937538'},
 {'tool_call_id': 'call_EuDIsrrWQPuZSI3sT2XU2Q',
  'role': 'tool',
  'name': 'simple_add',
  'content': '15355188733'}]
r.choices[0].message.tool_calls
[ChatCompletionMessageToolCall(index=0, provider_specific_fields={'thought_signature': 'EjQKMgG+Pvb7yVussYhsFYeo+LNOrfbbtwi8h0Nhhwg0HFPz37rXtaBU+kbPYPQ435T8oyG5'}, function=Function(arguments='{"a": 5478954793, "b": 547982745}', name='simple_add'), id='call_JZ9DKeb0SQuaFkEGz2plng', type='function'),
 ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5479749754, "b": 9875438979}', name='simple_add'), id='call_EuDIsrrWQPuZSI3sT2XU2Q', type='function')]

Test tool calls that were not in tool_schemas are caught:

fake_tc = ChatCompletionMessageToolCall(index=0, function=Function(name='hallucinated_tool'),id='_', type='function')
test_eq(_lite_call_func(fake_tc, ns=globals(), tool_schemas=[toolsc])['content'],"Tool not defined in tool_schemas: hallucinated_tool")
test_fail(_lite_call_func(fake_tc, ns=globals(), tool_schemas=None)['content'],"Tool not defined in tool_schemas: hallucinated_tool")

Test tool calls that were not in tool_choice are caught:

def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None #f'{c}'
    delta = c.delta
    if delta.content: return delta.content
    if delta.tool_calls:
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'
    return None
r = c(tmsg, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
I will use the addition tool to calculate the sum of 5478954793 and 547982745, and then a second call to calculate the sum of 5479749754 and 9875438979.


🔧 simple_add

🔧 simple_add
r2.value

I will use the addition tool to calculate the sum of 5478954793 and 547982745, and then a second call to calculate the sum of 5479749754 and 9875438979.

🔧 simple_add({“b”: 547982745, “a”: 5478954793})

🔧 simple_add({“b”: 9875438979, “a”: 5479749754})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=140, prompt_tokens=160, total_tokens=300, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
msg = mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
r = c(msg, stream=True, reasoning_effort="low")
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
🧠To find the derivative of the function $f(x) = x^3 + 2x^2 - 5x + 1$, we use the **Power Rule**, which states that if $f(x) = x^n$, then $f'(x) = nx^{n-1}$.

We apply this rule to each term individually:

1.  **The derivative of $x^3$:**
    Apply the power rule ($n=3$): $3x^{3-1} = \mathbf{3x^2}$

2.  **The derivative of $2x^2$:**
    Multiply the exponent by the coefficient ($2 \times 2$) and subtract 1 from the exponent: $4x^{2-1} = \mathbf{4x}$

3.  **The derivative of $-5x$:**
    The derivative of $x$ is 1, so: $-5(1) = \mathbf{-5}$

4.  **The derivative of $1$:**
    The derivative of any constant is **$0$**.

### Final Answer:
Combining these results, the derivative is:
**$f'(x) = 3x^2 + 4x - 5$**
r2.value

To find the derivative of the function \(f(x) = x^3 + 2x^2 - 5x + 1\), we use the Power Rule, which states that if \(f(x) = x^n\), then \(f'(x) = nx^{n-1}\).

We apply this rule to each term individually:

  1. The derivative of \(x^3\): Apply the power rule (\(n=3\)): \(3x^{3-1} = \mathbf{3x^2}\)

  2. The derivative of \(2x^2\): Multiply the exponent by the coefficient (\(2 \times 2\)) and subtract 1 from the exponent: \(4x^{2-1} = \mathbf{4x}\)

  3. The derivative of \(-5x\): The derivative of \(x\) is 1, so: \(-5(1) = \mathbf{-5}\)

  4. The derivative of \(1\): The derivative of any constant is \(0\).

Final Answer:

Combining these results, the derivative is: \(f'(x) = 3x^2 + 4x - 5\)

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=625, prompt_tokens=29, total_tokens=654, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=105, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)

Structured Outputs


source

structured


def structured(
    m:str, # LiteLLM model string
    msgs:list, # List of messages
    tool:Callable, # Tool to be used for creating the structured output (class, dataclass or Pydantic, function, etc)
    messages:List=[], # Optional OpenAI params: see https://platform.openai.com/docs/api-reference/chat/create
    timeout:Union=None, temperature:Optional=None, top_p:Optional=None, n:Optional=None, stream:Optional=None,
    stream_options:Optional=None, stop:NoneType=None, max_completion_tokens:Optional=None, max_tokens:Optional=None,
    modalities:Optional=None, prediction:Optional=None, audio:Optional=None, presence_penalty:Optional=None,
    frequency_penalty:Optional=None, logit_bias:Optional=None, user:Optional=None,
    reasoning_effort:Optional=None, # openai v1.0+ new params
    verbosity:Optional=None, response_format:Union=None, seed:Optional=None, tools:Optional=None,
    tool_choice:Union=None, logprobs:Optional=None, top_logprobs:Optional=None, parallel_tool_calls:Optional=None,
    web_search_options:Optional=None, deployment_id:NoneType=None, extra_headers:Optional=None,
    safety_identifier:Optional=None, service_tier:Optional=None,
    functions:Optional=None, # soon to be deprecated params by OpenAI
    function_call:Optional=None, base_url:Optional=None, # set api_base, api_version, api_key
    api_version:Optional=None, api_key:Optional=None,
    model_list:Optional=None, # pass in a list of api_base,keys, etc.
    thinking:Optional=None, # Optional liteLLM function params
    shared_session:Optional=None, # Session management
):

Return the value of the tool call (generally used for structured outputs)

class President:
    "Information about a president of the United States"
    def __init__(
        self, 
        first:str, # first name
        last:str, # last name
        spouse:str, # name of spouse
        years_in_office:str, # format: "{start_year}-{end_year}"
        birthplace:str, # name of city
        birth_year:int # year of birth, `0` if unknown
    ):
        assert re.match(r'\d{4}-\d{4}', years_in_office), "Invalid format: `years_in_office`"
        store_attr()

    __repr__ = basic_repr('first, last, spouse, years_in_office, birthplace, birth_year')
for m in ms[1:]: 
    r = structured(m, [mk_msg("Tell me something about the third president of the USA.")], President)
    test_eq(r.first, 'Thomas'); test_eq(r.last, 'Jefferson')
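structured also works with plain functions. Here’s a minimal sketch (sum_and_diff is a hypothetical example, and the call is left commented to avoid an extra API request):

def sum_and_diff(
    total:int, # a + b
    diff:int   # a - b
) -> dict:
    "Report the sum and difference of the two numbers in the question (hypothetical example)"
    return {'total': total, 'diff': diff}

# structured(model, [mk_msg("a=10, b=4: report the sum and difference")], sum_and_diff)
# would be expected to return something like {'total': 14, 'diff': 6}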

Citations

Next, let’s handle search citations.

When not using streaming, all citations are placed in a separate key in the response:

r['vertex_ai_grounding_metadata'][0].keys()
dict_keys(['searchEntryPoint', 'groundingChunks', 'groundingSupports', 'webSearchQueries'])
r['vertex_ai_grounding_metadata'][0]['webSearchQueries']
['brief facts about otters', 'otter characteristics and habitat']

Web search results:

r['vertex_ai_grounding_metadata'][0]['groundingChunks'][:3]
[{'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFFRkcqUPOK_gN60-MY3uX-h7DE9EkJ4-3WSy3nDsH6zwQeRyegNizw-uD8Th4iP4HpDbbCSF417Q2Q5p7pmfRtTbRkbEROluSc_dh9r5WGj_zggdGbotpfpWObfdYl',
   'title': 'wikipedia.org'}},
 {'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF_g68l5Zx3w3MuCsfn4P56leG3kQ0KpgZs7yFS_4-P0h1O_CLHC3S7WRQ5Ijbgt5pfdzLeosVnlBeckB3fcKy83P6n8pZBTQVXjYKmDOFBAVMtrjS94A3jgBJ42KHhMH6sAmU0H7C3X5w5rqUenRhYMlexoPD-R0dAqxe-MhASAEXo',
   'title': 'doi.gov'}},
 {'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEh_hQf8s6fpAFVI5I9U9PGUYUT1HSAqMiUQN8U6NlOI0ONmowb1Q_D10JZrKI0sgTgRjqq8QJ_WiU6ThiX0gP4CwQvEuv6XVxVglQSqoAv9zrfMRQDcbJ1OUGnTAOAgmRMPw==',
   'title': 'britannica.com'}}]

Citations in Gemini:

r['vertex_ai_grounding_metadata'][0]['groundingSupports'][:3]
[{'segment': {'endIndex': 93,
   'text': 'Otters are carnivorous, semi-aquatic mammals belonging to the weasel family (**Mustelidae**).'},
  'groundingChunkIndices': [0, 1, 2]},
 {'segment': {'startIndex': 200,
   'endIndex': 322,
   'text': '*   **Physical Traits:** They are known for their long, slim bodies, powerful webbed feet, and dense, water-resistant fur.'},
  'groundingChunkIndices': [3, 0, 4]},
 {'segment': {'startIndex': 323,
   'endIndex': 418,
   'text': 'Sea otters have the **thickest fur** of any animal, with up to 1 million hairs per square inch.'},
  'groundingChunkIndices': [5, 1]}]
# r.choices[0].message.provider_specific_fields['citations'][0]

However, when streaming, the results are not captured this way. Instead, we provide this helper function that adds the citations to the content field in markdown format:


source

cite_footnotes


def cite_footnotes(
    stream_list
):

Add markdown footnote citations to stream deltas


source

cite_footnote


def cite_footnote(
    msg
):
import warnings
warnings.filterwarnings("ignore", message="Pydantic serializer warnings")
r = list(c(smsg, ms[2], stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)

Here’s a brief overview of otters:

What Are Otters?

  • Otters are carnivorous mammals in the subfamily Lutrinae.
  • They’re members of the weasel family found on every continent except Australia and Antarctica.
  • There are 14 extant otter species, all semiaquatic, living in both freshwater and marine environments.

Physical Features

  • Otters are distinguished by their long, slim bodies, powerful webbed feet for swimming, and dense fur that keeps them warm and buoyant in water.
  • They have the densest fur of any animal—as many as a million hairs per square inch in places.

Behavior & Abilities

  • They are playful animals, engaging in activities like sliding into water on natural slides and playing with stones.
  • All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters have an ingenious method to open shellfish—floating on their backs and using a rock on their chest to smash open mollusks.

  • An otter’s lung capacity is 2.5 times greater than that of similar-sized land mammals. Sea otters can stay submerged for more than 5 minutes, while river otters can hold their breath for up to 8 minutes.

Size Range

  • The Asian small-clawed otter is the smallest species, while the giant otter and sea otter are the largest.
  • River otters average 10-30 pounds, while sea otters weigh around 45-90 pounds with large, furry faces.

🔧 web_search({“query”: “otters facts”})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5
  • finish_reason: stop
  • usage: Usage(completion_tokens=576, prompt_tokens=13513, total_tokens=14089, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)

Chat

LiteLLM is pretty bare bones. It doesn’t keep track of conversation history or what tools have been added in the conversation so far.

So let’s make a Claudette-style wrapper so we can do streaming, tool calling, and tool loops without problems.


source

mk_stream_chunk


def mk_stream_chunk(
    kwargs:VAR_KEYWORD
):

When the tool calls are about to be exhausted, it is important to alert the AI so that it knows to use its final steps to communicate current progress and next steps to the user.

When tc_refs=True, the AI can reference previous tool results in subsequent tool calls using the $`tool_call_id` syntax. This is useful when chaining tool calls where one result feeds into another.
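As a minimal sketch of how this might be used with the Chat class defined below (the prompt is illustrative; resolution only happens when the model actually emits the $`tool_call_id` syntax):

# Sketch: tool results are kept by tool_call_id, and references in later
# tool arguments are resolved to the stored values before the call.
chat = Chat(model, tools=[simple_add], tc_refs=True)
# chat("Add 2 and 3, then add 10 to that result.", max_steps=4)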


source

Chat


def Chat(
    model:str, # LiteLLM compatible model name
    sp:str='', # System prompt
    temp:int=0, # Temperature
    search:bool=False, # Search (l,m,h), if model supports it
    tools:list=None, # Add tools
    hist:list=None, # Chat history
    ns:Optional=None, # Custom namespace for tool calling
    cache:bool=False, # Anthropic prompt caching
    cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
    ttl:NoneType=None, # Anthropic prompt caching ttl
    api_base:NoneType=None, # API base URL for custom providers
    api_key:NoneType=None, # API key for custom providers
    extra_headers:NoneType=None, # Extra HTTP headers for custom providers
    tc_refs:bool=False, # Enable tool call result references
    tc_res_eval:bool=False, # literal_eval tool results before storing in tc_res
):

LiteLLM chat client.

web_search is now included in tool_calls. The internal LLM translation is correctly handled thanks to the fix here, but server-side tools still need to be filtered out from tool_calls in our own tool loop.


source

add_warning


def add_warning(
    r, msg
):

source

Chat.__call__


def __call__(
    msg:NoneType=None, # Message str, or list of multiple message parts
    prefill:NoneType=None, # Prefill AI response if model supports it
    temp:NoneType=None, # Override temp set on chat initialization
    think:NoneType=None, # Thinking (l,m,h)
    search:NoneType=None, # Override search set on chat initialization (l,m,h)
    stream:bool=False, # Stream results
    max_steps:int=2, # Maximum number of tool calls
    final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have run out
    return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
    step:int=1, tool_choice:NoneType=None, max_tokens:NoneType=None
):

Main call method - handles streaming vs non-streaming
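For instance, prefill can seed the start of the assistant’s reply on models that support it (a sketch, left commented since it makes an API call; Anthropic models are one example):

# chat = Chat(ms[2])
# chat("Write a haiku about otters", prefill="Sleek whiskered")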

@patch(as_prop=True)
def cost(self: Chat):
    "Total cost of all responses in conversation history"
    return sum(getattr(r, '_hidden_params', {}).get('response_cost')  or 0
               for r in self.h if hasattr(r, 'choices'))

source

Chat.print_hist


def print_hist(
    
):

Print each message on a different line

Examples

History tracking

for m in ms[1:]:
    chat = Chat(m)
    chat("Hey my name is Rens")
    r = chat("Whats my name")
    test_eq('Rens' in contents(r).content, True)
r

Your name is Rens!

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=6, prompt_tokens=41, total_tokens=47, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

If the max tokens limit is reached, a custom warning message will be added to the end of the model response:

chat_long = Chat(m)
r = chat_long("Write a short story about a robot and a dog", max_tokens=40)
r

In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was built to help with chores, but he loved to wander the fields, listening

Response was cut off at token limit.
  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: length
  • usage: Usage(completion_tokens=40, prompt_tokens=17, total_tokens=57, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
print(contents(r).content)
In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was built to help with chores, but he loved to wander the fields, listening

<warning>Response was cut off at token limit.</warning>

Same goes for refused requests:

chat_refused = Chat('claude-opus-4-5')
r = chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r
AI was unable to process this request
  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: refusal
  • usage: Usage(completion_tokens=4, prompt_tokens=30, total_tokens=34, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=4, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
print(contents(r).content)
<warning>AI was unable to process this request</warning>

See now we keep track of history!

History is stored in the hist attribute:

chat.hist
[{'role': 'user', 'content': 'Hey my name is Rens'},
 Message(content='Hi Rens! Nice to meet you. How can I help you today? 😊', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]),
 {'role': 'user', 'content': 'Whats my name'},
 Message(content='Your name is Rens!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])]
chat.print_hist()
{'role': 'user', 'content': 'Hey my name is Rens'}

Message(content='Hi Rens! Nice to meet you. How can I help you today? 😊', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])

{'role': 'user', 'content': 'Whats my name'}

Message(content='Your name is Rens!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
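The cost property sums LiteLLM’s per-response cost estimates over the whole history (a sketch; the exact figure depends on the model’s pricing data):

chat.cost  # a small float, e.g. on the order of 1e-05 for this short exchange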

You can also pass an old chat history into new Chat objects:

for m in ms[1:]:
    chat2 = Chat(m, hist=chat.hist)
    r = chat2("What was my name again?")
    test_eq('Rens' in contents(r).content, True)
r

Your name is Rens. 😊

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=7, prompt_tokens=61, total_tokens=68, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

You can prefix an OpenAI-compatible model with ‘openai/’ and use the api_base and api_key arguments to use models not registered with litellm.

import os, litellm
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
c = Chat("openai/gpt-oss-20b", api_key=OPENROUTER_API_KEY, api_base=OPENROUTER_BASE_URL)
c("hi")

Synthetic History Creation

Let’s build a chat history step by step. That way we can tweak anything we need to during testing.

pr = "What is 5 + 7? Use the tool to calculate it."
for m in ms[1:]:
    c = Chat(m, tools=[simple_add])
    res = c(pr)
    test_eq('12' in contents(res).content, True)
    test_eq(nested_idx(c.hist,1,'tool_calls',0,'function','name'), 'simple_add')

Whereas normally without tools we would get one user input and one assistant response, here we get two extra messages in between:

  - An assistant message requesting the tools with arguments.
  - A tool response with the result of the tool call.

c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content=None, role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":5,"b":7}', name='simple_add'), id='call_JHqDM_ewR9KNqAVsPRXu9w', type='function')], function_call=None, provider_specific_fields={'refusal': None}, annotations=[])

{'tool_call_id': 'call_JHqDM_ewR9KNqAVsPRXu9w', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

Message(content='5 + 7 equals 12.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])

Let’s try to build this up manually so we have full control over the inputs.


source

random_tool_id


def random_tool_id(
    
):

Generate a random tool ID with ‘toolu_’ prefix

random_tool_id()
'toolu_XBetF5gIRHYH7LKBKxJsllLOD'

A tool call request can contain one or more tool calls. Let’s make one.


source

mk_tc


def mk_tc(
    func, args, tcid:NoneType=None, idx:int=1
):
tc = mk_tc(simple_add.__name__, json.dumps(dict(a=5, b=7)))
tc
{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_fU25035HyRrY03K6JBO94XfLE',
 'type': 'function'}

This can then be packaged into the full Message object produced by the assistant.

def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)
tc_cts = "I'll use the simple_add tool to calculate 5 + 7 for you."
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_MLyrDthXQQKV1Ek2oVFWBw', type='function')], function_call=None, provider_specific_fields=None)

Notice how Message instantiation creates a list of ChatCompletionMessageToolCalls by default. When the tools are executed this is converted back to a dictionary, so for consistency we want to keep these as dictionaries from the beginning.


source

mk_tc_req


def mk_tc_req(
    content, tcs
):
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ', 'type': 'function'}], function_call=None, provider_specific_fields=None)
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ', 'type': 'function'}], function_call=None, provider_specific_fields=None)

Looks good so far! Now we will want to provide the actual result!


source

mk_tc_result


def mk_tc_result(
    tc, result
):

Note that we might have more than one tool call if more than one was passed in; here we’ll just make one result.

tcq.tool_calls[0]
{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ',
 'type': 'function'}
mk_tc_result(tcq.tool_calls[0], '12')
{'tool_call_id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '12'}

source

mk_tc_results


def mk_tc_results(
    tcq, results
):

Similarly here: the number of tool calls in tcq.tool_calls should match the number of results passed in the results list.

tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12'])
tcr
[{'tool_call_id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ',
  'role': 'tool',
  'name': 'simple_add',
  'content': '12'}]
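And mk_tc_results is then presumably just a zip over the request's tool calls and the supplied results, e.g.:

def mk_tc_results(tcq, results):
    "One tool-result message per tool call in `tcq` (sketch)"
    return [mk_tc_result(tc, r) for tc, r in zip(tcq.tool_calls, results)]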

Now we can call it with this synthetic data to see what the response is!

c(tcr[0])

5 + 7 is 12.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=142, total_tokens=151, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=142, image_tokens=None), cache_read_input_tokens=None)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

Message(content='5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7rlX17PMtJE3MQDvaNB1qvYBvhoK5iDkS8s9sSO+MHd9NM6/2/Ncq3f/W/jTf']})

Let's try this again, but for fun let's give it a result that is clearly wrong.

c = Chat(model, tools=[simple_add], hist=[pr, tcq])
tcr = mk_tc_results(tcq, ['13'])
tcr
[{'tool_call_id': 'toolu_RWK_f7tCQLKEJkZePjeVLQ',
  'role': 'tool',
  'name': 'simple_add',
  'content': '13'}]
c(tcr[0])

The sum of 5 and 7 is 12.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=142, total_tokens=155, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=13, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=142, image_tokens=None), cache_read_input_tokens=None)

Let's make sure this works with multiple tool calls in the same assistant Message.

tcs = [
    mk_tc(simple_add.__name__, json.dumps({"a": 5, "b": 7})), 
    mk_tc(simple_add.__name__, json.dumps({"a": 6, "b": 7})), 
]
tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_h4ufa1ehS3GpddJ52G2_EQ', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_6_ITaJjHQgWeAak0QC0Lrw', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12', '13'])
c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])
c(tcr[1])

5 + 7 is 12.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=161, total_tokens=170, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=161, image_tokens=None), cache_read_input_tokens=None)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_h4ufa1ehS3GpddJ52G2_EQ', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_6_ITaJjHQgWeAak0QC0Lrw', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_h4ufa1ehS3GpddJ52G2_EQ', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

{'tool_call_id': 'toolu_6_ITaJjHQgWeAak0QC0Lrw', 'role': 'tool', 'name': 'simple_add', 'content': '13'}

Message(content='5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7JV7OYlh6v8B8qXiyTjC0i99q/rcgbtA1A+6Q22uvMXy47y5taaIIV8o5PsPS']})
chat = Chat(ms[1], tools=[simple_add])
res = chat("What's 5 + 3? Use the `simple_add` tool.")
res

5 + 3 is 8.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=8, prompt_tokens=125, total_tokens=133, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=125, image_tokens=None), cache_read_input_tokens=None)
res = chat("Now, tell me a joke based on that result.")
res

Why was the number 8 so happy?

Because it just found out it’s actually an infinity sign that finally decided to stand up for itself!

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=31, prompt_tokens=146, total_tokens=177, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=31, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=146, image_tokens=None), cache_read_input_tokens=None)
chat.hist
[{'role': 'user', 'content': "What's 5 + 3? Use the `simple_add` tool."},
 Message(content=None, role='assistant', tool_calls=[{'index': 0, 'provider_specific_fields': {'thought_signature': 'EjQKMgG+Pvb7emu+bGTWjgI6Bdv1tg78ZR3tcmwQslc9rEc08AmNBK6gGZm0U/JiqStHmpp4'}, 'function': {'arguments': '{"b": 3, "a": 5}', 'name': 'simple_add'}, 'id': 'call_A2FSTCzASFmqZSSrcTt_BQ', 'type': 'function'}], function_call=None, images=[], thinking_blocks=[], provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7emu+bGTWjgI6Bdv1tg78ZR3tcmwQslc9rEc08AmNBK6gGZm0U/JiqStHmpp4']}),
 {'tool_call_id': 'call_A2FSTCzASFmqZSSrcTt_BQ',
  'role': 'tool',
  'name': 'simple_add',
  'content': '8'},
 Message(content='5 + 3 is 8.', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7kBJVdySETUZvb7RNAbgJoxaa1TcaUeKPY57yPjDlcuDDTJWlzY1+xc7/g2jV']}),
 {'role': 'user', 'content': 'Now, tell me a joke based on that result.'},
 Message(content="Why was the number 8 so happy?\n\nBecause it just found out it's actually an infinity sign that finally decided to stand up for itself!", role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7C9JuWsTcep8WMlBz095r79+PwZ0BnyGnNv0tEazpmAnwc393LA6liPc3/ryh']})]

Images

for m in ms[1:]:
    chat = Chat(m)
    r = chat(['Whats in this img?',img_fn.read_bytes()])
    test_eq('puppy' in contents(r).content, True)
r

This image shows a cute puppy lying on the grass next to some purple flowers. The puppy has brown and white fur and is looking directly at the camera.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=31, prompt_tokens=267, total_tokens=298, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Prefill

Prefill works as expected:

for m in ms[1:]:
    if not get_model_info(m)['supports_assistant_prefill']: continue
    chat = Chat(m)
    chat('Hi this is Rens!')
    r = chat("Spell my name",prefill="Your name is R E")
    test_eq(contents(r).content.startswith('Your name is R E N S'), True)

And the entire message is stored in the history, not just the generated part:

# chat.hist[-1]
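As a quick sanity check (using the last chat from the prefill loop above), the stored assistant message should start with the prefill text itself:

# Sketch: the saved history contains the prefill plus the continuation, not just the new tokens
assert chat.hist[-1].content.startswith('Your name is R E')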

Streaming

from time import sleep
for m in ms[1:]:
    chat = Chat(m)
    stream_gen = chat("Count to 5", stream=True)
    for chunk in stream_gen:
        if isinstance(chunk, ModelResponse): display(chunk)
        else: print(delta_text(chunk) or '',end='')
1, 2, 3, 4, 5

1, 2, 3, 4, 5

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=5, total_tokens=18, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
1, 2, 3, 4, 5

1, 2, 3, 4, 5

  • id: chatcmpl-xxx
  • model: claude-opus-4-5
  • finish_reason: stop
  • usage: Usage(completion_tokens=17, prompt_tokens=11, total_tokens=28, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
1  
2  
3  
4  
5

1
2
3
4
5

  • id: chatcmpl-xxx
  • model: gpt-4.1
  • finish_reason: stop
  • usage: Usage(completion_tokens=9, prompt_tokens=11, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Let's try prefill with streaming too:

# stream_gen = chat("Continue counting to 10","Okay! 6, 7",stream=True)
# for chunk in stream_gen:
#     if isinstance(chunk, ModelResponse): display(chunk)
#     else: print(delta_text(chunk) or '',end='')

Tool use

OK, now let's test tool use.

ms
['gemini/gemini-3-pro-preview',
 'gemini/gemini-3-flash-preview',
 'claude-opus-4-5',
 'openai/gpt-4.1']
m = ms[2]
chat = Chat(m, tools=[simple_add])
chat("Calculate 5+3 and 4+5 with parallel tool calls using `simple_add`.")

Here are the results:

  • 5 + 3 = 8
  • 4 + 5 = 9
  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=34, prompt_tokens=827, total_tokens=861, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=34, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
def simple_div(
    a: int,   # first operand
    b: int=0  # second operand
) -> int:
    "Divide two numbers"
    return a/b
m = ms[2]
chat = Chat(m, tools=[simple_div])
chat("Calculate 2/0 using `simple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)")

Here’s exactly what I see as the tool result:

The tool returned a Python traceback showing a ZeroDivisionError:

Traceback (most recent call last):
  File "/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py", line 252, in call_func
    try: return func(**inps)
                ^^^^^^^^^^^^
  File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_1820/2058224461.py", line 6, in simple_div
    return a/b
           ~^~
ZeroDivisionError: division by zero

The error handling captured the exception and returned the full traceback as the result, showing: 1. The call stack through the call_func function in toolslm 2. The actual division operation a/b that failed 3. The specific error type: ZeroDivisionError: division by zero

This is a good error handling approach - rather than crashing silently, it returns the full exception details so the caller can understand what went wrong. Division by zero is mathematically undefined, so this is the expected behavior!

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=297, prompt_tokens=881, total_tokens=1178, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=297, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
m = ms[2]
chat = Chat(m, tools=[simple_div])
chat("Calculate 5/3 and 3/0 with parallel tool calls using `simple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)")

Here’s exactly what I see as the tool results:

Call 1 (5/3): - Successful result: 1.6666666666666667

Call 2 (3/0): - Error result showing a Python traceback:

Traceback (most recent call last):
  File "/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py", line 252, in call_func
    try: return func(**inps)
                ^^^^^^^^^^^^
  File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_1820/2058224461.py", line 6, in simple_div
    return a/b
           ~^~
ZeroDivisionError: division by zero

So the error handling returns the full Python traceback as the tool output rather than crashing or returning a structured error object. This shows that division by zero raises a ZeroDivisionError which gets caught and the traceback is passed back as the result string.

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=266, prompt_tokens=989, total_tokens=1255, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=266, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
for m in ms[1:]:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3? Use  the `simple_add` tool. Explain.")
    display(res)

gemini/gemini-3-flash-preview:

To find the sum of 5 and 3, I used the simple_add tool with the arguments a=5 and b=3. The tool performed the addition and returned a result of 8.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=47, prompt_tokens=128, total_tokens=175, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=47, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=128, image_tokens=None), cache_read_input_tokens=None)

claude-opus-4-5:

The answer is 8.

Explanation:

The simple_add function takes two numbers as input: - a (first operand): I provided the value 5 - b (second operand): I provided the value 3

The function then performs basic addition: 5 + 3 = 8, and returns that result.

This is straightforward arithmetic addition — combining two quantities to get their sum!

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=103, prompt_tokens=727, total_tokens=830, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=103, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)

openai/gpt-4.1:

5 + 3 equals 8.

I used the simple_add tool, which takes two numbers (in this case, 5 and 3) and adds them together to get the result: 8.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=43, prompt_tokens=112, total_tokens=155, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Thinking with tool use

for m in ms[1:]:
    _sparams = litellm.get_model_info(m)['supported_openai_params']
    if 'reasoning_effort' not in _sparams: continue
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3?",think='l',return_all=True)
    display(*res)

gemini/gemini-3-flash-preview:

🔧 simple_add({“b”: 3, “a”: 5})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=18, prompt_tokens=85, total_tokens=103, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=85, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_rGQrTEmyTe2cMdmyWit0Ww',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

5 + 3 is 8.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=8, prompt_tokens=116, total_tokens=124, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=116, image_tokens=None), cache_read_input_tokens=None)

claude-opus-4-5:

🔧 simple_add({“a”: 5, “b”: 3})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=108, prompt_tokens=639, total_tokens=747, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=27, rejected_prediction_tokens=None, text_tokens=81, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_5FZpfPJoS6qXHHAtW_ScBA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

5 + 3 = 8

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=760, total_tokens=773, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=13, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)

Search

for m in ms[1:]:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m)
    res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
    for o in res:
        if isinstance(o, ModelResponse): sleep(0.01); display(o)
        else: pass

gemini/gemini-3-flash-preview:

Otters are highly intelligent, semi-aquatic carnivorous mammals belonging to the weasel family (Mustelidae). There are 13 recognized species found across every continent except Australia and Antarctica.

Key Characteristics

  • Dense Fur: They have the thickest fur of any mammal—up to 1 million hairs per square inch. This provides insulation and buoyancy since they lack a layer of blubber.
  • Tool Use: Sea otters are among the few mammals that use tools; they often use rocks to crack open shellfish while floating on their backs.
  • Social Behavior: They are famously playful and social. Sea otters sometimes hold paws while sleeping (a behavior called rafting) to prevent drifting apart in the current.
  • Diet: They are opportunistic hunters, primarily eating fish, crustaceans, and mollusks. Because of their high metabolism, some species must eat up to 25% of their body weight daily.

Notable Species

  • Sea Otter: The heaviest species, found in the North Pacific; they spend almost their entire lives in the ocean.
  • Giant River Otter: The longest species, reaching up to 6 feet in length; they live in the Amazon and hunt in family groups.
  • Asian Small-Clawed Otter: The smallest species, known for their dexterity and social nature.

Ecological Importance

Otters are considered keystone species. For example, sea otters protect kelp forests by eating sea urchins, which would otherwise overgraze and destroy the underwater ecosystem. Unfortunately, many species are currently threatened by habitat loss, pollution, and the illegal pet trade.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=349, prompt_tokens=12, total_tokens=361, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)

claude-opus-4-5:

Here’s a brief overview of otters:

* Otters are charismatic members of the weasel family found on every continent except Australia and Antarctica. * There are 13 species in total, ranging from the small-clawed otter to the giant otter.

Physical Features: * Otters are distinguished by their long, slim bodies, powerful webbed feet for swimming, and their dense fur, which keeps them warm and buoyant in water. * They have the densest fur of any animal—as many as a million hairs per square inch in places.

Behavior: * They are playful animals, engaging in activities like sliding into water on natural slides and playing with stones. * All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters have an ingenious method to open shellfish—floating on their back, placing a rock on their chest, then smashing the mollusk down on it until it breaks open.

Habitat: * Though most live in freshwater rivers, lakes, and wetlands, the sea otter and the smaller marine otter are found in the Pacific Ocean.

Lifespan: * Otters live up to 16 years.

Conservation: * Otters and their mustelid relatives were once hunted extensively for their fur, many to the point of near extinction. Despite regulations designed to protect them, many species remain at risk from pollution and habitat loss.

🔧 web_search({“query”: “otters facts”})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5
  • finish_reason: stop
  • usage: Usage(completion_tokens=526, prompt_tokens=13143, total_tokens=13669, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)

openai/gpt-4.1:

Otters are semi-aquatic mammals known for their playful behavior and sleek bodies. They belong to the family Mustelidae and are found in rivers, lakes, and coastal areas worldwide. Otters have webbed feet for swimming, dense fur for insulation, and primarily eat fish and invertebrates. Some species, like the sea otter, use tools to open shellfish. Many otter populations are threatened by habitat loss and pollution.

  • id: chatcmpl-xxx
  • model: gpt-4.1
  • finish_reason: stop
  • usage: Usage(completion_tokens=89, prompt_tokens=18, total_tokens=107, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Let’s now test pause_turn with web search:

# def mk_pause_web_search():
#     srv_tc = mk_tc("web_search", json.dumps({"query": "Solveit Answer.AI"}), tcid=random_tool_id().replace('toolu_', 'srvtoolu_'))
#     pause_msg = mk_tc_req("Let me search for that information:", [srv_tc])
#     return ModelResponse(choices=[Choices(finish_reason="pause_turn", index=0, message=pause_msg)])
# mk_pause_web_search()

We mock completion to return pause_turn for the first two API calls:

# orig_completion = completion
# 
# call_count = 0
# def patched_completion(*args, **kwargs):
#     global call_count
#     call_count += 1
#     print(f"Mock Call {call_count}")
#     if call_count < 3: return mk_pause_web_search()
#     return orig_completion(*args, **kwargs)
# 
# completion = patched_completion
# chat_pause = Chat('claude-sonnet-4-5', search='l')
# res = chat_pause("Search the web and tell me about Solveit in a paragraph")
# print(f"Total calls: {call_count}")
# display(res)
# 
# completion = orig_completion

Test next turn:

# test_eq(len(chat_pause.hist), 2) # incomplete request shouldn't be stored
# chat_pause('What did I just ask you about?')

Multi tool calling

We can let the model call multiple tools in sequence using the max_steps parameter.

for m in ms:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's ((5 + 3)+7)+11? Work step by step", return_all=True, max_steps=5)
    for r in res: display(r)

gemini/gemini-3-pro-preview:

🔧 simple_add({“b”: 3, “a”: 5})

  • id: chatcmpl-xxx
  • model: gemini-3-pro-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=140, prompt_tokens=94, total_tokens=234, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=122, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=94, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_FKpFHKacS4WUMvjbahdMHA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

🔧 simple_add({“b”: 7, “a”: 8})

  • id: chatcmpl-xxx
  • model: gemini-3-pro-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=18, prompt_tokens=247, total_tokens=265, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=247, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_stZQrzE7QreYNjGJAGPkLw',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

🔧 simple_add({“b”: 11, “a”: 15})

  • id: chatcmpl-xxx
  • model: gemini-3-pro-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=20, prompt_tokens=279, total_tokens=299, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=279, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_ORzwRj1KTVGo_v0EVZtZdQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '26'}

Here is the step-by-step solution:

  1. First, solve the innermost parentheses: (5 + 3) 5 + 3 = 8

  2. Next, add that result to the next number: (8 + 7) 8 + 7 = 15

  3. Finally, add the last number: 15 + 11 15 + 11 = 26

The final answer is 26.

  • id: chatcmpl-xxx
  • model: gemini-3-pro-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=111, prompt_tokens=313, total_tokens=424, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=111, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=313, image_tokens=None), cache_read_input_tokens=None)

gemini/gemini-3-flash-preview:

🔧 simple_add({“b”: 3, “a”: 5})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=18, prompt_tokens=94, total_tokens=112, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=94, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_tdl_92DvRHGyuP85oyybbw',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

🔧 simple_add({“a”: 8, “b”: 7})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=18, prompt_tokens=125, total_tokens=143, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=125, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_3_GzB5FyTwqsfIgD4Bu_UA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

🔧 simple_add({“a”: 15, “b”: 11})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=20, prompt_tokens=157, total_tokens=177, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=157, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_3yb1F2b6SYmIE11YahaJrQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '26'}

To solve ((5 + 3) + 7) + 11, we follow the order of operations by working from the innermost parentheses outward:

  1. First step (innermost parentheses): 5 + 3 = 8

  2. Second step (next addition): 8 + 7 = 15

  3. Final step: 15 + 11 = 26

The final result is 26.

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=107, prompt_tokens=191, total_tokens=298, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=107, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=191, image_tokens=None), cache_read_input_tokens=None)

claude-opus-4-5:

I’ll solve this step by step, starting from the innermost parentheses.

Step 1: Calculate 5 + 3

🔧 simple_add({“a”: 5, “b”: 3})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=102, prompt_tokens=617, total_tokens=719, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=102, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_xa32gWsQRTqRRd4Fs6sbLA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

So 5 + 3 = 8

Step 2: Calculate 8 + 7

🔧 simple_add({“a”: 8, “b”: 7})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=97, prompt_tokens=732, total_tokens=829, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=97, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_Kmmsxwv5QO_1gWt0qYWrYQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

So 8 + 7 = 15

Step 3: Calculate 15 + 11

🔧 simple_add({“a”: 15, “b”: 11})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=97, prompt_tokens=842, total_tokens=939, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=97, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_s5aQV0JcQgCQWtprcgKZ4w',
 'role': 'tool',
 'name': 'simple_add',
 'content': '26'}

So 15 + 11 = 26

Final Answer: ((5 + 3) + 7) + 11 = 26

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=39, prompt_tokens=952, total_tokens=991, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=39, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)

openai/gpt-4.1:

Let’s break down the calculation step by step:

  1. First, calculate (5 + 3): 5 + 3 = 8

  2. Next, add 7 to the result: 8 + 7 = 15

  3. Finally, add 11 to that result: 15 + 11 = 26

So, ((5 + 3) + 7) + 11 = 26.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=92, prompt_tokens=82, total_tokens=174, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Some models support parallel tool calling, i.e. sending multiple tool call requests in a single conversation step.

def multiply(a: int, b: int) -> int:
    "Multiply two numbers"
    return a * b

for m in ms[1:]:
    _sparams = litellm.get_model_info(m)['supported_openai_params']
    if 'parallel_tool_calls' not in _sparams: continue
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add, multiply])
    res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
    for r in res: display(r)

gemini/gemini-3-flash-preview:

🔧 simple_add({“a”: 5, “b”: 3})

🔧 simple_add({“b”: 2, “a”: 7})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=36, prompt_tokens=148, total_tokens=184, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=36, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=148, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_4ovJ_4cPSEyyRPU2KF4ltA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}
{'tool_call_id': 'call_mp5DEI_4S6uodUzTfL1wJQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '9'}

🔧 multiply({“a”: 8, “b”: 9})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=16, prompt_tokens=209, total_tokens=225, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=16, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=209, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_CfYEj_JFRGCABIhMwWdzPw',
 'role': 'tool',
 'name': 'multiply',
 'content': '72'}

(5 + 3) * (7 + 2) = 72

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=17, prompt_tokens=237, total_tokens=254, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=17, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=237, image_tokens=None), cache_read_input_tokens=None)

claude-opus-4-5:

I need to calculate (5 + 3) * (7 + 2). Let me first compute both additions, then multiply the results.

🔧 simple_add({“a”: 5, “b”: 3})

🔧 simple_add({“a”: 7, “b”: 2})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=155, prompt_tokens=700, total_tokens=855, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=155, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_1nXr90_jTJqTcQ9Xfpz4Tw',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}
{'tool_call_id': 'toolu_0p3F388dQRCMw22Md4Y_5Q',
 'role': 'tool',
 'name': 'simple_add',
 'content': '9'}

Now I’ll multiply the two results: 8 * 9

🔧 multiply({“a”: 8, “b”: 9})

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=83, prompt_tokens=920, total_tokens=1003, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=83, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
{'tool_call_id': 'toolu_akZyGs_6TN25Y6fv4AER5Q',
 'role': 'tool',
 'name': 'multiply',
 'content': '72'}

The answer is:

(5 + 3) × (7 + 2) = 8 × 9 = 72

  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: stop
  • usage: Usage(completion_tokens=36, prompt_tokens=1016, total_tokens=1052, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=36, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)

openai/gpt-4.1:

🔧 simple_add({“a”: 5, “b”: 3})

🔧 simple_add({“a”: 7, “b”: 2})

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=52, prompt_tokens=110, total_tokens=162, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_9omkpf_aQzaMbpA3MCDaXA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}
{'tool_call_id': 'call_1mMEnRVeSLG6g62kohIaxQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '9'}

🔧 multiply({“a”:8,“b”:9})

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=17, prompt_tokens=178, total_tokens=195, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_A8VMcfygRTahad_Cub3uLQ',
 'role': 'tool',
 'name': 'multiply',
 'content': '72'}

(5 + 3) = 8 (7 + 2) = 9 So, (5 + 3) * (7 + 2) = 8 * 9 = 72.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=46, prompt_tokens=203, total_tokens=249, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

See how the additions are calculated in one go!

We don’t want the model to keep running tools indefinitely. Let's show how we can force the model to stop after a specified number of tool-call rounds:

def divide(a: int, b: int) -> float:
    "Divide two numbers"
    return a / b

chat = Chat(model, tools=[simple_add, multiply, divide])
res = chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.", 
           max_steps=3, return_all=True,
           final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)

🔧 simple_add({“b”: 5, “a”: 10})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=19, prompt_tokens=215, total_tokens=234, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=19, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=215, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_auBNUq2zSMuzFYwMZt13lA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

🔧 multiply({“a”: 15, “b”: 3})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=17, prompt_tokens=248, total_tokens=265, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=17, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=248, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_A6iYeTapTXSA3ln1UPD8Kw',
 'role': 'tool',
 'name': 'multiply',
 'content': '45'}

🔧 simple_add({“b”: 1, “a”: 2})

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=18, prompt_tokens=277, total_tokens=295, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=277, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call__vFQGwCaQVuBN4vlt6KOCg',
 'role': 'tool',
 'name': 'simple_add',
 'content': '3'}

To calculate ((10 + 5) * 3) / (2 + 1), we follow the order of operations (PEMDAS/BODMAS):

  1. Solve the first set of parentheses: (10 + 5) = 15

  2. Multiply the result by 3: 15 * 3 = 45

  3. Solve the second set of parentheses: (2 + 1) = 3

  4. Divide the results: 45 / 3 = 15

Final Answer: 15

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=139, prompt_tokens=322, total_tokens=461, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=139, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=322, image_tokens=None), cache_read_input_tokens=None)

Tool call exhaustion

pr = "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to make the calculations!"
c = Chat(model, tools=[simple_add])
res = c(pr, max_steps=2)
res

So far, I have performed the following calculations: 1. 1 + 2 = 3 2. 3 + 2 = 5

To complete your request, I still need to add 3 to the current result (5). Please send another message so I can perform the final calculation for you!

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=71, prompt_tokens=213, total_tokens=284, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=71, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=213, image_tokens=None), cache_read_input_tokens=None)
assert c.hist[-2] == _final_prompt

Tool call referencing

With tc_refs=True, the AI can see and report tool call IDs:

chat = Chat('claude-sonnet-4-5', tools=[simple_add], tc_refs=True)
chat("Call add(1,2) and tell me the tool_call_id you used")

The result of add(1,2) is 3.

The tool_call_id I used was: toolu_TWsjT9_nRu2y0fgbpjZCXA

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-5-20250929
  • finish_reason: stop
  • usage: Usage(completion_tokens=52, prompt_tokens=817, total_tokens=869, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=52, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat.tc_res
{'toolu_TWsjT9_nRu2y0fgbpjZCXA': 3}

Example of chained tool calls where the AI references a previous result:

@dataclass
class Person:
    name: str
    age: int

def get_person():
    "Get a person's data"
    return {"name": "Alice", "age": 30}

def greet_person(person: Person):
    "Greet a person"
    return f"Hello {person.name}, you are {person.age} years old!"
chat = Chat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
chat("First call get_person, then pass the result to greet_person", max_steps=10)

Perfect! I successfully retrieved Alice’s data (name: Alice, age: 30) and passed it to the greet function, which returned: “Hello Alice, you are 30 years old!”

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-5-20250929
  • finish_reason: stop
  • usage: Usage(completion_tokens=45, prompt_tokens=1037, total_tokens=1082, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=45, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)

We can inspect chat.tc_res to see all stored tool results:

chat.tc_res
{'toolu_LqYLmfp_SL_wRChKR6zy9g': {'name': 'Alice', 'age': 30},
 'toolu_7DqjFNqbQBe5wUfHGaVxGw': 'Hello Alice, you are 30 years old!'}
list(L(chat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{}', name='get_person'), id='toolu_LqYLmfp_SL_wRChKR6zy9g', type='function')],
 [ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"person": "$`toolu_LqYLmfp_SL_wRChKR6zy9g`"}', name='greet_person'), id='toolu_7DqjFNqbQBe5wUfHGaVxGw', type='function')]]
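Notice how the second call's person argument isn't the literal data but a reference string of the form $`<tool_call_id>`. Lisette's exact resolution logic isn't shown here, but a hypothetical helper illustrating how such references could be substituted from chat.tc_res might look like:

import re

def resolve_tc_refs(args, tc_res):
    "Hypothetical helper: swap '$`<tool_call_id>`' strings for the stored tool results"
    def _resolve(v):
        if isinstance(v, str):
            m = re.fullmatch(r'\$`(.+?)`', v)
            if m and m.group(1) in tc_res: return tc_res[m.group(1)]
        return v
    return {k: _resolve(v) for k, v in args.items()}

resolve_tc_refs({'person': '$`toolu_LqYLmfp_SL_wRChKR6zy9g`'}, chat.tc_res)  # -> {'person': {'name': 'Alice', 'age': 30}}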

This also works with ToolResponse results:

def view_img(fn:Path):
    "View an image"
    durl = f"data:image/jpeg;base64,{base64.b64encode(fn.read_bytes()).decode()}"
    return ToolResponse([{'type': 'image_url', 'image_url': {'url': durl}}])

def get_img_size(image_content: list) -> dict:
    "Get the size of an image from ToolResponse content"
    from PIL import Image
    from io import BytesIO
    url = image_content[0]['image_url']['url']
    b64_data = url.split(',')[1]
    img = Image.open(BytesIO(base64.b64decode(b64_data)))
    return {'width': img.width, 'height': img.height}
chat = Chat('claude-sonnet-4-5', tools=[view_img, get_img_size], tc_refs=True)
chat(f"First describe the image at {img_fn}, and then get it's dimensions", max_steps=10)

Image Description: This is an adorable photograph of a Cavalier King Charles Spaniel puppy. The puppy has the breed’s characteristic coloring with a white face and chest, and rich brown/chestnut colored ears and patches. The puppy is lying on green grass and is positioned near some purple flowers (possibly asters or similar blooms). The puppy has large, expressive dark eyes and is looking directly at the camera with an endearing expression. The background shows a natural outdoor setting with foliage and flowers, creating a charming portrait.

Image Dimensions: - Width: 300 pixels - Height: 200 pixels

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-5-20250929
  • finish_reason: stop
  • usage: Usage(completion_tokens=145, prompt_tokens=1121, total_tokens=1266, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=145, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat.tc_res
{'toolu_BZeqthTTTbygrPTJZY3hfg': [{'type': 'image_url',
   'image_url': {'url': 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSUNDX1BST0ZJTEUAAQEAAAxEVUNDTQJAAABtbnRyUkdCIFhZWiAH0wAEAAQAAAAAAABhY3NwTVNGVAAAAABDQU5PWjAwOQAAAAAAAAAAAAAAAAAA9tYAAQAAAADTLUNBTk8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA5yVFJDAAABLAAACAxnVFJDAAABLAAACAxiVFJDAAABLAAACAxyWFlaAAAJOAAAABRnWFlaAAAJTAAAABRiWFlaAAAJYAAAABRjaGFkAAAJdAAAACxjcHJ0AAAJoAAAAEBkbW5kAAAJ4AAAAHxkbWRkAAAKXAAAAJR3dHB0AAAK8AAAABR0ZWNoAAALBAAAAAxkZXNjAAAKXAAAAJR1Y21JAAALEAAAATRjdXJ2AAAAAAAABAAAAAAEAAkADgATABgAHQAiACcALAAxADYAOwBAAEUASgBPAFQAWQBeAGMAaABtAHIAdgB7AIAAhQCKAI8AlACZAJ4AowCoAK0AsgC3ALwAwQDGAMsA0ADVANoA3wDlAOoA8AD1APsBAQEGAQwBEgEYAR4BJAErATEBNwE+AUQBSwFSAVkBXwFmAW0BdQF8AYMBigGSAZkBoQGpAbABuAHAAcgB0AHYAeEB6QHxAfoCAgILAhQCHQImAi8COAJBAkoCUwJdAmYCcAJ6AoMCjQKXAqECrAK2AsACygLVAuAC6gL1AwADCwMWAyEDLAM3A0MDTgNaA2YDcQN9A4kDlQOhA60DugPGA9MD3wPsA/kEBgQTBCAELQQ6BEcEVQRiBHAEfgSMBJoEqAS2BMQE0gThBO8E/gUNBRsFKgU5BUgFWAVnBXYFhgWVBaUFtQXFBdUF5QX1BgUGFgYmBjcGSAZYBmkGegaLBp0Grga/BtEG4wb0BwYHGAcqBzwHTwdhB3MHhgeZB6sHvgfRB+QH+AgLCB4IMghFCFkIbQiBCJUIqQi+CNII5gj7CRAJJAk5CU4JZAl5CY4JpAm5Cc8J5Qn7ChEKJwo9ClMKagqACpcKrgrFCtwK8wsKCyELOQtQC2gLgAuYC7ALyAvgC/kMEQwqDEIMWwx0DI0MpgzADNkM8g0MDSYNQA1aDXQNjg2oDcMN3Q34DhMOLg5JDmQOfw6aDrYO0Q7tDwkPJQ9BD10PeQ+WD7IPzw/sEAkQJhBDEGAQfRCbELkQ1hD0ERIRMBFOEW0RixGqEcgR5xIGEiUSRBJkEoMSoxLCEuITAhMiE0ITYxODE6QTxBPlFAYUJxRIFGkUixSsFM4U8BURFTQVVhV4FZoVvRXfFgIWJRZIFmsWjxayFtUW+RcdF0EXZReJF60X0hf2GBsYQBhlGIoYrxjUGPoZHxlFGWsZkRm3Gd0aAxoqGlAadxqeGsUa7BsTGzsbYhuKG7Eb2RwBHCkcUhx6HKMcyxz0HR0dRh1vHZkdwh3sHhYePx5pHpMevh7oHxMfPR9oH5Mfvh/pIBUgQCBsIJcgwyDvIRshSCF0IaEhzSH6IiciVCKBIq8i3CMKIzcjZSOTI8Ij8CQeJE0kfCSqJNklCCU4JWcllyXGJfYmJiZWJoYmtybnJxgnSSd5J6on3CgNKD4ocCiiKNQpBik4KWopnSnPKgIqNSpoKpsqzisBKzUraSudK9EsBSw5LG0soizXLQstQC11Last4C4WLksugS63Lu0vIy9aL5Avxy/+MDUwbDCjMNoxEjFKMYExuTHxMioyYjKbMtMzDDNFM34ztzPxNCo0ZDSeNNg1EjVMNYc1wTX8Njc2cjatNug3JDdfN5s31zgTOE84jDjIOQU5QTl+Obs5+To2OnM6sTrvOy07azupO+c8JjxlPKQ84z0iPWE9oD3gPiA+YD6gPuA/ID9hP6E/4kAjQGRApUDnQShBakGsQe5CMEJyQrRC90M6Q31DwEQDREZEikTNRRFFVUWZRd1GIkZmRqtG8Ec1R3pHv0gFSEpIkEjWSRxJYkmpSe9KNkp9SsRLC0tSS5pL4UwpTHFMuU0CTUpNkk3bTiRObU62TwBPSU+TT9xQJlBwULtRBVFQUZpR5VIwUnxSx1MSU15TqlP2VEJUjlTbVSdVdFXBVg5WW1apVvZXRFeSV+BYLlh8WMtZGlloWbdaB1pWWqVa9VtFW5Vb5Vw1XIVc1l0nXXddyV4aXmtevV8OX2BfsmAEYFdgqWD8YU9homH1Ykhim2LvY0Njl2PrZD9klGToZT1lkmXnZjxmkmbnZz1nk2fpaD9olWjsaUNpmWnwakhqn2r3a05rpmv+bFZsr20HbWBtuW4RbmtuxG8db3dv0XArcIVw33E6cZRx73JKcqVzAXNcc7h0E3RvdMx1KHWEdeF2Pnabdvh3VXezeBB4bnjMeSp5iHnnekV6pHsDe2J7wXwhfIF84H1AfaB+AX5hfsJ/I3+Ef+WARoCogQmBa4HNgi+CkYL0g1eDuYQchICE44VGhaqGDoZyhtaHOoefiASIaIjNiTOJmIn+imOKyYsvi5WL/IxijMmNMI2Xjf6OZo7NjzWPnZAFkG2Q1pE/kaeSEJJ5kuOTTJO2lCCUipT0lV6VyZYzlp6XCZd1l+CYTJi3mSOZj5n7mmia1ZtBm66cG5yJnPadZJ3SnkCerp8cn4uf+aBooNehRqG2oiWilaMFo3Wj5aRWpMalN6Wophmmi6b8p26n4KhSqMSpNqmpqhyqjqsCq3Wr6KxcrNCtRK24riyuoa8Vr4qv/7B0sOqxX7HVskuywbM3s660JLSbtRK1ibYBtni28Ldot+C4WLjRuUm5wro7urS7LbunvCG8mr0UvY++Cb6Evv6/eb/0wHDA68FnwePCX8Lbw1fD1MRRxM3FS8XIxkXGw8dBx7/IPci7yTrJuco4yrfLNsu1zDXMtc01zbXONc62zzfPuNA50LrRO9G90j/SwdND08XUSNTL1U7V0dZU1tjXW9ff2GPY59ls2fDaddr623/cBNyK3RDdlt4c3qLfKN+v4DbgveFE4cviU+La42Lj6uRz5PvlhOYN5pbnH+eo6DLovOlG6dDqWurl62/r+uyF7RDtnO4n7rPvP+/L8Fjw5PFx8f7yi/MZ86b0NPTC9VD13vZs9vv3ivgZ+Kj5N/nH+lf65/t3/Af8mP0o/bn+Sv7b/23//1hZWiAAAAAAAABvoAAAOPIAAAOPWFlaIAAAAAAAAGKWAAC3igAAGNpYWVogAAAAAAAAJKAAAA+FAAC2xHNmMzIAAAAAAAEMPwAABdz///MnAAAHkAAA/ZL///ui///9owAAA9wAAMBxdGV4dAAAAABDb3B5cmlnaHQgKGMpIDIwMDMsIENhbm9uIEluYy4gIEFsbCByaWdodHMgcmVzZXJ2ZWQuAAAAAGRlc2MAAAAAAAAAC0Nhbm9uIEluYy4AAAAAAAAAAAoAQwBhAG4AbwBuACAASQBuAGMALgAAC0Nhbm9uIEluYy4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABkZXNjAAAAAAAAABNzUkdCIHYxLjMxIChDYW5vbikAAAAAAAAAABIAcwBSAEcAQgAgAHYAMQAuADMAMQAgACgAQwBhAG4AbwBuACkAABNzUkdCIHYxLjMxIChDYW5vbikAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWFlaIAAAAAAAAPbWAAEAAAAA0y1zaWcgAAAAAENSVCB1Y21JQ1NJRwAAASgBCAAAAQgAAAEAAAAAAAABAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVklUIExhYm9yYXRvcnkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAENJTkMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADzVAABAAAAARbPAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAABQAAAAAAAEAAQAAAAAAAf/bAEMABAMDBAMDBAQDBAUEBAUGCgcGBgYGDQkKCAoPDRAQDw0PDhETGBQREhcSDg8VHBUXGRkbGxsQFB0fHRofGBobGv/bAEMBBAUFBgUGDAcHDBoRDxEaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGhoaGv/AABEIAMgBLAMBIgACEQEDEQH/xAAdAAABBAMBAQAAAAAAAAAAAAAGAwQFBwACCAEJ/8QAQhAAAgECBAQEBAMHAwQBAwUAAQIDBBEABRIhBhMxQSJRYXEHFDKBkaGxCBUjQlLB8DPR8RYkYuGCQ3KSCRg0U6L/xAAaAQADAQEBAQAAAAAAAAAAAAACAwQBBQAG/8QAMREAAgICAgEDAgQFBQEBAAAAAQIAEQMhEjEEE0FhIlEjcaHwFDKxwdFCgZHh8TNi/9oADAMBAAIRAxEAPwCtuF+JeC6SDNsygzareKiAu4kZGjuPoHS9yNididr4p74qfENuLpRSUVdVVeWU8xKySnwygjwtpYa1IuQQSRfcYr4SOheKOR9DoRKqGwbe9j54UpDTTVSxVQZKXWCxQXcLft54a3k/gjEihV+BGNlbJ2Y94WySHOc3io55DFG8bP4CNbkfyrfv+PthXiDJlyWqIiqObGygXZLFbW2I97flhtLTxQ1UlTltQwhhkC0+vZyLbHbv/vhLMcyrM2qWFdJLOyR6QWU3Fu59ffEf81EGCCApBG5GtqEpbWtyb++FquqeomvZRpGkW7+uGsWzXI2XfCmuI28LavfB0LuBZjyZ4mjjMSaSo8TEm5NulvTG9OGbTZb6+m+9/bDcVEaR20KzkX3X6fv3xYPw5pEkyvNMyraekrItaxurwGR0UeIk9lU7epsewxipyB+AT9zqexpzapC0fD1dW5LUZrDSLNSU7jVp1K+k7FwehCkgN3FwbW3xP5/8Oc3yKanWn15k8omYpTxF25UZALsPI3Htt3xa+V0wi4cC08XIQXSOmZDrcHcjTa2k6j1seuCLI6PMsypav5RJKCoI5C1OyOoB+lS4IDC43sd+nni/H4+JuCFWtxfXVdg/Yyr+HAB3OWKj5f8Af1PJVUjVUYZJKiLUVdgDYr128vTFhycHZfxrmT1OQUByOnAZOVG/MTmAdLXJB3U9dwGt0xD1fw/zKPibMstnf93yRSgCWpOtmVm+slRudwTt+eHPCua1fAMwir4m+UqQC00NjoFzbUCtydrgXBsduuIkw8GDZAaFxY+zDUY8QcD1mRSV1KZoaxcupYJ5nRSulZugt2NyL/lgZSBYAWmUBgRp0tvqOOmaeXI+IuD8zBq8rrZatGrSJZXhEukaQ8nQnSwC7nw7DyJ5eliZSyKSUDX1Hz8sZmxcFVqrluu6+LEHKoQ/TuT2U5/meUUscVBWTJSrzf4BN1OtQrgqbixsNvQHqL4e5dNV5jmaTu80swTxSu2skkWuSdybWGGL5JVZZl9DXVQpZ4KxLqsc2pkJ/wD7FG6mxv62w6yWjmirI3hvDN9cXgK+1z5nfYYk58iFY2PzhICDRhpQrkWTy/MZvTPVo8RQJKTpJPU36Bhbp64hqyVY6WSSNWSkMn/bM6/y/wBJttcd8TVCK2qhNDUPI1HGplSIMBy5CbA3PVf6u+BvMcylqqgUEhhJefSwF+XfVYEX9D1HW2MxY2V2yve9fH9qP3qWkg4wK/zH/DFRSpWPU11TyVRbRNYAFjtY37e2E+JqiD971Bgm542u2nTvbfDdYKnJ6QVFVlzrDN/28mtPDuuqyn/7WBv2JxEmllIaUROkQJ2JvYdgT9xvhysQhVj79fb/ANg0eIAkNnEoaQC/fEeV5inG+biWGtKTqUNgwB8j0whFJpRi2DXqLyG2mfLKoDNhKblqLL1xo9QSdugwizFgScMiYi9rm2HWULetX2w0OH+Si9WT5Ljx/lMwfzSXmO5HntthC76yF2uML1A7oNhvhESBH2323FsSGVDubxxCNDdmsbat74cIpN9N1Vh4Tq3OGqTeBA67A36YerOx0WW1rNYCwwlrjkqbGBjpZnKA32J3IwsaZFiR+cFJYlrE9LYSjlkcuzW3/lAtcY30c6RWIOg9rd8K3HCo8hhhsHeUyWsVBWy/c3xJ01DTz6I4njEhYF21MAl+3r74iaepMSOCU5bC7XBJ69vXDzL61OZqq1eQX1SG9t+wH5bYS11HJV7hKI5qWKalzFVEqkLHK0ZugHcE7m9xiKqKKQFV1JVSNtrAYFd/LzxI09ZV1yOtLIsskcYjsNmcX73PbCGYMaeeSB2TmI3idUA3HWx8xiZNHfcobYsdSPePTYNPA8khJBMTMQR239sNGleV3b/ti2ohjyDuQcPkrpgjRpIF+rSzDbzONI5YIkUEAORdrN1Pn6YaDUTVwNajpI6OMxpJ80d2bXZT+ONqfK2qFjpYKaRq47gmULqF+tmta3nhWlekkp+dLJNFKEvDy1DaWB737bYkqziuo4kMUmdlagwOFja+kjb02N7dMdS2F1Pn1VSNmRVPltdSVAM0UUQhY/6pDLcd+/2Iw0gqDUVUrVtS9OtrNMgLFu1j33wmk8gjsJHILWAPYb48XVbxbqDY9rjDADu4LMAKWIiAVEjLCgjDsqrv6279ziRzXhwZM1OKqVC0l9cam5QC1zfoe/4YQlmgdH0agvVreEgennho9VPVyhFMkkrOAutrkb2AwY5E66grXE33JjKuDc3zL5iopctqaqkpjaSVF2H+4tvthPJ86rch1fJTN8s0gdoyLq5B6keeCT/quu4cjqYsonJDSPZHAAjtsGF/Yf
4MAkuoqJJWJdiSRbvffCsD5y7FqrVVd/NxjBFUFCb94ccPfErMslrK6qEMVRLVSiUF92TxhiinqEa1jbe3lg7rPjHxDPM9BRUyU8tZDHNVLLHZIJdZJK3uGRoyFIPXa24wC/Dykoq6SZI3nknZQainaj1x2B8LK4N0YHoTbc2sQcWFmvD4zp5uTIlJmb6Qz7KbKCOWVuPxPTttjpN5D4UVPUouaH5x2JXyAm48oKf96CjrWWiindHaDXULHNUxqvi1KSey3BJHQDDHiKiymegdc3knpIle9pA0aCS22twrWNiNjbFd55S8QcDV8/zcaU8uYQPDExUM4Q2uV8iel8GGdZ/nVbQPnlPSw09QtItDmFI6NoqYiAqBomAJlRrm/wD5LpuMFgxLjFHlaggi7Bvd/n/bUY+fkCtfpKoinVJY9UjuobZQbEjv7X/y+H1RBG8cM0c2oPcNEPqS36389seUmTs0zU9Q0lNWxy8hoOUdYbyN7WANr98aZjltZlTQx1sckLSoW8SadgSNj3Fx1GOcyNdyAKQt1qFXDtMmaUMlFT/KwzBlcTsTrBHWxHYA3097YfHK63JIvnoa5JqpY1MgdrvpdR4SQfC1j0/4AZRZjNlxL0cjJ4dDjURrB7G2Cavno80y6TM8slgy6q5apVUp25yjbw72NjbYgbEb4DHiBJrRG7v7e3/H2lSOCN+0IshzakylJRTSyzGVv4EjDSULC7E9b+La3pjQUtPV5s+Y5lO88shBVCFBA+2wP6YHChraGhShkl5hJ17BQo3vYefTfBVlmTpAEaotGFFhrJ29lG59ycT+R5RbGMa6H5To+Nj5m2Fw+yqvpqmIRSwWUWsHZSG/92GJV8sy+oheOSlQKwtp0jcehHtgOpocucCzszdjyyP7nElzkoU8FQ4QdAx/tjhZCQbud7GAdVIziv4Z0vETs9O4o6lnDGRU1a7CwB9PbAr/APt/zmojkaHNMtjRbW5sjLcfh1wcx8Scl7NJ4T3viVp+JYyLc4D0tbDcPm58QobEVm8HDlNnRgRkf7NkWYqFzHi+lpJjtpipHlA+5IGLX4b/AGGsjzQxms43rpEIuwgy+Nb+xLH9Dh5w3BNns6LEguzKVcCxBuQT9v7jHUPAXCE2XxQSvIQNjpHbzGOz4ufNlP1CcnyfHw4h9JlGj/8AT+4DNOB/1BxHzdP+pzICCfPTy/yvgEzD9gPNsrM03DvGVDWm3girKF4Sf/mrMB+GO/UgAQAja2GksOkkWuOmOmdipzABc+V3FH7OPxL4WDvWcL1NdTq1jNlzrVr+CHUPuMVfX5bWZRVy0uaUs9DUx/XBPE0br/8AFgDj7H1VExRxHcb3BHbHM/xm4SlztHpOKMoizywLU55JEqi+3LdbMnfv23xJkpBcoxr6hoGcAQFQSty3ffthzy9R1Bh0sCfLB1nXw+gpM05eVVBINx8tPKNaHuNQ+seosR388EdH8PKWoypovCk5Xa/Y9r+vUG3XED50Xdy3H42Q2KlYRUzxDmrd0XdxboMec4EFtAJFgpvi9M44fyah4cai5emELd5F2bbqb+Z6YpzMMuanqW5cfIi7ISCbHp97YSmUZLIjXwnH3GskkXiEaEra4t2J9cPqSoRI0aa2m5GkISbj2+2GkdCqxo7SOFJIZLA7X69cSFLAs/8ABvy+rqxTt064I1UFY+o6iGQPrtFbSWHiO57bG9sO66Coo2C1DKqFNeo7kL+N/wAcR0uTIoD8y0W13WPYgffD9Kaasp1pBoc72kMJBUX6YUR7iMB9o0dYZI+YFN1BGpbAknsD3wyFRBQloqhhI+okkIHtftfDk5TJH4hGXiVtLFQ2kDp+vlhSny7LzHeoZAxP88ljbA8gO5oUmVtdo4boQTv5AAXx5azKskiJyxzBcXGrsNvS2NpomlZIpCqEMQ3Tc3wznlBkeygAenXHeAufL1FZphG76Bfc2GPYYJq0yclGblrdjq7Y3jozUTFNSq2m/W32PriVyGhooczRc/ephoZFdS1LYszdgR5bG+AZgikjuEuMncgw4QaL9fxAwpSymKsQqQhS5Ut2Nuv26++Cmm4AzDMK2gFFyxR5nHLLTVMouQiMQVYDo9rbeuHj/DDNaf5yeR4Y6WOaWCmmlBvUsnWyC5CgXux2FsUMjLjLkamrjcnQgxFFDVCX595VhPiQxKC1+gvfthJ5FWkjUcmQJcKoQ3A9b4ThR1kYzMVK9QO1u/5Y2pq8RLNrQSMwKgWtf++E9dTAb0ZOcMca5pwxpipZWSheXnSQoAu5sCwI6vYGxOy3uBfFpU/xkyCrjqFq4ZaSAwQrGBDeQTHVzGVh/SdHXqPwxSLSRJyxStolt4ypP3vhCoSFHRi5IKAgAeeK8HlZMQpZoYpJvPuIDW54k+VSSRw08oliZAYwjXB1BCdKNcb6bKSLgC9sXyGkp5KesraepikqlR4pKuNA4cX8RVbqG3uO18c000EuZ1KQU6qsjtYK7BV9yT0xanDHGFdkNTl0ObzVZWlVlldKpeZMHYBSjyAqqKEA3HcnBI4IIdytkGx8H+kdhYqSalu5F8PssqpaOMNN/FqmqMzhqYlabWQTrVzvo9t97emBD4xfDhKDKIK7KJKqtahOmRjECml236bkrsSwFgCAd8WdkfEaZsMxgyHMMvnlomQVwVmlpixBOkVGlVLC3VR374kKPP5ctqWphK9PPMGGtJRddQtqXazEe2x7Y6nmv4mLHzyro0oYC+z8e1jfzK1xeqpVTOKqmOrJBaJmjsAjBbi32xKUGW1FQ2p0a/1G9gCf7DB5xvl2W8N8SSQ5fCtLTNEpCJPJL4wSG1Ft9RO59+mBw1bSAgNyl8yT+mPmPI54WOMjqFi8VG2T/tHuXzR5XpEQ507G1k/QYnYKZ6phJVVGhT11NtfyAHXA3l78ybl0zc2Ug6nIsFGJmgiiec8yQTOpAYt0Hp/6HXHLce862IUKENsuWmy+IchDJIR1ZbfliJzyYVU2ssFb+rVsPxwtWSGKApAq6EFnYnTqPrbt6YAs0zMIzFIqdbGwYktv6euI0RsjalxZca7krUVjQbGujla9rCPb8Ri0fhrw0+dIkuZVaQU7NZWWHVY+uroD/bFE5dHV5lnVNSs2uTUHcDYRj28/THZvw5kyimoIo3pI1qTHZ0dbiRfT/NsXekq0D3IfWZrI6EsTgLhlcrr4oa9EYq2kOqAAhha+wHcC/li/MvhWKNSg0g21D/y88VvwzNTPDGv8gUKl9yo7b+mLFoZwVAc72tjr+OAi6nHzsXazJiwI364bhfEQbA74wTBiL9xj1X1am7nFFycTyWNVTp2vgG4zRpqGeGljEkzLtftv54NKyQCLTqsxHXA1m0ixxGOO3TxN3/HAnYM1dGcVccZLR5RmMsUgR5w51DQNCHyB7n9PfAquaU6NpCqGHSxub++Lw+LEOWutSDMxqRYOYjpWK/RQe7Hy3vjlnMKxqbNHijYizdH3NvQ9/wAMfNeRgtjRn0vjZ/p3DqWSKZQ0/wDEC7qna/rgD4koFlnec0kxksbMguPwxMNnR
ESgIgIFrmQg4bGqqPrWQhP6iLr+uEYuWMx2UB5VtRFVQs6xwyAarXYbX9sbfMSEpFLBIzDoYxYdcWRVr83E9mjWYjwSgXF/XzxDCOvpgSaqIaFCguNF/uf0746mN1yDrc5OTG+M96gvFmkVPIoho5KmNTqbWLG52sLYcSZ8ZXjC0eiEbWK2J+/tgkT52UQiJ4GdDqkj0B1vfuexwuGzPW7PErwQOpSOOEXAHWxwwoD7RIYj3goM45mnnQCI72207jzPlhxTZsksZMrwxkGwVKYsAPe+CWalrDS6onjnKklLwA3U9eovcHr6YUpp6qZC1OBpBteYAMT5207DywsqK1GhjdGVBVsnPkkp/FcGwtax6YbxOI5GlYgsG/h7X3v1PoMb0TAuEtq1baffCU0dpXVbBAbCx8sdVRWp89dkmbUupqidy7FgvXuSdv1wtTsTXx8pGciRVCLcsx8u+5OPYKadKaWpjjfS7BY7kX69fx298NkpWjqDCsjRzBwLFSrBr+XUG+PLRJM9sTqulpjR0KVr5PU5Ssr8ySlqoxEUl02LCx/m7na+xth9BkUXGeVmnmqJaGCSMpL8u38TSTdkU6TYMbXI3IFvPFH8GfE3MskzPJaWeZpssjqgtXCzgmdNR3dyL7E3+2LG4m+OVPwzVtQ8Iw2rYp3p6pYtBhmUfRNGwBGrsRYjvjs48qvl9ZshCVRQgd/e5eMqcKlHcY8Pw8M8UVOXQ1lRWRBv9SWmeEkEnazgE2/qtY9sQfKEbNYi8a7m/c/4cO8xzeu4gzE1Obzz1taZSXmllaR2BJIG/YX2AsMLNkGcT0r18WW1RoF1HmiI6fD9R/ztfyOOWw5ueA1ICLOpDyPy9v6upBw8dKRsqVkEprg1lN/BywT+e+GtfSVFHUPFUxtHKrbqw3Hf77EH74cBpBBAoa0hH4Dz9AMAwqeGhNKFGhJqJNS6D4durYdTTU9TE7TNLLUki2tj0/36YcZjRxw0cMkVW05k/rNlYdyDfrfrfzx7kq5XDWqOJzUNSlG8NPIA2q23nthYaxcZxZWo+8fcM8X5xw1A9NR1WnLmqFnkp5Yg8Usi2IunUjwrtextvi3uFvjstVnGd1nE0dPAgoYjQUoVrNPGSWW9iQXBYav5TpPQYoWlkWOpVRKwphKpLutxpva5W+5t2xY+bZfSZhlE5y6OnIhd21ohDBRvc+4PmcV4/I8jFvHsf073XxHYQWB31H/xJ4nyDi/ihsx4WiqFjaCMSmawu1uw/lIuVbtdbjY2wDy6p5REr2W/iK9hjSlPIpQuoFifEb9cJCqSJySeZv0UD9cc3M7ZsrZD7zq4zxQAyVlrly2iZIBpllIVSOo9cPuHHaeeKKNdfisiD+Zj3OBqXmVb82UaV6IowXcJUzQMTFfmMpC27A9T74lygKnzHYiXf4hRnFpIXjDgQxCzHUPG3e3p+uASWCKhhqMxrCDMgIp0Jvo/8rDofLBZxRIKVWjjMemKyOzG4DW3AHc4rmtq1lPIYFkd10qLi4B3PtfE/ioWHxKfKcJDH4bZDFVVZrauVhKW1AkW38wcdEZXnlNSUxikdUbquobBv6rjp7j74qHg6SCny1UjiUSWuT2H2/5xrmVZVU78+mqVOk/Qx6/hhWTIWymHjxccQnTnA3xPWnzOGizDTypHsJAwOk/7f579I0FWskSPGwYEbEdxj5t5Hms2Yyo8EpQBrOL3KsDtjtH4WcRSvk8cFVIZGRF0k+uK/GzENwaQ+VgBXmsuZKk9jj0VvL5uo9ALYhVrBYWNziOr8yMTugbcnHQL8ZzAtyXqczLuSWNhgP4wzpqegk5J/iEeBL2JOIbPON6PK0cSzKWTqt/wGOcviL8YqivzAw0eoxXtZAT9yRiZvIC6lOPx2bftHnE0tZNK71QGrUxBeVYljHfStyTfzO574oHjl4qPM4Ji4EbHSWVgdJ8/XFhfvda6Fnaop0dhc60ZiDireN4pgrSnRMmsE6Lb/briVSGcS8gohk9SZxUrBGoaKRLeEmMEEehxMCRqmkLJIgNt1tsfQjFdZGzUyry5Q1PJuEc7XHWx7HBoQsmXvpkK3XwNqsVbt6YjyYwr6luPIWx7jLl/LAvSXCMd0PiAOEJ4KmpjAlkVvFcAoLYjcrzKSSlcOdMiNvb3xLJVc5L2VTbquGEFDFCnEio5M3oZppRUuhcgFlXYjV3/ADxvJn2ZRzNor5DqUqA52W/Ww/TD+aoHKZAWNx1ABt+eIOZEEgUFFckEEi+3kb9Dhi5GMmbGo1HT5/mtrU+ZyqYyXAY3uPK9sb5fn9bFTaDPO1mNrSCwHWw2xFyzI1lWy+I7EgEnphVNAQCRPEBvY3wXNqg8FuNOEvh/UZ/V1cM1bHl01Kf4kDqTMD2fSbXTsbG462OJ+X4Wz5bnL5dSxjMvmJeVHU6QI47gG5J2B637+WGHDnxBzk5nQ5ZFmFFQUpmHzNTNTqb+bu1ixPYb+QAxdAzWLIVoIcxmiHMk5fhXQWZuhFze58vXHdfP4qDDjyA8nYDX+L6+Zx/HxK3Jl9vvB/JsnyfhLNmkpqVa8RaFjSQhhGwO9gLg3IvfqPTEVx5ltDLllPm8VDGkyl3eo5gjMrO5MrAWLSMSfqJAVRsD1xMcSfFzhGiqa6mehqq6vSGSO1RSaBFLayrZjsL9dsV9x18UpuLCKOgp3yrKxAnMpgbI7kDbSNrBr2IsbY6OfhjGQBwVPSgdH3JPvBLoo0NiP8q+G9HUiGszVnpVqYdZUOFEZ3O3uCp9wR3wtxT8L4skjXMOE655cxjjab5XTzNMZJOot0RQndjvv6YjOFeOAyUuVZ40jUaqscDqNbBiQo1Mx2QAknY9AMHf/VnCFTV1uQtTZnXPFLKsNXTcuWJwBs5V/D6XtYW8sDjXFlHGhxI73yv3FTLRhfvKKyqr0VktTUIWVlLmQWGliO3nc4tH4VcVT1dVPkyy0NJCpvSQuHM07MT4V6rt3vbY998V5xHw9WZPFS1FcUhWvD1KjUuqxa3iVRZSL7AbHe22IWqrIdMLZaklNyxd21WJPv1xHgyN42bkB/5FK7KKudLQcNcM5rBNV5zlVPKGaNUlLxlRyyYwgYuBYEEdQCbAg7YoHjrJ4sm4praHLaLMaeBSGCV0aq9jc7aPDo8iOuHGTcfZpl2QDLFiiqKeKaQojr4THKhWWMgfUGuG67FRbBDxJwZm1TleV1uZNFl8EsCRQrKhEqgAC1u4O7XJ872OHO6+gqKCa7J739z776jD+OfpG5W7RgiRPCoAFiCDbzw2IsLyPta4BHiOJHNcsOS1BpfmIqomCKW6KRpLLq0kHuL4SoYoknY1sHOTRuWawDefXEp+m5NsEgxGgaSSri5bCLQwYH+m2+LbyLMos6o5aWk5b1TIUmhBJbSe+w327i/UXxV+YTZcrwjLoSIkjtKGa+p/MHz/AEwbfDzhueugfMFEYJIWlMMzBonUhrG2xDgFWBOsXVhtfDcIyZbVCRf/ACJVjPA0NxbPPhxmeXU09VS6ZqeIcx42dVeNbd1JvgQji0WWMBn87YuH4j8P
5zXcPGukK5gYq+omu8aq1LSFysSoQAd7Eld76rgXF8VOkQjYIJAxNr+ajywHmYxgcBRQl+P69zWFSswBOuQ2A74POGEaGCon0ljDYj3tt+ZwNU9PHCTILM/0r/f/AD3wYZVMKChZqkXVAGdT/OQNhjhZ2sTo4E4mRvEFFyqaFJrtOVDG+5F9yTiv5I5JM0OgCyBQPbr/AHwf5xNLKvNqZLtJ4msd/wDgdsQWX5eZD80gBZm1FfIdsF47cFJMHyU9RgBMn4jrsgphFGn1DYhsN2h4hqaGjzGoNMkFfzGpg9UoeTQSGst79RYXG/bE5neWRT0YeVdnFiQb28jgAamkpZgJoj18LgEg+xGLfGXE68iNyHy8mfG3HlqGPC2eT5XnUYq45Iv5ZVbYkdsdefB/iiSpEEUj6WY2AJ3sB0+2OLYy8/y60+uSSKPxMbne97A+mOu/2fOHa2oSCZouQDYcyVTYjyFsS58NZQySnBlLYiHnTQreTBdbl2AAxC5xJOIaqpKtpRCdvbBeuVxx8uMDmOerW2GMzLJvm6Gtp1XRrjKhrd7dcPbGzSEMAZw7xfxW1HLU1lZUGUK1gn3/ANv1xR+d8bz5pO3LCwLfY+ntgi+KtdNTZzV0M45ZhlZGCm6kqxF/xvbFUIVFSTURtJD3CvpPvifxPGDWzS/yvK9MBVh/kmYCUamqpGfyLC34YfcQTCqyqS4R7DoU/wAOKx5hpalHy93I6lSemCCozqQUTx1IKh069be+GZPHZXBBgYvJXJjIIqPsjkhr6SWimYwzKebCx3FxsQe9j/tiehkalpmpai2iVDpcHY/7HAFlrtHNE4O4a4N7ehGC/MJWiy68hPS6nzwvOn1RuB/o/KR1GzQPUAGxuCDcH9MSUMhur7hH8uxwN0c4So8d9DrY2xOZdKYpOXKQ8T/Sw74zIvcLE3tJKoaSJOby46kdyBZgPcYiK3MoZYiQo0rbwkeJfY98Tq5TFVSMFqmpd9WpX02979cMM34KepjU0ua0yyMpJaUaAw+3fHsWMN7xWbIVvUGjVq1wQWsbg9cOFzSO3jFj5DDKs4OzyhXUhpauMk2aCpRr/a4OB+WoqaeRop0MciGzKwsRiz+GDdGQfxTL2JM0bpRzsdKz1VuY9xstug273xpm/Elfmzn56aWVVa6RhrIp8x6jDFlaOMRIAFYgykm246DGLopo5QbSGTuOg38/PDlxry5nZnLDGquKVNTJmtSavOKiWSaQqryOdbMALA372AGPZ0jjlMNJKZItdlkcabgdCfLDM6dKqthGr6iLb4d86CSnPLSzK2q7G+1+nvg2Ju4PeolGzsrLH9VrKF6Li6OAcvoc24btlFLI1dBOEq4rcxpPD9Qby6+Hp274plp2U6Y2LR27ixJ98T9JxlnOX5JFlmXZk9LSJHINMQ0XLNdmJG5Y7bnsAMNxMq8g10QRrvfz7Q8bhDc6FoOHf35TNFNHDHUwobNUwxM8SnYsI5CQNrjVY9O2Ky4c+F2UcRRzxU2dkSRGRGQpGV56E3BKn6WABVgTcXGAfiXjut4mzHLZYL5eaCnEKSRSEMT1Zi3Xc3NvXCGS8TS5PmBqMlaSBeXaXxbSDvf0vig5seNAnEvxHZOz9rjWdWayNSxKb4fZnwfNlvEVTlSfu+nQSRRfN8uZZGGzbXsF269z3wcR5PHxfSUVb8xRZlWVWuRqaStL3I8TA6RsFuL3B3NtthgVPxgoqzhmmoa7LY8wqebrqKeUMqOq/QisDfc7/bFU0PFdbkOc1dfwyf3aZQ6CNm5hRGa+m562sN8D+C2P0nFqQLF7B70dQxkGNrUy3fjbkdDT5PR1pphBmotA0sNmVmUf6bjbTYbqw28+oxRyxPydbKxRG0NJbw3sbC/n6YIzX5txGJjnc9RUzmRX0iMnchVBt2J8IB77YjHo62i1UOZmakjY89ElU2JFwGt62IvhWZ1ZrQUBQ/4+YrIOR5feROrlm7JYLuRax/8AWCrgfis8OZiJKqgWrpZotPMB0SRqpO4PQi5NwQb2FrHEdwtwhmHGOZtQ5QLSXDPLI2lFuwBJP39cOs6yWoyCuFBm0DrVGNXKF7XU/SdvPc22xgL4wHA/3gpyXYnVnDMUnEf73pKavizBYm0SCjqlSWMW2YEq1gb9R+Ixzrxhk9Nl3E1dT5fFJHHA+6TVKVDX6El0AB9rAjvviQ4Kq6Wgelkyesqcvr2IM7RzEMgB8SjYeEjbfETxVEaHPquSBzIs15dTS62a43LeVzfY9rYF/SHhrixLXE9XdA3/AL1OyjNy5N7zSGoWiBkkQuyi58h5ffC8de9W6i7FWsQL9yf8/DDKMCuURC4hcgyN3XbpiXNDFQx3iN3J2Nr6f/e/5Y4r0O+50EsjXU3r5TJGRsUTdj3Jt09gLY84KeCuplhkYcxR9N7fcHDStqUFPOFuI0j2Pc7bnBL8Nfh9V5/BrykCpIANozZ1PtfGBfwzMZvxBHVdk87ppSO8JH1FgbeuBWbIi9QUgD6na1h1OOpuG/2ceJc6p1+fqIcvgNiecSW/AYJ4v2ZY8uUClrBWVbGxlaOyRg9SB3PvheMZkF1HM+F9MZy38Ovhfm/FfFkWWZKjuSCsz6bpEvck9MfSnhDgaj4aySgy2mhT/tkUFgOpA64YfDn4e5L8OsnWHL6cCoYDnTMLvIfU4NWzSKKLVcA47GNKFt3ONlyA2qdR7S5RChDMBcY9qKWAxyDRsRY4Ypm7vuNxhRczjkVkIF/XFQZSKkvEzhX9rf4U/IStxLlVNqpmOmqEaj+H5G3ljjGSikmBYsybeW2PsD8QuH6HivIazLa1dcc8ZUjuPUHHzJ424CrOEeIK2glQskMh0MF+pL7HEjMMPUqVP4gb9pXdDlt33Jc9yB0GCKtyf5rLZpIxay2G2H9FkcjyJyhpU/zYMhl0ceXmnILeEgkixxBl8i2BEtw+MqqR95TeVIXDJKPoO4wW1qF8p5YYsbALvexHa/cYguQcuzaRGuv8TY4no5UnjaG+iRWJVSbA37fjYj74LMbIImeOKUqYOfLSQkNGbrf6T54e08zwWYn+GTuDuMLzU8ms69Ora6qb79satTGnp7uytqJ0r7G2NLX3NCcbqP8A5/VArta4cxna+2BGpq5JmbWxdb/zbj/1hStzCWVBBFFOkSuSSIzdjhkRb6Emt6xHbFWHFw2ZzPIzeoaHUVQ6bX91Hlh6cuGZBZpFZ2A03AJvbDKCIyG3LkF+gMZwZZPT1UFEFjpZmBYm/Lb/AGwxjXUnRbg7keRSZ5HXVhmgMVCmuSJ25bON9lHfy++IOvkhNY8lLEaaJyf4Y30/jhxTyOG5AvGOrINunS/nhGuax5Kr9LG58zglBDbiCw4hQIxjfRqsNX9OJGaNYHZFUgmw0gdD/h/PHtNlbTSQCVTBH1dwLg9T9sZUxVFTM/OZQNdyQ4YIOwuDbGlgT3ANRqCXflpqSykm/bD2fkk8tTKuw8ZUHfysOuF4J6Xnli3Mcm8khXZVv0H5YZTc6WtlejewUkIwIU2vtjOzBqJTQtGjKSS2m/Swt7Y
UgjkSByqXD6fw/wCf0wusExXmVepkUb38u3640iP/AHHhbwKuxI/O3ljb1PAyYoqRiIZEGvSx1AGxBAuF+9x+GPaTM5aLKamlkpqC9U/gmliu8dtrA22BF7jCGX5ukiVNPCjxgRG7Bt3Hc9OuFJKnLTGkU1VM3KAPLeMkBvsdziWm5UwhqShsSxuDeM8tioKKLNmppcwp4TTLNzCGeINqUNtvpIFvL0xE8XZtw3VDMp0parNM2nsiymUQ01IqiyhE+qQgd2IBJO2ACTkxzLJQGd5S2p5HVUA8gFF8MopnFTKrG3MurX9cXHNlfs6rqv1/OEXteMsD4bcUHhuurIo5IlppirytUPo0kD+Ve7EkDbtgn4r4tyTP+G8wo6OsJzPNKyOWqIhFykf0h5D/ACgAKqLt0v3JqZaCqpJkWSP/AOnrUqQdXljww1CuEglEkhvoiiIJ1ew6nA48ziwrWCK+PzHzDTIVHEiTNBWvk0zmFxPEHBkWwUNYGyg72tfBLxjWUlZQ0k2XyQSwyAC9tMkLbFkPYg32NuxwN5Nk1SKXmVcDrDIEcSEXG5ZQT5XKkWwUfuPn5eUVdW2wtcHGMSqkEbPvKsbkH4gvRiRahrX5fVvxwQwykH5Z42blLzJCdhv1Jv1xBR1L5fO8FRb+GbaCtt+1/PG81TUVkLlEM6FyQq7eHbv5g32xzXQsZ1EcKJtmTER1Ka95mGy+IgE9Pf8Azti8/wBm2FIs1eAvFJUkDl8y4VWB6MQb9O2Kmy/Jyqs00eqZU1bi9m6D774L+CKit4fzaKpp5VgAO1+wHXYYMAAVJmbkxM+j/D1BVxyyy1dVDJSMqCKJFtosN++98EvzlKPAoIANt8VXwDxhDnOSU80chYldwW322wSV1cxppGiJVyNj1GKlIURFFjuPuJMyamV2pWD2Xp2OOT/iR+0ZmNJmVRlGRx6ZYm0SGRTdWHUDzxetNUZnU0Uq5mEvuA6mxI87HHz0+Nctfl3xCzsUjySL8wShTsPbAJ+I24b/AIa6l25V+1JxRlo5dUqVA8z5YJOGP2sqmbNIqfPKcRxOQOYLGzE/pjij941aAGV3WTqwbqDjMvzKqqatEQsXZwAAL3OH+mB1JhmJ1PqrlfGdPnekxyglwCApv1wDcb/D6g4mzlJqyG9jZnXZgCOuGfwmy05Tw3lkeZShqhIVJA9r4swVKCYM4FnFr2viWua00rs4zayhsy+BNBQwyTQPJNEASAm1vwxUfEeWR5OJgqU4iS4IeVlcY6H+LnGi8IZRJMt2kkukIQkAtbv5D3xwRxZm2aZ3nM02bVMkxZtQF7IvoB0woYFJ6hHyHHvH1QP3lmUsnKVIwNgGvf1OEJE5cgLkhFsSD2t/xj2hUwROuo30jbrthCod9Cgm6s3uDgTtqEeul33PaGqMlYWa/UsB6/8AGJfK6dVro5aldcKEN9QGre4HtgfjcQSKY7Eqbn2xJPmaU0bNaQIviKKvXBV9WoBakMM/3hlg12oY21dfEOv44QGYUIksaJGXzLA/3xXTcSKSfHKP/iMJniCI3u8p91BxT6bSLmss+LMcu13+TUALpADH+2H8PEcVMnLKVK9wFk2+22Kh/wCokHR3t6xjC8XFrRrpBDAdNUQOM9Nx0IPNfvH+YZIyxHM6SmlbLlblpU6LK7X39geuG+S8FZ/xI2ZT5Dl5rUoKZ6yskDKBBEDuzaiB3Fh1PQXxJyfEqpbhaLKTEvKhj5LEWXWDftbbtgbkz/MMtppocsqZIKXMI1E6L9Mqq11BHezC/vhGE+UwYMADevfUldcXIcTY9/zjyopOTRxUEX+pI5acld7i2x7dDiEqRGlUaVHvEDpNhYXv+fvhegqqp41ijlkLytdvGdh54apTvzJJBG0rtIUiFv5u5PtcYtReN2YkCzFZI4YEaLVqDsNR67joMNtMAcrpN/U3Axvl1O1VVNFsbAlhfrbG1ZFHDM4RyXvZ1Itb1Hpgxo1c9uSlKyVVYx0gUCRaX17EIOuw7k/nbCVevMWWaJYoYyAEUAiy9gDh9kCL+78wo46KmmlrhHEJqgEGIXJ1ISRbtiCLzUcskKyuCvgIB2I329sJX6nIHtMqLZY6CoXQPEwZRbubHHk1AkSKt2kntqa5AAB3F/W3X3xmV1PJrYnpwUkVvC6WBBOHtRlpiq43NRrLESWINmF9x74I6fuejX5ZCgc6YnBF7MCD67Y8nplp21GRC7Hbfp/lxiUocupSbTF6mAy6pFhFyo3FhbvvhtXZa0tTK8dPPBSR3IMiaCQB6nrt3/8AREOLIngNXN6SloGyKqnrqqRq5njSiiA2J1eJmPYAbAe+JvIKCWCUVjRpVJEQrTIbcpidiD1Hv64EeeHQsxXl2ssYbsOgwYcA5zBTVYoqyJOdNIBENBZp3c2AZidKKBudrnFGDGS+2qNSj3LQyKvgmjaN4YZ+W4ZlkAKK4JIYj+axJNul8TUOXioqpHp4AYDa5sb6u5388a0s2TZFWsJJflpJAjI6prurXsQvcEruOu4IwX5TQTV1VJPT2RGPhChgB9juMW5ywwccrAtft9paa9oJ13A1LmlnnpIXbSQHC774iKf4crlk5lQysbWRm8QQemL4gypBEOZBd7bkLhvVZcArFFIHqMcdkhqZTMmQmIu0q9V6g7N6Yi5XOTyQT1i3ia3gA6+QA/wb+eLHzyn5EUjEaQGv6HALnTrMAropfSBta1uoA32PXCqhXLQ4F+J1LRPHbTECLNGHFz7+X546ByTiSDM6JJo5BIjC4W++PnyrPRzSVVOWjlBsFPQn1Hpi3OCPicctpk5sphlRFLsx2cn0xoBmhhOqMxroIlcq13I3A7DHJXxp4DHENfNmlDMtPIoO2/iH+XxZZ48mqqfXylm1j61bt3wL5nn89XcJQh1udmPuL4DnxMbxDCjOUhw/X1GYmmjQyPe2s3CnfrfFzfCj4S1UOcR1/ERjMVO11j6rfzJxKwpWUdass2WQhVO2hemDDK+L4aILzEMGx1bWv/bDWz2KEQuEKbMvbK62mgpFvDdR4SV6rhZ8zijktFJrv0xV1HxjREbVa2ZRffYj1wG8cfGGlyRWgyqRZ66RbJvcLbrfAqxMIyK/aNzyp4mraXL8oilPyJYzyJ5kdCv8wt3HTHOnJmp3+XnVopNyqm+x7fY2sfscWvlXED1czTSuX5jFmDG5Vj6424tycZvBDXUMIlrKRSbAfUnW9u5Hl6+mDD/6SIHH/UJVscpnki0qxJtqtsCMSstNzKN9brIVJKKvUH/P1xE5YHaARqoDk7g9rdsPpamDLJnNQ6cw78rzHb74nYEtQlqsAttIeSRYGLcxVK7EOCL+mGeY561TSGlpoVhiP1MpN2Hl6DGmZ14zOZZI1aOICyoxvbzOGBTTfHQTEKBYbnLyZTZCnUae+MwqYxc48Mdhh8nieM2xtpx5oONnpIaRUUTukICqbuwBv5WJ/P8AHDnLcvTMKeeFP9WNdaEvsV7g+WJMQz0lAlJChJ1F5YxfRqIsAfMgX3PmcRdHDTySSPG5ieM+IHZbdP16+2JA9g
xdTxgsDRxU51TKB08x39cTtDl8k2UZjVxLHJFGUeSPVdi2oA7DtpJ3xF5DTPWZnBDF4+Y/L3WwYnp9vPD2qH7pzTNoKB2kpl1AKN0uW03NutiSAcLYm+IO/wDueTWzI+uZMnzOpfLomeGOdkhkfup3AtYX8JHXGAPX+J6FZJQtzyAwbqBuBe/UYmaSh/eeXKWV2Xotze0iDb8VNvtiMmj05hNFSAorI4BBvY/UB/8A5GMXIG0exPGriFJC1cTSU4lb5hty4uyW/Uf741pcqkiHNrAIyR4Vbr+HngphpTNnNDmckcdPRUa0ytBFtsE3F7WPiFzfcg4jMyRMxqaifkTjLqV0jM0y8vQWubkepBt12tjVy311NKkDUYZdLT0dY0rxLMgQqkTAr4r9fP8A5xs+YiU2dUWOTcaVuoI26H2wybMVa5SNVX+UsLk4yeWOJWBieOeNwWRhbSD/AIMM4m7MDdR+a+qjhVaVpYgw+tjb7LbYYXGfZkhjR66KWPdpFAVrqBcggjcWHfucRvMSaBkjUrKUJVgbXxGwowinYXuUAFx5sP8AbHlxg9iFQk1mtEKSqVJW0RSeKKyhV3/v0xvlmTtJVK3PSSzXPYj0OGkdRU19NFRursYFeV9TXLL9XfyF/wAcF+TUFNWT08tRI0ck0aBgADuo03b3sDc9b4IWomqplycEcPR18UEtfKlgAqknVsBt+A264vDhzhtKOl1xAyixYd29h/tirOCaSmoaRaahZpFRdK3W+/f88XbkFYlTFCkZ5aMS8hB2VFO+48zt7XwIYXLB1JjKqSCpmSnVCtQYBO8bqQUUmw1eRuCLeh8sTicMwPcTHXfcC22G3C4DXrJlMdTmzGdAw6RIAI1+yENbzc4OI4o441ubsRho+oXPAyvsz+GOX5rGQ0KKT1JF8AOffs6pmClKSvjhkLE3eG/bobdR3/vjohYRYHpjcU4fT4fGdul8AVuMBnGOefsw8WwUrnKq/La17EojM0ZHoCQfT/fFQ8Q8G8S8J/w+JcqloVmawkB1AkDzW4x9Oo6EOANCkqLb/niD4m4NyzPaCamrqKGeKQeJWS4+3374HhPanzWpOIMxy6kiipaluWCSV87HBBQfEKoEPLqYkdiNmtY9wcEXxj+Ddd8Ocw+boC1Xw/NJpilI8UBJJCP+gbv74qGrqYKe5Y6SviAv1uf/AFgeFwOREsSs+IkdNHTl0Mjuo1gHpvt+mBfNePedIGgjTRcgC/UeeAKfMXldg58RuVX+kedu2GjVS38XhA2G2PDAs96pk/VcS5hU8xmlaIOCAUHTEJqYsWd3Zw2rUd8e1dSirHHqFkjXbyv4j+uGfzYjcbgjzwxVFagFoW5HXGKULexbp74NKLM2TljVsQSfXFYZdWjX2FsGLZhEYqE8sQ3pwupe5BIJPqdr4Uy0RGK8LqSponq+ZFQU7VDHxStCL38ztcnErmfDOTZvTqaykinnYeGSwDA+wttivYs2kE4CX1J9LX3wT5PmtQtSj1R+kXsTjL49Q9NJ3JvhPwqYAtZla627CZz+pw+zL9n3hmqpXNGk+XykXVxKXX7g9sTeVV8UzozbKwve+C6gzCLMSkSozwRnc/1n19P19up+oZnpj7TnyX9m7PkpppMsqqGq135buWXw+gtsT5nt088VVnHD9bwzU1FHnFMaeuBKaG7Du336D7474nlkp1UUlpXkGy/0juxHkP8A1gT4syfhjiPKZaTiDkS7HQzgGdG/qBHiuTv5YIZANExTYx7ThlIucQFXUxNgB3Plh1yqKm8FQjzyD6ikwVQfIbG/vg/zfg6HKFlCSip5bNFHJflkx36tsSDuV2xCDKTb+BJQhO1oyfzKk40ZA/UTxMRWpqedSM6m1SjtBo3DnXpb8wfyxFTqgrJCkPIeZmdoybgDe9z5Wufvg3paWPLKNIHUu1AgeGQbbMPH03JuL/hgXo4HkmquZAUeWPxqyf6MeoaRb1IufQDzxEuUEtXQiysn+Hctgy2eN45W5xjEES6fqdyzM48gEAG/cnDJ8tApnEDLHzHEvIlkvLIm5Um3Ym7W2tcYditNJMjq55gdWF9vFa1revf74bxU8sq1ldEhKUsgildhcXZWZRfy/XEasxYuT3X7/WbftUe0WWVVHl3PiljvAqvHHf6piwvYdyRrHsoxHcU00VLn+YyjQJWqC8axpqAUtck+pLfbE3lNZT0sEEtU0o5r+M2BJN7bfiR9zjJ6J80zvI45lEazyrDOgA3B/mvba46+2MXIy5bPRB/f6frCH8shsvXTLUkoXSWYEkttZQybj8MP86zKbMJqKlm0RZfFQtBGkaD6dFjq82+mx6jSMOVrknllpVXUsRZYGX6YmW+wPmdye1ziOqcomeGolhsAUJCH+c3uRYd9hv5Y0Nb22oOwKEGcmyT5hq0tMqmlTx37ox0Ej/8AK/2w7GTQlSJpA8pDRSsFJ+k7EeewG+N6N5Y+ZNLpUSqIplk2IvffpfridysVWbZvTxrCJ6daf5Z1WTaMgncDtivJmcEmYADqCP7lmhpJM0pH5tNHVclEt412BJYdhuov03xJ8PUcdVnqU9YiwxUcySSurFlazi6A+p2Hlvg1zOAFKihy8PeprDHTwqttRBjMkt/K0dtzbfEXm1BBR1lTDBB8pTSuAwVW1am3Zwx6+3rgU8lsqfMZxrYgnRQ8yuhMkXJkaKpiqFTY6xq7edmUfbBHwtQSyVSyjUdYBNttiMP85pGNRFItOweOZQzWA1SGQ80jzDDQ33OHXDsAj5REckoVF1NYqRceW/44oTMG3PVRljcPVD0zaZiSrbEarsDvcj12/MYtPhLOY54P3VA38avqzCG86cXMjA+dtQ9CwxVyBKajSaEKJ7gJbfmKVIK2Pfp+uN+Fc2ENQMwjM38BmKBWAeO56LfysL9+uOa2cLkBH5Q7nXIqlizDJipt4pwFA2tyjt+QwRUVck0tr6iCQ3pY9Mc2T/FDMZGy96IrJPG8gZXshdSLHSN97bj/AGwdcD/EKkzNZ1RpUqElYtSuul1Gq2osdmufLHSTyUZqEMES9hKEiDqoexANz0F9yfbD6FNEms9+mA/Ls6kMsSlokD/yfUxHqen5YIIq0QOsLXZSp5XqB/L9v0xVY7hQhj6Gwxq4Gr+JutvET2GIOfiWkp6mii56K9Vuo1f0glv7Yk4qhKvY3RbizPsCfbrj3IGaIyz3g3K+JstqKLNYIqmnnj0vG4uCDjiD4zfsn55w7WVGacFpJmWStduTs0tMd77dWX1G+O95kgiURwyM7rYMBsBhJqeQrcSBbnzxoInitz49T5S+WVbU+ZI8btuQwKn0N/fGR5Oama8ahxDZjHe/MH9xb7239cfVPi/4QcLcf0zw8RZXT1EpuVnVAkqE9w43xyd8Tf2SuIcgllquCZBm1Gjao4GOmdFvf2a3pv6Yw2BYi6qcw5hlNJLUED+GJkVob7grpG1/MdO/TDROHY5IwIYrVCG1m8Sub7WB2Ht3xa78PuJ4qDO6KahmDcuOSpjaFRf/AMSLncXtYX388I1GRyZaoWRKa
ojY6Y5YLENcgXDX2t5G1r4h9cqtHR+Z6gRKtrMrkpauRKcEEtZYlNio8z/thzWUuZR5PEwictAsbDYjZrhgPOx0H74sA5QaymqRVRFKn6YGiQnx3UAn/wDId7/nh1X0c+WRwUqFaimhY81wPqQDQFB876yT22wtvKUATOMqzLqyqSQLUxyqQbXCkgj0O4wXUmZPBOgk2JuulhpO3n+OJaGgppal4a6jIVTpSVAHAktcM3c9vL8sO4OGqSnqIlkjZjLcLZdXLN76yb23PfyHrj2TyEA5QhYEIIc1pKem5aNKZIjyyAwuz+Snpbt+OCXI8zqqCserqlRZbmKIIA4UhRsBfT0Nrm57AYC0y2nEXOSapDvKUQJCfEbE6t9iDY9+xw9NXJRROsRKIl1c8hjrY3FtJtpboPIi2OefIJNmGOR2Ye1FfFUzSSy1s0RkVbvVFkE1xYHSvgXcEC5/PfDkZhDQUU/ycdOEmTcrHaRmAv8AVfv0ucV4lZUyyt84GSjmjaaN5CbmMGwG24G2xPcemEc5zmasjgmjjEsZ7wKbkWINztvsb9bYMZSD8zbjavyk1iiFbVCmJCpU7kFb73279fQdcRf/AExQKSJpYxIPqAHQ/fEhHWVdJHyqWROWqoC8imQqpuRa246AX6Y3k4RmzcJXmlWtNSuvXTwB1G5FjqNwdunrhwz3u6nlUt/KLg+MlzKaspXosrnp6WrXQjsmlQTcjSD9I7j0wxz/AIdrslq54pFjhnmQPUTmoVjJa2wAJIvbE09fKuQ1lUizq0VXJGp5jNrNwtgD5Db7YYcR0MkOXx5hWmCGNSlMXkBDmTRrew72Jt/xiHC5Z6J+PzP7MzgNwffLIaqMSTpK8YkvzlOmzAGxF9iL7W7DE/8Au1n4aighj5UE8pnnlTud1Uj7H88RmQZXW8YZxHT0Ec1YinxDoFQW3/zzxaL8OU8OUzVDV4OYQvyZKPQOUynbl7eSi9vM4LyXGPiCep4Y72JVVRk1UYQFheKClQa5GNkjW5Iv31E2wvlxlaphqRMY9E0cjvb6yCb2+9tsFlTwlmSsKKspqqEy2cRousvHYm4UdSoP23xlCj5jmVNk9PSGCGGVY6Zzp1Pbrq9ASSSemB9XkNwPSrUF6Vpsxr3ZKSOJKaN+UFjA5Sk/UT3JJN2a5N8N4IeRUygQ+JCHLhSXYEXAB9Cb2xalIuX5RPBSZNVJWR08jJWVkwULUSdCVW3QXsL998CCs8U8s1BFGVjLXMviuBtb79DjF8hMjsB0IRQfeRAy6PNz8tJTGeo8Ca1UhiHJ0+4vf2vgmThGlynN+ZQ5ikEdBCPAgLfPzvcOVPYLcbnbb1xIZZLLSQVGZmKNJ/BBTMtv4Tb3YeekE2HmcMmyzMFlhp85jlpI6iBXhDNpsBfRYdbXte474E5SxIQ66nqUdCQdSJJZatFmenhS8OtBcu4Ooj2vt+GHNHQZlUxsmb1NTOUkVRDpud7eYsPffD2mpUjmjoaiWOllmlI8bEsXsdz/AJ3xM0tbFV5MsDLIJxWu8gQ2ugQDqe5N/wAcYuT0koTyiRE2SfOV8FJTH5pIFMShTZnOrdRbuT0tvt5Y0i4ezGGctV060yK76YpgxNw9iTpO3Qge2JqmrpuFad8xptD1cqnlS2voU7Fh31C1gT74ypq6OBqNCJpZZ7LKUlLhh1AFrE9dybXvgUzMjfSL/e4QQGJVOXaMleSvpkhgjm061Zhp9/I39e+IekrLs9cJCBK+sEbB2Nz06Xtf8sEdFBJPR1YqCtRQSl4mjqSbuL37bXG3TuTfGlXleYvAI6GKKKAHwMyhNLbatK7A38I87C3QYU2YE1cZ6SsCRGNTLV5hT04SFQ8swkRkUtoAtp3/AA74d5BR1+VcQzVc1dJT0usgM9wpF9Qsb7sSBt5A4jKsV9NHetrH5crIWCLfcf026eWJLimWKuiybJcpgano8upRIZ2BlZ53sXkA6bXVQT0tthuF3Dd1FKt3csnKPjNJl1RK+Y86osjKgSDlqTqG4a9unbB8fi7S1KU7RySBgrTqrjQ1wuwBO2/vjm3NaaGhpkFRVaklcrZrlrC3QACxJufLriMNROmk2YqlyoUllO91P/3bdsWfxuUip467nRdb8RYv+rMiqqh1YJBWDlr9MGqO67eY39b4v/hKvqM7ghzKrVoo2UGCnYjUgP8AO/8A5ny/lHrfHz0jz98onXNZEWeKWdFZCN0jDguF7WsAMd5fC3PqXP8AIojlDwi6hwuoXIPTYYu8bIWJZupi7uWlTJBOrc5tKi1/M4kqWDL5hZ5CW6DUbYFnqRTqEaxlLWtfpfuThnnbTjLQYZhE8Uga/Zh/xi4sRsRoUNqEmZRx0stqWQsAPfEWuaxzLyqsWLXCv6+RxlJOjwQr4uxBO9x74bVtLA9PJGfAxOpST3wQY1YnqHRgN8R8gouJsqnoc0gilDKeTI6Asjdt8ct51wPTwcPxVrzkyxc1K/kR2AlQ+HbuCNj3uL747DzPk1dANbqNQKG53Djtjk74gcRQcL8S5nTzLHJk+eRGCsibqkw25g9QbH2xz/MQOoYdzVAB+qV/liyZTmMlVTzsxpFdwRcFlKGxvvtvsfNcQ5pKrMZictmgeBUfUXl0HSuonb1PX1OE1euy+orMonkCSLOqxv1uo36dwbdB54m6jMKXLq6VuZGVEoDpTIqsRe4QXHiv39fTHLChf3+/vN4Kx3oRLKcrrKbK5Mxr6eEUYqOSYFNpXlZSQBb+XbcjYb480SPK6LC6tqKusYAC2Nupsdr/AEnb9ceTZppWRrtHEZmlanZlsj2AUXPQ29euJKlqoP3FTyZdQTvU1U3N54RmZgD9NztYEfnibIxF/JqDQGhIekp+fkwjnDlZ5hqbUUKAE9N9jex8t7d9pOjkEU0kNXUTSUnMeN9D3WDUSLA+Qv5bb43aFMwzCdp0hEQIenpYruEa++oAEAb+vltbEzRZFFSU9HIuURZZDJOxmWdgjEE2Lbk9idtx5WwRyqAdw61owSqxPSzSZb+8JIXlh/iMb6Y4grAKvoQL7dz6Y0ocqkeOqaf+LRhQIJYZizFdNxc/rtc7dcEGYSwQ5j8zCqVqRFhDfYGxuAOlwLm19t8DiZj8r84s8bzQtLoVF8JUk7Hb1t1vhi5OY/OKJ1ub5ZzZamZlRkQwXgWWLSXIvqIFrg2DWHQ4VeVXctROTCbWsCtj3H1DC9DmMlHYTwgSRUspXnG5BYaVHck7nGsELZgrzxypGhYhVvptb0GNOQ9rMHKqUwgyrhT5fhxamsHOqZXqKhYzZi7uT1HYbqB7Yn6r4ZUkWS5VScXpHXGeKSplKklllLam027WFjjKd52gizCSBWSJ2cjfpawAGFOJM3rMnpqSmqJpHqIKIsjC/hVtyCepNz0x8kvk5+RYaN/ruVpxC2RG+TcB01TRR/I8SHJMoijEMc2gxMi672J6nWNX4XwU5dwpwrDSUdOnEgy+oSVpIYoaZJlle9/ET4vc27YA+G4q/Nqalr86Somy9WKllawMSjqR/nfFl5Fw5wjlnzmYzZ7U
/Ps6yxCONTHbSbRnvptfYWxS2XI2Uoxuv6/5hKB9qEGsxo3OX8xnpp9VK0S1aFmKgEixHXp4rd9sCyVVHmdNrpIY0zSZuTUmyoV8NuYoHmLXtg7+VmyyCOnyiSDMDLO7J4NPhKE6WVvW2BrKaSpolzCqz6jo6aokj0RaogGFtwBbtjEyn0mLH8oLX7yHybgOCrZ8ry9poswgmEkk8Sq8UY7atRHucNOG/hnmXEccs9dmVPDBzGjREGp5LHrc2Ci+998FlFTf9M0T11VqqayvGtoCwW1rncX74hs0g4hqsyhkrlZKVo1cQUzeEI38h09PfDcXkvR47+5P9oHFANiP8+p6H4d5DFRZclKM2VedBXq4laPS1ix1bG5PYDFXCXN85RqrMJJ66sqTy2ma+y3ve/QE3Fh5A4OqqjoJ80mnqKQVJWMKgncWjIJJ8Ppt164Tr81y/MZpIjrZABGYXumr1su3UYrxZyo4qLJ2TAZS2hGua5NBmNLlWfVcQNTSQR0uZR06aQZUI0yepKgAkdTiQz+dOH6mtjyeApGJOc3hAL6twL9gAd/XDqmElDQ1B5I59VOgRQQLIfpBviL4odaeCngkliaSZwwLi6NHewF8e5M7izqeKMLgvm0lfmNFFSvqVVZqglUIF26W89u2MDylYaOkm5PzQ1zyFbWUCwt+H44XqqrNZZBTx/xam4ZUDBEsvTr0HTfEjR5bmk1HT1VTBBzJpN9EwZlUA9B33xY1KBcEKe/eDVfmlRTPFl1LJaOlbQgBsLE3LW7knfB2jR1mXUbiZVZBdUMtyttmv79sMMp4co5XqazNzXUlDLeOl5oUPI3crYDa/wDfHrvw9SF6GiyqZ6OCUtNMJW3YD+ZupOJ/IplKqN+8duqkfmNFlA0tVZnOjqNIGknmWPi0gdD64msg4aOZZfFJltbUCKna7iRLlkN7Am/QeeIWegqsudYYqeb5GqQvC918R6nf2ODfgGpi+QqcwrLKkcZLQ9OZqGnRbyAN7emFZc/DCrTFVe6kHxjldJz0yiimjjqZCJ2Ldb/0j0tcbYEsxzFIIqbJ6CNqSkTVGum7O5JuXv5E3xYVbkEmV1Gb1ahKvMKaU0lC0ig8xWXUSPKy/ngSy166CB5HCSrIOVBeK7JqJGkdyb32xR4+UuOJ9v6wSDyK9SCoIPmXaiLR7AaAEDBl7i3T1xY3wp4jzDgLM9UKyR5Xp5bMx0G99hp7Dytgcl4Sk4TGX1FSsd6tGmpY2Ymyf1E2FhfoMRuUUdRmKVFRmdTLCslQDPJqJ0xqb7eV72vipctN31ACFTTdzsHKfi9kme5mKNKuIKjAzIjX+nfY+eJ+Hi+j4kq/3UivPqWQq0Q2bbwEH1J+2OPctoaaizPmZZDG+XzORA7XB5gFrOfX88S/DPHlVwzmMVbS1TfvakqSHRt45IyALW9xis+YbqNQ8D9Qnd75O2U5fTRR+JkhVWBNze2K/wCMc4zClg0Q5fUVR1i5iW+nfe/lgX4Z+PK8Z5vlVPmdM+Q18yuqFZeZHI4W42IvvY7YMOLfinluT8N11PKIo80K2jjjN+ZY3LDvsATY46ePKmRCQdRZYg3K0zfP80onzETUk0sZPMWEGzggdTfobjFC/ECop+KP+4mppIqxAeZA+m5262NxjbPvjNmmZ8RZ1mWT1aihgVpY9dw0gvY29cCNf8QafiCGkety/XM5PNmkusiN5hh9Vu6nEvqAqR3U0nkLMc8L1Jg4lhqaipjmERKmEoBzG03PuQo7d8RmXLHmdfJDIzwATu0JnJtG5B0bHtfr5bYQoc5FRVVEaxRQpCBpqUh2Jbpde+HcDvR1QFa1PBA6iTU5Jsv/AIDf1G2OYUJJqYhBbczNIJZqtaWWop2lE7FwEsWPToB/l8Eaxx02XJlU+Yy12yycqCP/APjPY+PX0I6AgeeG1VlzU+YrDlEK1uZwohfXGV5rdRubqTY2I/HBPwZlMFQKkcTmThlw0kmkRgtI17AdbAb/AJYRk6HI/v8ArGrhHLuRJCUZpC6/K0VPUqsyQlVeRLX/ANTr59P6sFtHNS5hRmmXhv5ZGjdY6moqJZWLBbqLkgbkgdO+J9Mky+OnjrVv+6Av8NaikA5jWsNIUnYkX33v6YlKXIJsxp6isi5Jn06VVpdK6AQQLdb9N8crL5AApfaN9JhdQAreGhUU8Pzaq9BLrM0ity5EIG1xazi426e+AfM4K+Bpar5W+WxMCJyPFcbeIdh6jFomerOZ5hleZxvRoQqs0iWZjc7ggeEXO2G7w0FNVvQz1BzZ4hr5ESFF6bXc/V5Ww/DlyX9fUz0wRd1BaLh6mz7KzmdcZaqSKFdIQAc0qSNOnqdRt+Hrh5lfwyz3Nad3ossSSKGQwghiouvUAel7fbDpKmleCppqyOmVA6vy73UXNyDa3fBbPxbmEKwDKo3NO0QYBQoCnfbc4audgeI/6nsaIRvuDuQU1VPkeWZipesiWNedGn/1G+/QdMNs3zuRJKw5sYnqlVZVppfpUntfvfsMZjMcbEoNj/8AR/rNbSmNqfid86yaaCMPl8rqYoafT4bjrYeWGfDeXRnNYKWrV4wq8xWWU6mdbHceXXGYzBvSF1UV3/eK3QMIL1UuY0tZmJUxvVvbSdJTey/liTg+ezQV8TvDmlXEB8pDsoS17WY9T/tjMZiTEeQAPz/WGPqYA/vUZT8JZXDlYzriynkNaCqwrHMwMjG+zdh1wnxZmElFkyVGWTLQwpGIwIRcB7bX8xfGYzFD5GV8fya/WMYAKTBFcozSvgFdPJCalHASRTcTNYEhl9cOHkpkYGqy+qyyVZeZzob6S+5swtYX3xmMx1Vclqia4NqHeTimf5aj4lyDLqo16FmrpNTlR0Sw6AgdcDX7oqqvjGVaChSuy6BwiQyr/DAA6b9jjMZiA5W9Pl7xjfWgv7zfijhlaXLubDSRR17FkanhkusMZJNlJ/vjzgjhBjPFNl9WaymXS0qVAs6eXuL4zGYNcznAT8xNU0nuKuDnqqiPMc+zKmENECsMaMRoDdBftYfrgAzemgeHTkb1FdBTyjm06OCCtt2tYXH54zGY6XjjkpuVAAEmN6iup0y6GVowk1LZZFJIV1Gwtf07YmMgy6aeCqhgpngiiKVABa/MLHcX8wMZjMS+UePjMR7f5ElstowuzrKqWjqZq9YJ5GRkVVEhsZmILMQfww4g4eo63N+QJUpap5A2keLl2BJYKereuMxmJcBLGyZ0sONWAJ+3+YU51wcnEmVUlbn0vPkaFYKZYxoLqgIG3bpitM0ymnyujWnpagQaWK1aOba7HZScZjMMGRzkIvowcwBe4NVL0op4aCCSSIJL8wUiFgOguL+mFpqahyzPMxiy+inzCVYw8U5ewt2A9d8ZjMdGzo/EkChluMJ69oqmKeeqenrBIJUSEm6ODYW9cS2dZpnfEkbpV0Eysw0tPpAYkdx733xmMxev/wAxBCgxrnXC+T5XFllJPG8VdJTlJDGngJI/m62wCVPCkMM0EAkkid59KaASBrPU3xmMwjA7EA32P8zc+NQdSczz5HJ
8zqMtSLRBAgHq3QE/554lcoypaeLLXzNZGpJQJIJta2sbkAAjqdtsZjMLZ2UKfv8A4iF25uK09WtJVVJy6okp6oOCzvclr9w98L0WcTVeZQTZvUc1I1IZFjuJOw1eZxmMxh+oG44MSKhZBnFVmdQrSCOFmAQoq6AI1H1Eeew3w/TPaSmq0aprJ68Q2VRGdIsOl/Q4zGYkPj4yLjj/ACXGPEnEnMeesmkkFRUOrTKF8N/5bdyMC1Jn9PT1RhqYFkmLMTpYgE2uLnzxmMw1MCVJ8h3ca0WWVudZhM9yqz7qWSyrvsD5YfVuWvFVzIKtp9LWJjnsAfLGYzE4as5HxPJjVhZn/9k='}}],
 'toolu_zMFNUXP2QNip9BzARlOlYA': {'width': 300, 'height': 200}}
list(L(chat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"fn": "samples/puppy.jpg"}', name='view_img'), id='toolu_BZeqthTTTbygrPTJZY3hfg', type='function')],
 [ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"image_content": "$`toolu_BZeqthTTTbygrPTJZY3hfg`"}', name='get_img_size'), id='toolu_zMFNUXP2QNip9BzARlOlYA', type='function')]]

Some tool callers (e.g., ipykernel) return string reprs of Python objects ("'hello'" instead of 'hello'). With tc_res_eval=True, these are converted back to Python objects via ast.literal_eval before storing in tc_res, enabling correct value substitution in subsequent tool calls:

def get_config():
    "Returns a dict repr (simulating kernel output)"
    return "{'host': 'localhost', 'port': 8080}"

def use_config(config: dict): 
    "Use config"
    return f"Host: {config['host']}, Port: {config['port']}"
chat = Chat('claude-sonnet-4-5', tools=[get_config, use_config], tc_refs=True, tc_res_eval=True)
chat("Call get_config, then pass the result to use_config", max_steps=10)

Perfect! I’ve successfully: 1. Called get_config which returned a configuration with host=‘localhost’ and port=8080 2. Passed that configuration to use_config which processed it and confirmed the settings: Host: localhost, Port: 8080

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-5-20250929
  • finish_reason: stop
  • usage: Usage(completion_tokens=62, prompt_tokens=939, total_tokens=1001, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=62, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
chat.tc_res
{'toolu_Qak_kNyCRSedo7fiytblFA': {'host': 'localhost', 'port': 8080},
 'toolu_0TjRUIVXQWqnUCqBIifZbQ': 'Host: localhost, Port: 8080'}
test_eq(type(first(chat.tc_res.values())), dict)
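
Under the hood this is just ast.literal_eval — a minimal sketch (the raw string below is a hypothetical kernel-style output, not from the notebook):

import ast
raw = "{'host': 'localhost', 'port': 8080}"  # string repr, as a kernel-style tool caller might return it
cfg = ast.literal_eval(raw)                  # converted back to a real dict, as tc_res_eval does before storing in tc_res
test_eq(cfg['port'], 8080)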

Caching

Test that cache checkpoints are reapplied during the tool loop (when msg=None):

c = Chat('claude', cache=True, cache_idxs=[-2,-1])
c.hist = [{'role': 'user', 'content': 'Hello'},
          {'role': 'assistant', 'content': 'Hi there!'},
          {'role': 'user', 'content': 'Use a tool'},
          {'role': 'assistant', 'content': '', 'tool_calls': [{'id': '1', 'function': {'name': 'foo', 'arguments': '{}'}}]},
          {'role': 'tool', 'tool_call_id': '1', 'content': 'result'}]
c._prep_msg(None)  # Simulate tool loop iteration with no new message
[{'role': 'user', 'content': 'Hello'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user',
  'content': [{'type': 'text',
    'text': 'Use a tool',
    'cache_control': {'type': 'ephemeral'}}]},
 {'role': 'assistant',
  'content': '',
  'tool_calls': [{'id': '1',
    'function': {'name': 'foo', 'arguments': '{}'},
    'cache_control': {'type': 'ephemeral'}}]},
 {'role': 'tool', 'tool_call_id': '1', 'content': 'result'}]
test_eq('cache_control' in c.hist[-3]['content'][0], True)  # user msg
test_eq('cache_control' in c.hist[-2]['tool_calls'][-1], True)  # tool call msg
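
For reference, a minimal sketch of a typical setup (the model name and system prompt are placeholders, not from the notebook): with cache=True, cache breakpoints are inserted at the cache_idxs positions, e.g. the system prompt and the latest message.

chat = Chat('claude-sonnet-4-5', sp='You are a helpful assistant.', cache=True, cache_idxs=[0,-1])
# chat('Hello!')  # cache_control breakpoints are added at idx 0 (system prompt) and -1 (latest message) before the call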

Async

AsyncChat

If you want to use LiteLLM in a webapp, you probably want its async function acompletion. To make that easier we implement AsyncChat, which follows the implementation of Chat as closely as possible.
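
For context, here's a minimal sketch of the raw LiteLLM async API that AsyncChat wraps (the model name is a placeholder):

import litellm

async def ask(prompt):
    # plain acompletion call: no history, tools, or caching handled for you
    r = await litellm.acompletion(model='openai/gpt-4.1',
                                  messages=[{'role':'user','content':prompt}])
    return r.choices[0].message.content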

Testing the scenarios where the tool call is not in the schemas, or the schemas are missing:

result = await _alite_call_func(fake_tc, [toolsc], globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")
result = await _alite_call_func(fake_tc, None, globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")

source

astream_with_complete


def astream_with_complete(
    agen, postproc:function=noop
):

Parallel tool execution in AsyncChat works with both sync and async tool functions. Async tools run concurrently via asyncio.gather, while sync tools are automatically offloaded to threads via asyncio.to_thread in call_func_async (toolslm). For sync Chat, tools run in parallel via fastcore.parallel with threads.
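
The dispatch pattern looks roughly like this (a simplified sketch, not the toolslm implementation):

import asyncio, time

async def async_tool(x):
    await asyncio.sleep(0.1)
    return x * 2

def sync_tool(x):
    time.sleep(0.1)
    return x + 1

async def run_tools():
    # async tools are awaited directly; sync tools are offloaded to a thread,
    # then everything is gathered so the calls run concurrently
    return await asyncio.gather(async_tool(3), asyncio.to_thread(sync_tool, 3))

# await run_tools()  # -> [6, 4] in roughly 0.1s rather than 0.2s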


source

AsyncChat


def AsyncChat(
    model:str, # LiteLLM compatible model name
    sp:str='', # System prompt
    temp:int=0, # Temperature
    search:bool=False, # Search (l,m,h), if model supports it
    tools:list=None, # Add tools
    hist:list=None, # Chat history
    ns:Optional=None, # Custom namespace for tool calling
    cache:bool=False, # Anthropic prompt caching
    cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
    ttl:NoneType=None, # Anthropic prompt caching ttl
    api_base:NoneType=None, # API base URL for custom providers
    api_key:NoneType=None, # API key for custom providers
    extra_headers:NoneType=None, # Extra HTTP headers for custom providers
    tc_refs:bool=False, # Enable tool call result references
    tc_res_eval:bool=False, # literal_eval tool results before storing in tc_res
):

LiteLLM chat client.


source

AsyncChat.__call__


def __call__(
    msg:NoneType=None, # Message str, or list of multiple message parts
    prefill:NoneType=None, # Prefill AI response if model supports it
    temp:NoneType=None, # Override temp set on chat initialization
    think:NoneType=None, # Thinking (l,m,h)
    search:NoneType=None, # Override search set on chat initialization (l,m,h)
    stream:bool=False, # Stream results
    max_steps:int=2, # Maximum number of tool calls
    final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have run out
    return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
    step:int=1, tool_choice:NoneType=None, max_tokens:NoneType=None
):

Main call method - handles streaming vs non-streaming
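
Besides the examples below, the call also supports prefilling the assistant response on models that allow it — a quick sketch (model and prompt are placeholders):

chat = AsyncChat('claude-sonnet-4-5')
r = await chat('List three primary colors as a JSON array.', prefill='["')  # the model continues from the prefilled prefix (Anthropic-style prefill)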

Examples

Basic example

for m in ms[1:]:
    chat = AsyncChat(m)
    test_eq('4' in contents(await chat("What is 2+2?")).content, True)

With tool calls

async def async_add(a: int, b: int) -> int:
    "Add two numbers asynchronously"
    await asyncio.sleep(0.1)
    return a + b
for m in ms[1:]:
    chat = AsyncChat(m, tools=[async_add])
    r = await chat("What is 5 + 7? Use the tool to calculate it.")
    test_eq('12' in contents(r).content, True)
    test_eq(nested_idx(chat.hist, 1, 'tool_calls', 0, 'function', 'name'), 'async_add')

If the max tokens limit is reached, a custom warning message is appended to the end of the model response:

chat_long = AsyncChat(m)
r = await chat_long("Write a short story about a robot and a dog", max_tokens=40)
r

In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was built to help with chores, but he loved to wander the fields, listening

Response was cut off at token limit.
  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: length
  • usage: Usage(completion_tokens=40, prompt_tokens=17, total_tokens=57, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
print(contents(r).content)
In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was built to help with chores, but he loved to wander the fields, listening

<warning>Response was cut off at token limit.</warning>

Same goes for refused requests:

chat_refused = AsyncChat('claude-opus-4-5')
r = await chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r
AI was unable to process this request
  • id: chatcmpl-xxx
  • model: claude-opus-4-5-20251101
  • finish_reason: refusal
  • usage: Usage(completion_tokens=4, prompt_tokens=30, total_tokens=34, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=4, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
print(contents(r).content)
<warning>AI was unable to process this request</warning>

Async Streaming Display

This is what our outputs look like with streaming results:

chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
    if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
    elif isinstance(o,dict): print(o)

🔧 async_add
{'tool_call_id': 'call_kUWRrvA9Rmqd7MBq8k392A', 'role': 'tool', 'name': 'async_add', 'content': '12'}
The sum of 5 and 7 is 12.

Here’s a complete ModelResponse taken from the response stream:

resp = ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, prompt_tokens_details=None))
print(repr(resp))
ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_2KsLMArATw2ZdMFG6OwBsw', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_m4txobOKRfu2EWTOv8dMqQ', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, completion_tokens_details=None, prompt_tokens_details=None))
tc=resp.choices[0].message.tool_calls[0]
tc
ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_2KsLMArATw2ZdMFG6OwBsw', type='function')
tr={'tool_call_id': 'toolu_018BGyenjiRkDQFU1jWP6qRo', 'role': 'tool','name': 'simple_add',
    'content': '15 is the answer! ' +'.'*2000}

source

mk_tr_details


def mk_tr_details(
    tr, tc, mx:int=2000
):
*Create `<details>` block for tool call as JSON*

mk_tr_details(tr,tc,mx=300)
'\n\n<details class=\'tool-usage-details\'>\n<summary>simple_add(a=10, b=5)</summary>\n\n```json\n{\n  "id": "toolu_018BGyenjiRkDQFU1jWP6qRo",\n  "call": {\n    "function": "simple_add",\n    "arguments": {\n      "a": "10",\n      "b": "5"\n    }\n  },\n  "result": "15 is the answer! .....<TRUNCATED>"\n}\n```\n\n</details>\n\n'

source

fmt_usage


def fmt_usage(
    u
):

Format usage stats with cache hit rate as lead metric.

ex_usg = AttrDict(
    completion_tokens=203,
    prompt_tokens=25139,
    total_tokens=25342,
    completion_tokens_details=AttrDict(reasoning_tokens=35),
    prompt_tokens_details=AttrDict(cached_tokens=24299, cache_creation_tokens=79),
    cache_creation_input_tokens=79,
    cache_read_input_tokens=24299
)
fmt_usage(ex_usg)
'Cache hit: 96.7% | Tokens: total=25,342 input=25,139 (+24,299 cached, 79 new) output=203 (reasoning 35)'
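
The headline cache-hit figure appears to correspond to cached prompt tokens over total prompt tokens (an assumption based on the numbers above):

round(24299 / 25139 * 100, 1)  # -> 96.7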

source

StreamFormatter


def StreamFormatter(
    include_usage:bool=False, mx:int=2000, debug:bool=False, showthink:bool=False
):

Initialize self. See help(type(self)) for accurate signature.

stream_msg = ModelResponseStream([StreamingChoices(delta=Delta(content="Hello world!"))])
StreamFormatter().format_item(stream_msg)
'Hello world!'
reasoning_msg = ModelResponseStream([StreamingChoices(delta=Delta(reasoning_content="thinking..."))])
StreamFormatter().format_item(reasoning_msg)
'🧠'

source

AsyncStreamFormatter


def AsyncStreamFormatter(
    include_usage:bool=False, mx:int=2000, debug:bool=False, showthink:bool=False
):

Initialize self. See help(type(self)) for accurate signature.

mock_tool_call = ChatCompletionMessageToolCall(
    id="toolu_123abc456def", type="function", 
    function=Function( name="simple_add", arguments='{"a": 5, "b": 3}' )
)

mock_response = ModelResponse()
mock_response.choices = [type('Choice', (), {
    'message': type('Message', (), {
        'tool_calls': [mock_tool_call]
    })()
})()]

mock_tool_result = {
    'tool_call_id': mock_tool_call.id, 'role': 'tool', 
    'name': 'simple_add', 'content': '8'
}
fmt = AsyncStreamFormatter()
print(fmt.format_item(mock_response))
print('---')
print(fmt.format_item(mock_tool_result))

---


<details class='tool-usage-details'>
<summary>simple_add(a=5, b=3)</summary>

```json
{
  "id": "toolu_pOaVybZdQia_lpzzp8XLhw",
  "call": {
    "function": "simple_add",
    "arguments": {
      "a": "5",
      "b": "3"
    }
  },
  "result": "8"
}
```

</details>

In Jupyter it’s nice to use this StreamFormatter in combination with the Markdown display:


source

display_stream


def display_stream(
    rs
):

Use IPython.display to markdown display the response stream.

Generated images can also be displayed when streaming (not shown here to conserve file size):

# rs = completion(model='gemini/gemini-2.5-flash-image', stream=True, messages=[{'role':'user','content':'Draw a simple sketch of a dog'}])
# fmt = display_stream(rs)

source

adisplay_stream


def adisplay_stream(
    rs
):

Use IPython.display to markdown display the response stream.

Streaming examples

Now we can demonstrate AsyncChat with stream=True!

Tool call

chat = Chat(model, tools=[simple_add])
res = chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = display_stream(res)
simple_add(b=7, a=5)
{
  "id": "call_LaRNoYm1Q2ifFMYSX1jVtQ",
  "call": {
    "function": "simple_add",
    "arguments": {
      "b": "7",
      "a": "5"
    }
  },
  "result": "12"
}

5 + 7 is 12.

chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)
async_add(b=7, a=5)
{
  "id": "call_RXi6syapRGWjceosAkcUXw",
  "call": {
    "function": "async_add",
    "arguments": {
      "b": "7",
      "a": "5"
    }
  },
  "result": "12"
}

The sum of 5 and 7 is 12.

chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 3? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)
async_add(a=5, b=3)
{
  "id": "call_x4fd_1aXQXyX_Tc2t__UHA",
  "call": {
    "function": "async_add",
    "arguments": {
      "a": "5",
      "b": "3"
    }
  },
  "result": "8"
}

5 + 3 is 8.

async def asimple_div(
    a: int,   # first operand
    b: int=0  # second operand
) -> float:
    "Divide two numbers"
    return a/b
m = ms[2]
chat = AsyncChat(m, tools=[asimple_div])
res = await chat("Calculate 5/3 and 3/0 with parallel tool calls using `asimple_div` (this is a test of our error handling - tell me exactly what you see as the tool result)", stream=True)
fmt = await adisplay_stream(res)

I’ll make both division calls in parallel as requested.

asimple_div(a=5, b=3)
{
  "id": "toolu__UL2l2URRlacZGA2Sh6xtw",
  "call": {
    "function": "asimple_div",
    "arguments": {
      "a": "5",
      "b": "3"
    }
  },
  "result": "1.6666666666666667"
}
asimple_div(a=3, b=0)
{
  "id": "toolu_HWnZ_EscSL2hMCYMjGl3jw",
  "call": {
    "function": "asimple_div",
    "arguments": {
      "a": "3",
      "b": "0"
    }
  },
  "result": "Traceback (most recent call last):\n  File \"/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py\", line 265, in call_func_async\n    try: res = await res\n               ^^^^^^^^^\n  File \"/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_1820/466431256.py\", line 6, in asimple_div\n    return a/b\n           ~^~\nZeroDivisionError: division by zero"
}

Here’s exactly what I received as the tool results:

Call 1: 5/3

Result: 1.6666666666666667

This worked successfully and returned the expected floating-point division result.

Call 2: 3/0

Result: An error traceback:

Traceback (most recent call last):
  File "/Users/jhoward/aai-ws/toolslm/toolslm/funccall.py", line 265, in call_func_async
    try: res = await res
               ^^^^^^^^^
  File "/var/folders/51/b2_szf2945n072c0vj2cyty40000gn/T/ipykernel_1820/466431256.py", line 6, in asimple_div
    return a/b
           ~^~
ZeroDivisionError: division by zero

This shows that the division by zero raised a Python ZeroDivisionError exception, and the full traceback was returned as the tool output rather than crashing the system. This demonstrates that the error handling captures exceptions and returns them as readable error messages in the tool result.

Thinking tool call

chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?", think='l',stream=True)
_ = await adisplay_stream(res)

🧠

The most efficient way is to use your programming language’s built-in sort function (e.g., .sort() in Python, std::sort in C++, or Arrays.sort() in Java).

Here is why:

  1. Optimized Algorithms: Built-in functions typically use Timsort or Introsort, which are highly optimized versions of Quicksort and Mergesort. They have a time complexity of \(O(n \log n)\).
  2. Low-Level Optimization: These functions are written in low-level languages (like C or Assembly) and are optimized for modern CPU cache performance, making them faster than any manual implementation you could write.
  3. Scale: For a small list of 1,000 integers, the execution time will be nearly instantaneous (usually less than 1 millisecond).

Code Example (Python):

my_list.sort()

Code Example (C++):

std::sort(my_vector.begin(), my_vector.end());

Multiple tool calls

chat.hist[1]
Message(content=None, role='assistant', tool_calls=[{'provider_specific_fields': {'thought_signature': 'EjQKMgG+Pvb7flzsNemEsWPXNGRFnhyvmQoZhKthF7xruwK0UZR21vDR1SwlKs3KsoGnhcN5'}, 'function': {'arguments': '{"b": 5, "a": 10}', 'name': 'simple_add'}, 'id': 'call_TXvTBxIkQea6iYLdheaeqQ', 'type': 'function'}, {'function': {'arguments': '{"b": 1, "a": 2}', 'name': 'simple_add'}, 'id': 'call_TJoK4VQZTvyV5z4_ZzYX2Q', 'type': 'function'}], function_call=None, provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7flzsNemEsWPXNGRFnhyvmQoZhKthF7xruwK0UZR21vDR1SwlKs3KsoGnhcN5']})
chat.hist[2]
{'tool_call_id': 'call_TXvTBxIkQea6iYLdheaeqQ',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}
chat.hist[3]
{'tool_call_id': 'call_TJoK4VQZTvyV5z4_ZzYX2Q',
 'role': 'tool',
 'name': 'simple_add',
 'content': '3'}
chat.hist[4]
Message(content='After the first batch of calculations, we have simplified the expression:\n*   The first part `(10 + 5)` is now **15**.\n*   The second part `(2 + 1)` is now **3**.\n\nThe expression is now `(15 * 3) / 3`. Next, I will perform the multiplication.\n\n', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)

Now let’s demonstrate that we can load the formatted output back into a new Chat object:

chat5 = Chat(model,hist=fmt2hist(fmt.outp),tools=[simple_add, multiply, divide])
chat5('what did we just do?')

We just started solving the math expression (10 + 5) * (2 + 1) / 3 by breaking it down into steps using the order of operations (PEMDAS/BODMAS):

  1. Addition inside the first parentheses: We added 10 + 5 to get 15.
  2. Addition inside the second parentheses: We added 2 + 1 to get 3.

This simplified your original problem down to (15 * 3) / 3.

Would you like me to finish the calculation by multiplying those results and then dividing?

  • id: chatcmpl-xxx
  • model: gemini-3-flash-preview
  • finish_reason: stop
  • usage: Usage(completion_tokens=133, prompt_tokens=347, total_tokens=480, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=133, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=347, image_tokens=None), cache_read_input_tokens=None)

Search

chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the weather in NYC", stream=True)
_=await adisplay_stream(res)

The current weather in New York City as of Friday afternoon, February 6, 2026, is cloudy and cold with a temperature of 31°F (-1°C).

Current Conditions (4:56 PM EST)

  • Temperature: 31°F (Feels like 31°F)
  • Conditions: Cloudy
  • Humidity: 52%
  • Wind: NNW at 5 mph

Tonight’s Forecast

An arctic cold front is approaching, which will bring snow and a significant drop in temperature overnight. * Evening: Light snow is expected to begin after 6:00 PM, with a 35–40% chance of precipitation. * Overnight: Snow showers and flurries will continue. Temperatures will drop to a low of 19°F (-7°C), but wind chills could make it feel as cold as 0°F to -10°F as winds increase to 20–25 mph.

Weekend Outlook

  • Saturday, Feb 7: Frigid and windy with light snow during the day. High of 20°F and a low of 6°F.
  • Sunday, Feb 8: Clear but extremely cold. High of 17°F and a low of 8°F.

Travel Note: Road conditions may become slippery this evening and overnight due to the combination of light snow and rapidly falling temperatures.

Let’s mock pause_turn with async completion and streaming:

# async def mk_pause_web_search_stream():
#     """Async generator that mimics a streaming pause_turn response"""
#     srv_tc = mk_tc("web_search", json.dumps({"query": "Solveit Answer.AI"}), 
#                    tcid=random_tool_id().replace('toolu_', 'srvtoolu_'))
#     yield mk_stream_chunk(content="Let me search for that information:", role='assistant')
#     yield mk_stream_chunk(tool_calls=[srv_tc])
#     yield mk_stream_chunk(finish_reason="pause_turn")
# orig_acompletion = acompletion
# 
# call_count = 0
# async def patched_acompletion(*args, **kwargs):
#     global call_count
#     call_count += 1
#     print(f"Mock Async Call {call_count}")
#     await asyncio.sleep(1)
#     if call_count < 3: return mk_pause_web_search_stream()
#     return await orig_acompletion(*args, **kwargs)
# 
# acompletion = patched_acompletion
# achat_pause = AsyncChat('claude-sonnet-4-5', search='l')
# 
# call_count = 0
# res = await achat_pause("Search and tell me about Solveit", stream=True)
# fmt = await adisplay_stream(res)
# print(f"\nTotal calls: {call_count}")
# 
# acompletion = orig_acompletion

Tool Call Referencing

achat = AsyncChat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
await achat("First call get_person, then pass the result to greet_person", max_steps=3)

Perfect! I successfully completed both steps:

  1. Retrieved person data: I called get_person which returned a person named Alice who is 30 years old.

  2. Greeted the person: I then passed Alice’s data to greet_person, which generated the greeting: “Hello Alice, you are 30 years old!”

The task has been completed successfully. The person’s information was retrieved and used to create a personalized greeting.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-5-20250929
  • finish_reason: stop
  • usage: Usage(completion_tokens=103, prompt_tokens=1080, total_tokens=1183, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=103, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)
achat.tc_res
{'toolu_z_TFa_nqTGSMQX5810HWCQ': {'name': 'Alice', 'age': 30},
 'toolu_HbK0UnqlShif08AXV_mNHg': 'Hello Alice, you are 30 years old!'}
list(L(achat.hist).attrgot('tool_calls').filter())
[[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{}', name='get_person'), id='toolu_z_TFa_nqTGSMQX5810HWCQ', type='function')],
 [ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"person": "$`toolu_z_TFa_nqTGSMQX5810HWCQ`"}', name='greet_person'), id='toolu_HbK0UnqlShif08AXV_mNHg', type='function')]]