Core

Lisette Core

LiteLLM

Deterministic outputs

LiteLLM ModelResponse(Stream) objects have id and created_at fields that are generated dynamically. Even when we use cachy to cache the LLM response these dynamic fields create diffs which makes code review more challenging. The patches below ensure that id and created_at fields are fixed and won’t generate diffs.

source

patch_litellm

 patch_litellm (seed=0)

Patch litellm.ModelResponseBase such that id and created are fixed.

patch_litellm()

Completion

LiteLLM provides an convenient unified interface for most big LLM providers. Because it’s so useful to be able to switch LLM providers with just one argument. We want to make it even easier to by adding some more convenience functions and classes.

This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).

# litellm._turn_on_debug()

ms = ["gemini/gemini-2.5-flash", "claude-sonnet-4-5", "openai/gpt-4.1"]
msg = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
for m in ms:
    display(Markdown(f'**{m}:**'))
    display(completion(m,msg))

gemini/gemini-2.5-flash:

Hey there! How can I help you today?

id: chatcmpl-xxx
model: gemini-2.5-flash
finish_reason: stop
usage: Usage(completion_tokens=153, prompt_tokens=4, total_tokens=157, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=143, rejected_prediction_tokens=None, text_tokens=10), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))

claude-sonnet-4-5:

Hello! How can I help you today?

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=12, prompt_tokens=10, total_tokens=22, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

openai/gpt-4.1:

Hello! How can I help you today? 😊

id: chatcmpl-xxx
model: gpt-4.1-2025-04-14
finish_reason: stop
usage: Usage(completion_tokens=10, prompt_tokens=10, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Messages formatting

Let’s start with making it easier to pass messages into litellm’s completion function (including images, and pdf files).

source

remove_cache_ckpts

 remove_cache_ckpts (msg)

remove cache checkpoints and return msg.

source

mk_msg

 mk_msg (content, role='user', cache=False, ttl=None)

Create a LiteLLM compatible message.

	Type	Default	Details
content			Content: str, bytes (image), list of mixed content, or dict w ‘role’ and ‘content’ fields
role	str	user	Message role if content isn’t already a dict/Message
cache	bool	False	Enable Anthropic caching
ttl	NoneType	None	Cache TTL: ‘5m’ (default) or ‘1h’

Now we can use mk_msg to create different types of messages.

Simple text:

msg = mk_msg("hey")
msg

{'role': 'user', 'content': 'hey'}

Which can be passed to litellm’s completion function like this:

model = ms[1]

res = completion(model, [msg])
res

Hey! How’s it going? What’s on your mind?

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=16, prompt_tokens=8, total_tokens=24, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

We’ll add a little shortcut to make examples and testing easier here:

def c(msgs, **kw):
    msgs = [msgs] if isinstance(msgs,dict) else listify(msgs)
    return completion(model, msgs, **kw)

c(msg)

Hey! How’s it going? What’s on your mind?

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=16, prompt_tokens=8, total_tokens=24, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Lists w just one string element are flattened for conciseness:

test_eq(mk_msg("hey"), mk_msg(["hey"]))

(LiteLLM ignores these fields when sent to other providers)

Text and images:

img_fn = Path('samples/puppy.jpg')
Image(filename=img_fn, width=200)

msg = mk_msg(['hey what in this image?',img_fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")

{
 "role": "user",
 "content": [
  {
   "type": "text",
   "text": "hey what in this image?"
  },
  {
   "type": "image_url",
   "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...

c(msg)

This image shows an adorable Cavalier King Charles Spaniel puppy! The puppy has the breed’s characteristic features:

Coloring: Brown (chestnut) and white coat
Sweet expression: Large, dark eyes and a gentle face
Setting: The puppy is lying on grass near some purple flowers (appear to be asters or similar blooms)

The puppy looks very young and has that irresistibly cute, innocent look that Cavalier puppies are famous for. The photo has a professional quality with nice lighting and composition, capturing the puppy’s endearing personality perfectly!

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=139, prompt_tokens=104, total_tokens=243, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Let’s also demonstrate this for PDFs

pdf_fn = Path('samples/solveit.pdf')
msg = mk_msg(['Who is the author of this pdf?', pdf_fn.read_bytes()])
c(msg)

The author of this PDF is Jeremy Howard from fast.ai. He explicitly introduces himself in the document with “Hi, I’m Jeremy Howard, from fast.ai” and goes on to describe his work co-founding fast.ai with Rachel Thomas eight years ago.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=59, prompt_tokens=1610, total_tokens=1669, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Caching

Some providers such as Anthropic require manually opting into caching. Let’s try it:

def cpr(i): return f'{i} '*1024 + 'This is a caching test. Report back only what number you see repeated above.'

disable_cachy()

msg = mk_msg(cpr(1), cache=True)
res = c(msg)
res

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=5, prompt_tokens=3, total_tokens=8, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=2070, cache_read_input_tokens=0)

Anthropic has a maximum of 4 cache checkpoints, so we remove previous ones as we go:

res = c([remove_cache_ckpts(msg), mk_msg(res), mk_msg(cpr(2), cache=True)])
res

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=5, prompt_tokens=2073, total_tokens=2078, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=2070, text_tokens=None, image_tokens=None), cache_creation_input_tokens=2074, cache_read_input_tokens=2070)

We see that the first message was cached, and this extra message has been written to cache:

res.usage.prompt_tokens_details

PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=2070, text_tokens=None, image_tokens=None)

We can add a bunch of large messages in a loop to see how the number of cached tokens used grows.

We do this for 25 times to ensure it still works for more than >20 content blocks, which is a known anthropic issue.

The code below is commented by default, because it’s slow. Please uncomment when working on caching.

# h = []
# msg = mk_msg(cpr(1), cache=True)

# for o in range(2,25):
#     h += [remove_cache_ckpts(msg), mk_msg(res)]
#     msg = mk_msg(cpr(o), cache=True)
#     res = c(h+[msg])
#     detls = res.usage.prompt_tokens_details
#     print(o, detls.cached_tokens, detls.cache_creation_tokens, end='; ')

enable_cachy()

Reconstructing formatted outputs

Lisette can call multiple tools in a loop. Further down this notebook, we’ll provide convenience functions for formatting such a sequence of toolcalls and responses into one formatted output string.

For now, we’ll show an example and show how to transform such a formatted output string back into a valid LiteLLM history.

fmt_outp = '''
I'll solve this step-by-step, using parallel calls where possible.

<details class='tool-usage-details'>

```json
{
  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",
  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },
  "result": "15"
}
```

</details>

<details class='tool-usage-details'>

```json
{
  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",
  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },
  "result": "3"
}
```

</details>

Now I need to multiply 15 * 3 before I can do the final division:

<details class='tool-usage-details'>

```json
{
  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",
  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },
  "result": "45"
}
```

</details>
'''

We can split into chunks of (text,toolstr,json):

sp = re_tools.split(fmt_outp)
for o in list(chunked(sp, 3, pad=True)): print('- ', o)

-  ["\nI'll solve this step-by-step, using parallel calls where possible.\n\n", '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n  "result": "15"\n}\n```\n\n</details>', '{\n  "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n  "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n  "result": "15"\n}']
-  ['\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n  "result": "3"\n}\n```\n\n</details>', '{\n  "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n  "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n  "result": "3"\n}']
-  ['\n\nNow I need to multiply 15 * 3 before I can do the final division:\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n  "result": "45"\n}\n```\n\n</details>', '{\n  "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n  "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n  "result": "45"\n}']
-  ['\n', None, None]

source

fmt2hist

 fmt2hist (outp:str)

Transform a formatted output into a LiteLLM compatible history

See how we can turn that one formatted output string back into a list of Messages:

from pprint import pprint

h = fmt2hist(fmt_outp)
pprint(h)

[Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":10,"b":5}', name='simple_add'), id='toolu_01KjnQH2Nsz2viQ7XYpLW3Ta', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '15',
  'name': 'simple_add',
  'role': 'tool',
  'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta'},
 Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":2,"b":1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '3',
  'name': 'simple_add',
  'role': 'tool',
  'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY'},
 Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":15,"b":3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function')], function_call=None, provider_specific_fields=None),
 {'content': '45',
  'name': 'multiply',
  'role': 'tool',
  'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C'},
 Message(content='.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]

`mk_msgs`

We will skip tool use blocks and tool results during caching

Now lets make it easy to provide entire conversations:

source

mk_msgs

 mk_msgs (msgs, cache=False, cache_idxs=[-1], ttl=None)

Create a list of LiteLLM compatible messages.

	Type	Default	Details
msgs			List of messages (each: str, bytes, list, or dict w ‘role’ and ‘content’ fields)
cache	bool	False	Enable Anthropic caching
cache_idxs	list	[-1]	Cache breakpoint idxs
ttl	NoneType	None	Cache TTL: ‘5m’ (default) or ‘1h’

With mk_msgs you can easily provide a whole conversation:

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs

[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm doing fine and you?"}]

By defualt the last message will be cached when cache=True:

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"], cache=True)
msgs

[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant',
  'content': [{'type': 'text',
    'text': "I'm doing fine and you?",
    'cache_control': {'type': 'ephemeral'}}]}]

test_eq('cache_control' in msgs[-1]['content'][0], True)

Alternatively, users can provide custom cache_idxs. Tool call blocks and results are skipped during caching:

msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-2,-1])
msgs

[{'role': 'user',
  'content': [{'type': 'text',
    'text': 'Hello!',
    'cache_control': {'type': 'ephemeral'}}]},
 {'role': 'assistant', 'content': 'Hi! How can I help you?'},
 {'role': 'user',
  'content': [{'type': 'text',
    'text': 'Call some functions!',
    'cache_control': {'type': 'ephemeral'}}]},
 Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":10,"b":5}', name='simple_add'), id='toolu_01KjnQH2Nsz2viQ7XYpLW3Ta', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
  'name': 'simple_add',
  'content': '15'},
 Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":2,"b":1}', name='simple_add'), id='toolu_01Koi2EZrGZsBbnQ13wuuvzY', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
  'name': 'simple_add',
  'content': '3'},
 Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":15,"b":3}', name='multiply'), id='toolu_0141NRaWUjmGtwxZjWkyiq6C', type='function')], function_call=None, provider_specific_fields=None),
 {'role': 'tool',
  'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
  'name': 'multiply',
  'content': '45'},
 Message(content=[{'type': 'text', 'text': '.', 'cache_control': {'type': 'ephemeral'}}], role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None)]

test_eq('cache_control' in msgs[0]['content'][0], True)
test_eq('cache_control' in msgs[2]['content'][0], True) # shifted idxs to skip tools
test_eq('cache_control' in msgs[-1]['content'][0], True)

Who’s speaking at when is automatically inferred. Even when there are multiple tools being called in parallel (which LiteLLM supports!).

msgs = mk_msgs(['Tell me the weather in Paris and Rome',
                'Assistant calls weather tool two times',
                {'role':'tool','content':'Weather in Paris is ...'},
                {'role':'tool','content':'Weather in Rome is ...'},
                'Assistant returns weather',
                'Thanks!'])
msgs

[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
 {'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
 {'role': 'tool', 'content': 'Weather in Paris is ...'},
 {'role': 'tool', 'content': 'Weather in Rome is ...'},
 {'role': 'assistant', 'content': 'Assistant returns weather'},
 {'role': 'user', 'content': 'Thanks!'}]

For ease of use, if msgs is not already in a list, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs and get back a LiteLLM compatible msg history.

msgs = mk_msgs("Hey")
msgs

[{'role': 'user', 'content': 'Hey'}]

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs

[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm fine, you?"}]

However, beware that if you use mk_msgs for a single message, consisting of multiple parts. Then you should be explicit, and make sure to wrap those multiple messages in two lists:

One list to show that they belong together in one message (the inner list).
Another, because mk_msgs expects a list of multiple messages (the outer list).

This is common when working with images for example:

msgs = mk_msgs([['Whats in this img?',img_fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")

[
 {
  "role": "user",
  "content": [
   {
    "type": "text",
    "text": "Whats in this img?"
   },
   {
    "type": "image_url",
    "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...

Streaming

LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.

We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.

source

stream_with_complete

 stream_with_complete (gen, postproc=<function noop>)

Extend streaming response chunks with the complete response

r = c(mk_msgs("Hey!"), stream=True)
r2 = SaveReturn(stream_with_complete(r))

for o in r2:
    cts = o.choices[0].delta.content
    if cts: print(cts, end='')

Hey! How's it going? 😊 What can I help you with today?

r2.value

Hey! How’s it going? 😊 What can I help you with today?

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=22, prompt_tokens=9, total_tokens=31, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Tools

source

lite_mk_func

 lite_mk_func (f)

def simple_add(
    a: int,   # first operand
    b: int=0  # second operand
) -> int:
    "Add two numbers together"
    return a + b

toolsc = lite_mk_func(simple_add)
toolsc

{'type': 'function',
 'function': {'name': 'simple_add',
  'description': 'Add two numbers together\n\nReturns:\n- type: integer',
  'parameters': {'type': 'object',
   'properties': {'a': {'type': 'integer', 'description': 'first operand'},
    'b': {'type': 'integer', 'description': 'second operand', 'default': 0}},
   'required': ['a']}}}

tmsg = mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible. ")
r = c(tmsg, tools=[toolsc])

display(r)

I’ll help you calculate both of those sums using the addition tool.

Let me break down what I’ll do: 1. First calculation: 5478954793 + 547982745 2. Second calculation: 5479749754 + 9875438979

Since these are independent calculations, I’ll perform both at the same time.

🔧 simple_add({“a”: 5478954793, “b”: 547982745})

🔧 simple_add({“a”: 5479749754, “b”: 9875438979})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=211, prompt_tokens=659, total_tokens=870, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

A tool response can be a string or a list of tool blocks (e.g., an image url block). To allow users to specify if a response should not be immediately stringified, we provide the ToolResponse datatype users can wrap their return statement in.

source

ToolResponse

 ToolResponse (content:list[str,str])

tcs = [_lite_call_func(o, ns=globals()) for o in r.choices[0].message.tool_calls]
tcs

[{'tool_call_id': 'toolu_01KATe5b5tmd4tK5D9BUZE5S',
  'role': 'tool',
  'name': 'simple_add',
  'content': '6026937538'},
 {'tool_call_id': 'toolu_01E4WQj8RkQj8Z7QLJ6ireTe',
  'role': 'tool',
  'name': 'simple_add',
  'content': '15355188733'}]

def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None #f'{c}'
    delta = c.delta
    if delta.content: return delta.content
    if delta.tool_calls:
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'
    return None

r = c(tmsg, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')

I'll help you calculate those two sums using the addition tool.

Let me break down what I need to do:
1. Calculate 5478954793 + 547982745
2. Calculate 5479749754 + 9875438979

Since these are independent calculations, I'll perform both additions at once.
🔧 simple_add

🔧 simple_add

r2.value

I’ll help you calculate those two sums using the addition tool.

Let me break down what I need to do: 1. Calculate 5478954793 + 547982745 2. Calculate 5479749754 + 9875438979

Since these are independent calculations, I’ll perform both additions at once.

🔧 simple_add({“a”: 5478954793, “b”: 547982745})

🔧 simple_add({“a”: 5479749754, “b”: 9875438979})

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: tool_calls
usage: Usage(completion_tokens=206, prompt_tokens=659, total_tokens=865, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

msg = mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
r = c(msg, stream=True, reasoning_effort="low")
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')

🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠

# Derivative Solution

To find the derivative of **f(x) = x³ + 2x² - 5x + 1**, I'll apply the power rule to each term.

## Using the Power Rule: d/dx(xⁿ) = n·xⁿ⁻¹

**Term by term:**
- d/dx(x³) = 3x²
- d/dx(2x²) = 4x
- d/dx(-5x) = -5
- d/dx(1) = 0

## Answer:
**f'(x) = 3x² + 4x - 5**

r2.value

Derivative Solution

To find the derivative of f(x) = x³ + 2x² - 5x + 1, I’ll apply the power rule to each term.

Using the Power Rule: d/dx(xⁿ) = n·xⁿ⁻¹

Term by term: - d/dx(x³) = 3x² - d/dx(2x²) = 4x - d/dx(-5x) = -5 - d/dx(1) = 0

Answer:

f’(x) = 3x² + 4x - 5

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=328, prompt_tokens=66, total_tokens=394, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=148, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Search

LiteLLM provides search, not via tools, but via the special web_search_options param.

Note: Not all models support web search. LiteLLM’s supports_web_search field should indicate this, but it’s unreliable for some models like claude-sonnet-4-20250514. Checking both supports_web_search and search_context_cost_per_query provides more accurate detection.

for m in ms: print(m, _has_search(m))

gemini/gemini-2.5-flash True
claude-sonnet-4-5 True
openai/gpt-4.1 False

When search is supported it can be used like this:

smsg = mk_msg("Search the web and tell me very briefly about otters")
r = c(smsg, web_search_options={"search_context_size": "low"})  # or 'medium' / 'high'
r

Otters are carnivorous mammals in the subfamily Lutrinae and members of the weasel family. The 14 extant otter species are all semiaquatic, both freshwater and marine. They’re found on every continent except Australia and Antarctica.

Otters are distinguished by their long, slim bodies, powerful webbed feet for swimming, and their dense fur, which keeps them warm and buoyant in water. In fact, otters have the densest fur of any animal—as many as a million hairs per square inch in places.

All otters are expert hunters that eat fish, crustaceans, and other critters. They’re known for being playful animals and sea otters famously use rocks as tools to crack open shellfish. When it’s time to nap, sea otters entangle themselves in kelp so they don’t float away.

Many otter species were historically hunted nearly to extinction for their fur but have since recovered in some areas, though several species remain threatened by pollution and habitat loss.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=382, prompt_tokens=18089, total_tokens=18471, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), server_tool_use=ServerToolUse(web_search_requests=1), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Citations

Next, lets handle Anthropic’s search citations.

When not using streaming, all citations are placed in a separate key in the response:

r.choices[0].message.provider_specific_fields['citations'][0]

[{'type': 'web_search_result_location',
  'cited_text': 'Otters are carnivorous mammals in the subfamily Lutrinae. ',
  'url': 'https://en.wikipedia.org/wiki/Otter',
  'title': 'Otter - Wikipedia',
  'encrypted_index': 'Eo8BCioICBgCIiQ4ODk4YTFkYy0yMTNkLTRhNmYtOTljYi03ZTBlNTUzZDc0NWISDMlacTT8THSDML7nuhoMyB3Xp2StEfWJOx72IjATEIYmZbwZDH+a0KRLuOHQx4nipGzmvy//B4ItZEaDN4t55aF0a+SnmlUY390IN18qE+y/CtqixJ/kgvGL2GCYkFhQRxMYBA=='}]

However, when streaming the results are not captured this way. Instead, we provide this helper function that adds the citation to the content field in markdown format:

source

cite_footnotes

 cite_footnotes (stream_list)

Add markdown footnote citations to stream deltas

source

cite_footnote

 cite_footnote (msg)

r = list(c(smsg, stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)

Otters are * charismatic members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from the small-clawed otter to the giant otter.

These aquatic mammals are known for * their short ears and noses, elongated bodies, long tails, and soft, dense fur. In fact, * otters have the densest fur of any animal—as many as a million hairs per square inch, which keeps them warm in water since they lack blubber.

* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters will float on their backs, place a rock on their chests, then smash mollusks down on it until they break open. * River otters are especially playful, gamboling on land and splashing into rivers and streams. They’re highly adapted for water with webbed feet, and * can stay submerged for more than 5 minutes, with river otters able to hold their breath for up to 8 minutes.

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=431, prompt_tokens=15055, total_tokens=15486, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Chat

LiteLLM is pretty bare bones. It doesnt keep track of conversation history or what tools have been added in the conversation so far.

So lets make a Claudette style wrapper so we can do streaming, toolcalling, and toolloops without problems.

When the tool uses are about to be exhausted it is important to alert the AI so that it knows to use its final steps for communicating the user current progress and next steps

source

Chat

 Chat (model:str, sp='', temp=0, search=False, tools:list=None,
       hist:list=None, ns:Optional[dict]=None, cache=False,
       cache_idxs:list=[-1], ttl=None)

LiteLLM chat client.

	Type	Default	Details
model	str		LiteLLM compatible model name
sp	str		System prompt
temp	int	0	Temperature
search	bool	False	Search (l,m,h), if model supports it
tools	list	None	Add tools
hist	list	None	Chat history
ns	Optional	None	Custom namespace for tool calling
cache	bool	False	Anthropic prompt caching
cache_idxs	list	[-1]	Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl	NoneType	None	Anthropic prompt caching ttl

@patch(as_prop=True)
def cost(self: Chat):
    "Total cost of all responses in conversation history"
    return sum(getattr(r, '_hidden_params', {}).get('response_cost')  or 0
               for r in self.h if hasattr(r, 'choices'))

Examples

History tracking

chat = Chat(model)
res = chat("Hey my name is Rens")
res

Hey Rens! Nice to meet you. How can I help you today?

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=20, prompt_tokens=14, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

chat("Whats my name")

Your name is Rens! You told me that when you introduced yourself at the start of our conversation.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=25, prompt_tokens=42, total_tokens=67, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

See now we keep track of history!

History is stored in the hist attribute:

chat.hist

[{'role': 'user', 'content': 'Hey my name is Rens'},
 Message(content='Hey Rens! Nice to meet you. How can I help you today?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
 {'role': 'user', 'content': 'Whats my name'},
 Message(content='Your name is Rens! You told me that when you introduced yourself at the start of our conversation.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]

You can also pass an old chat history into new Chat objects:

chat2 = Chat(model, hist=chat.hist)
chat2("What was my name again?")

Your name is Rens! You’ve asked me a couple times now - just checking if I’m paying attention? 😊

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=30, prompt_tokens=76, total_tokens=106, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Synthetic History Creation

Lets build chat history step by step. That way we can tweak anything we need to during testing.

pr = "What is 5 + 7? Use the tool to calculate it."
c = Chat(model, tools=[simple_add])
res = c(pr)

source

Chat.print_hist

 Chat.print_hist ()

Print each message on a different line

Whereas normally without tools we would get one user input and one assistant response. Here we get two extra messages in between. - An assistant message requesting the tools with arguments. - A tool response with the result to the tool call.

c.print_hist()

{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content=None, role='assistant', tool_calls=[{'index': 0, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_012bi9eSyzhwaG3TgGpytJbc', 'type': 'function'}], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

{'tool_call_id': 'toolu_012bi9eSyzhwaG3TgGpytJbc', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

{'role': 'assistant', 'content': 'You have no more tool uses. Please summarize your findings. If you did not complete your goal please tell the user what further work needs to be done so they can choose how best to proceed.'}

Message(content='\n\nThe result of 5 + 7 is **12**.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Lets try to build this up manually so we have full control over the inputs.

source

random_tool_id

 random_tool_id ()

Generate a random tool ID with ‘toolu_’ prefix

random_tool_id()

'toolu_0UAqFzWsDK4FrUMp48Y3tT3QD'

A tool call request can contain one more or more tool calls. Lets make one.

source

mk_tc

 mk_tc (func, args, tcid=None, idx=1)

tc = mk_tc(simple_add.__name__, json.dumps(dict(a=5, b=7)))
tc

{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
 'type': 'function'}

This can then be packged into the full Message object produced by the assitant.

def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)

tc_cts = "I'll use the simple_add tool to calculate 5 + 7 for you."
tcq = mk_tc_req(tc_cts, [tc])
tcq

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_gAL47D1qXIaSyZPaE1pu1lJo7', type='function')], function_call=None, provider_specific_fields=None)

Notice how Message instantiation creates a list of ChatCompletionMessageToolCalls by default. When the tools are executed this is converted back to a dictionary, for consistency we want to keep these as dictionaries from the beginning.

source

mk_tc_req

 mk_tc_req (content, tcs)

tcq = mk_tc_req(tc_cts, [tc])
tcq

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'type': 'function'}], function_call=None, provider_specific_fields=None)

c = Chat(model, tools=[simple_add], hist=[pr, tcq])

c.print_hist()

{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'type': 'function'}], function_call=None, provider_specific_fields=None)

Looks good so far! Now we will want to provide the actual result!

source

mk_tc_result

 mk_tc_result (tc, result)

Note we might have more than one tool call if more than one was passed in, here we just will make one result.

tcq.tool_calls[0]

{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
 'type': 'function'}

mk_tc_result(tcq.tool_calls[0], '12')

{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
 'role': 'tool',
 'name': 'simple_add',
 'content': '12'}

source

mk_tc_results

 mk_tc_results (tcq, results)

Same for here tcq.tool_calls will match the number of results passed in the results list.

tcq

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'type': 'function'}], function_call=None, provider_specific_fields=None)

tcr = mk_tc_results(tcq, ['12'])
tcr

[{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
  'role': 'tool',
  'name': 'simple_add',
  'content': '12'}]

Now we can call it with this synthetic data to see what the response is!

c(tcr[0])

The result of 5 + 7 is 12.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=17, prompt_tokens=720, total_tokens=737, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

c.print_hist()

{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

Message(content='The result of 5 + 7 is **12**.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Lets try this again, but lets give it something that is clearly wrong for fun.

c = Chat(model, tools=[simple_add], hist=[pr, tcq])

tcr = mk_tc_results(tcq, ['13'])
tcr

[{'tool_call_id': 'toolu_gAL47D1qXIaSyZPaE1pu1lJo7',
  'role': 'tool',
  'name': 'simple_add',
  'content': '13'}]

c(tcr[0])

The result of 5 + 7 is 12.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=17, prompt_tokens=720, total_tokens=737, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Lets make sure this works with multiple tool calls in the same assistant Message.

tcs = [
    mk_tc(simple_add.__name__, json.dumps({"a": 5, "b": 7})), 
    mk_tc(simple_add.__name__, json.dumps({"a": 6, "b": 7})), 
]

tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq

Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XBetF5gIRHYH7LKBKxJsllLOD', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_fU25035HyRrY03K6JBO94XfLE', 'type': 'function'}], function_call=None, provider_specific_fields=None)

tcr = mk_tc_results(tcq, ['12', '13'])

c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])

c(tcr[1])

5 + 7 = 12

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=13, prompt_tokens=812, total_tokens=825, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

c.print_hist()

{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XBetF5gIRHYH7LKBKxJsllLOD', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_fU25035HyRrY03K6JBO94XfLE', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_XBetF5gIRHYH7LKBKxJsllLOD', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

{'tool_call_id': 'toolu_fU25035HyRrY03K6JBO94XfLE', 'role': 'tool', 'name': 'simple_add', 'content': '13'}

Message(content='5 + 7 = **12**', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

chat = Chat(ms[1], tools=[simple_add])
res = chat("What's 5 + 3? Use the `simple_add` tool.")
res

The result of 5 + 3 is 8.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=18, prompt_tokens=742, total_tokens=760, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

res = chat("Now, tell me a joke based on that result.")
res

Here’s a joke based on the number 8:

Why was 6 afraid of 7?

Because 7 8 (ate) 9!

But since we got 8 as our answer, here’s another one:

What do you call an 8 that’s been working out?

An “ate” with great figure! 💪

(Get it? Because 8 already has a great figure with those curves!)

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=100, prompt_tokens=774, total_tokens=874, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

chat.hist

[{'role': 'user', 'content': "What's 5 + 3? Use the `simple_add` tool."},
 Message(content=None, role='assistant', tool_calls=[{'index': 0, 'function': {'arguments': '{"a": 5, "b": 3}', 'name': 'simple_add'}, 'id': 'toolu_016dgFwdeaQXSwLPnJzufcWq', 'type': 'function'}], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
 {'tool_call_id': 'toolu_016dgFwdeaQXSwLPnJzufcWq',
  'role': 'tool',
  'name': 'simple_add',
  'content': '8'},
 {'role': 'assistant',
  'content': 'You have no more tool uses. Please summarize your findings. If you did not complete your goal please tell the user what further work needs to be done so they can choose how best to proceed.'},
 Message(content='\n\nThe result of 5 + 3 is **8**.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
 {'role': 'user', 'content': 'Now, tell me a joke based on that result.'},
 Message(content='Here\'s a joke based on the number 8:\n\nWhy was 6 afraid of 7?\n\nBecause 7 8 (ate) 9!\n\nBut since we got 8 as our answer, here\'s another one:\n\nWhat do you call an 8 that\'s been working out?\n\nAn "ate" with great figure! 💪\n\n(Get it? Because 8 already has a great figure with those curves!)', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]

Images

chat = Chat(ms[1])
chat(['Whats in this img?',img_fn.read_bytes()])

Image Description

This adorable image shows a Cavalier King Charles Spaniel puppy with the classic Blenheim coloring (chestnut and white markings).

Key features visible:

Puppy with expressive brown eyes looking directly at the camera
Soft, fluffy coat with rich brown/chestnut patches on the ears and around the eyes
White blaze down the center of the face
Lying on grass in what appears to be a garden setting
Purple flowers (possibly asters) visible in the background
The puppy has a sweet, gentle expression typical of the breed

The photo has a warm, professional quality with nice depth of field that keeps the focus on the puppy’s endearing face while softly blurring the floral background.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=188, prompt_tokens=105, total_tokens=293, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Prefill

Prefill works as expected:

chat = Chat(ms[1])
chat("Spell my name",prefill="Your name is R E")

Your name is R E D A C T E D

I don’t actually know your name - you haven’t told me what it is yet! If you’d like me to spell your name, please let me know what it is first.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=47, prompt_tokens=16, total_tokens=63, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

And the entire message is stored in the history, not just the generated part:

chat.hist[-1]

Message(content="Your name is R E D A C T E D\n\nI don't actually know your name - you haven't told me what it is yet! If you'd like me to spell your name, please let me know what it is first.", role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Streaming

from time import sleep

chat = Chat(model)
stream_gen = chat("Count to 5", stream=True)
for chunk in stream_gen:
    if isinstance(chunk, ModelResponse): display(chunk)
    else: print(delta_text(chunk) or '',end='')

1, 2, 3, 4, 5

1, 2, 3, 4, 5

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=17, prompt_tokens=11, total_tokens=28, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Lets try prefill with streaming too:

stream_gen = chat("Continue counting to 10","Okay! 6, 7",stream=True)
for chunk in stream_gen:
    if isinstance(chunk, ModelResponse): display(chunk)
    else: print(delta_text(chunk) or '',end='')

Okay! 6, 7, 8, 9, 10

Okay! 6, 7, 8, 9, 10

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=12, prompt_tokens=44, total_tokens=56, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Tool use

Ok now lets test tool use

for m in ms:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3? Use the `simple_add` tool. Explain.")
    display(res)

gemini/gemini-2.5-flash:

I used the simple_add tool with a=5 and b=3. The tool returned 8.

Therefore, 5 + 3 = 8.

id: chatcmpl-xxx
model: gemini-2.5-flash
finish_reason: stop
usage: Usage(completion_tokens=118, prompt_tokens=159, total_tokens=277, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None, text_tokens=39), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=159, image_tokens=None))

claude-sonnet-4-5:

Result: 5 + 3 = 8

Explanation: The simple_add function takes two parameters: - a (first operand): I provided 5 - b (second operand): I provided 3

The function added these two numbers together and returned 8, which is the correct sum of 5 and 3.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=89, prompt_tokens=764, total_tokens=853, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

openai/gpt-4.1:

The result of 5 + 3 is 8.

Explanation: I used the simple_add tool, which takes two numbers and adds them together. By inputting 5 and 3, the tool calculated the sum as 8.

id: chatcmpl-xxx
model: gpt-4.1-2025-04-14
finish_reason: stop
usage: Usage(completion_tokens=48, prompt_tokens=155, total_tokens=203, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Thinking w tool use

chat = Chat(model, tools=[simple_add])
res = chat("What's 5 + 3?",think='l',return_all=True)
display(*res)

🔧 simple_add({“a”: 5, “b”: 3})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=125, prompt_tokens=638, total_tokens=763, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=43, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01SY1R38L37vhWpgNgQz2B5h',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

5 + 3 = 8

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=14, prompt_tokens=816, total_tokens=830, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Search

chat = Chat(model)
res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
for o in res:
    if isinstance(o, ModelResponse): sleep(0.01); display(o)
    else: print(delta_text(o) or '',end='')

Otters are charismatic members of the weasel family found on every continent except Australia and Antarctica. There are 13 species in total, including sea otters and river otters.

These aquatic mammals have elongated bodies, long tails, and soft, dense fur. In fact, otters have the densest fur of any animal—as many as a million hairs per square inch. Webbed feet and powerful tails make otters strong swimmers.

All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters float on their backs, place a rock on their chest, then smash mollusks down on it until it breaks open. They're also known for being playful animals, engaging in activities like sliding into water on natural slides.

Otters are * charismatic members of the weasel family found on every continent except Australia and Antarctica. * There are 13 species in total, including sea otters and river otters.

These aquatic mammals have * elongated bodies, long tails, and soft, dense fur. In fact, * otters have the densest fur of any animal—as many as a million hairs per square inch. * Webbed feet and powerful tails make otters strong swimmers.

* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters float on their backs, place a rock on their chest, then smash mollusks down on it until it breaks open. They’re also known for being * playful animals, engaging in activities like sliding into water on natural slides.

id: chatcmpl-xxx
model: claude-sonnet-4-5
finish_reason: stop
usage: Usage(completion_tokens=362, prompt_tokens=15055, total_tokens=15417, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Multi tool calling

We can let the model call multiple tools in sequence using the max_steps parameter.

chat = Chat(model, tools=[simple_add])
res = chat("What's ((5 + 3)+7)+11? Work step by step", return_all=True, max_steps=5)
for r in res: display(r)

I’ll solve this step by step using the addition function.

Step 1: First, let me calculate 5 + 3

🔧 simple_add({“a”: 5, “b”: 3})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=100, prompt_tokens=617, total_tokens=717, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01SykhkA2BGKXm9J56KCkz2B',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

Step 2: Now I’ll add 7 to that result (8 + 7)

🔧 simple_add({“a”: 8, “b”: 7})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=93, prompt_tokens=730, total_tokens=823, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_013LrGqASqf9Bsk38scV5Pu7',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

Step 3: Finally, I’ll add 11 to that result (15 + 11)

🔧 simple_add({“a”: 15, “b”: 11})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=94, prompt_tokens=836, total_tokens=930, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01RtpzYFxji9ZbQJtTjKwaCi',
 'role': 'tool',
 'name': 'simple_add',
 'content': '26'}

Answer: ((5 + 3) + 7) + 11 = 26

Here’s the breakdown: - 5 + 3 = 8 - 8 + 7 = 15 - 15 + 11 = 26

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=67, prompt_tokens=943, total_tokens=1010, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Some models support parallel tool calling. I.e. sending multiple tool call requests in one conversation step.

def multiply(a: int, b: int) -> int:
    "Multiply two numbers"
    return a * b

chat = Chat('openai/gpt-4.1', tools=[simple_add, multiply])
res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
for r in res: display(r)

🔧 simple_add({“a”: 5, “b”: 3})

🔧 simple_add({“a”: 7, “b”: 2})

id: chatcmpl-xxx
model: gpt-4.1-2025-04-14
finish_reason: tool_calls
usage: Usage(completion_tokens=52, prompt_tokens=110, total_tokens=162, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

{'tool_call_id': 'call_qJXSxYvc2ZVHmyIxqQ9OocWM',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

{'tool_call_id': 'call_hCgeAPtd0RhmeADBRWRvY0sG',
 'role': 'tool',
 'name': 'simple_add',
 'content': '9'}

🔧 multiply({“a”:8,“b”:9})

id: chatcmpl-xxx
model: gpt-4.1-2025-04-14
finish_reason: tool_calls
usage: Usage(completion_tokens=17, prompt_tokens=178, total_tokens=195, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

{'tool_call_id': 'call_1nwxhn7RXLNl9FcsS8pfn6OZ',
 'role': 'tool',
 'name': 'multiply',
 'content': '72'}

(5 + 3) = 8 and (7 + 2) = 9. Multiplying them gives: 8 × 9 = 72.

So, (5 + 3) × (7 + 2) = 72.

id: chatcmpl-xxx
model: gpt-4.1-2025-04-14
finish_reason: stop
usage: Usage(completion_tokens=55, prompt_tokens=203, total_tokens=258, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

See it did the additions in one go!

We don’t want the model to keep running tools indefinitely. Lets showcase how we can force thee model to stop after our specified number of toolcall rounds:

def divide(a: int, b: int) -> float:
    "Divide two numbers"
    return a / b

chat = Chat(model, tools=[simple_add, multiply, divide])
res = chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.", 
           max_steps=3, return_all=True,
           final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)

I’ll calculate this step by step, following the order of operations.

Step 1: Calculate the inner parentheses first - (10 + 5) = ? - (2 + 1) = ?

🔧 simple_add({“a”: 10, “b”: 5})

🔧 simple_add({“a”: 2, “b”: 1})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=173, prompt_tokens=792, total_tokens=965, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01NZjJc2q4tMJZcS93T1WQHM',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

{'tool_call_id': 'toolu_013qQVARNY8a6shg4zo2TpNr',
 'role': 'tool',
 'name': 'simple_add',
 'content': '3'}

Step 2: Multiply 15 * 3

🔧 multiply({“a”: 15, “b”: 3})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=82, prompt_tokens=1030, total_tokens=1112, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01Uf17eEfZPHcqFo1C3PYZ5E',
 'role': 'tool',
 'name': 'multiply',
 'content': '45'}

Step 3: Divide 45 / 3

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=23, prompt_tokens=1139, total_tokens=1162, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Tool call exhaustion

pr = "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to calculate!"
c = Chat(model, tools=[simple_add])

res = c(pr, max_steps=2)
res

Let me continue with the next calculation. Now I’ll add 2 to the result (3+2):

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=33, prompt_tokens=777, total_tokens=810, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

assert c.hist[-2]['content'] == _final_prompt

Async

AsyncChat

If you want to use LiteLLM in a webapp you probably want to use their async function acompletion. To make that easier we will implement our version of AsyncChat to complement it. It follows the same implementation as Chat as much as possible:

source

astream_with_complete

 astream_with_complete (agen, postproc=<function noop>)

source

AsyncChat

 AsyncChat (model:str, sp='', temp=0, search=False, tools:list=None,
            hist:list=None, ns:Optional[dict]=None, cache=False,
            cache_idxs:list=[-1], ttl=None)

LiteLLM chat client.

	Type	Default	Details
model	str		LiteLLM compatible model name
sp	str		System prompt
temp	int	0	Temperature
search	bool	False	Search (l,m,h), if model supports it
tools	list	None	Add tools
hist	list	None	Chat history
ns	Optional	None	Custom namespace for tool calling
cache	bool	False	Anthropic prompt caching
cache_idxs	list	[-1]	Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl	NoneType	None	Anthropic prompt caching ttl

Examples

Basic example

chat = AsyncChat(model)
await chat("What is 2+2?")

2+2 = 4

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=11, prompt_tokens=14, total_tokens=25, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

With tool calls

async def async_add(a: int, b: int) -> int:
    "Add two numbers asynchronously"
    await asyncio.sleep(0.1)
    return a + b

chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", return_all=True)
async for r in res: display(r)

🔧 async_add({“a”: 5, “b”: 7})

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: tool_calls
usage: Usage(completion_tokens=70, prompt_tokens=607, total_tokens=677, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

{'tool_call_id': 'toolu_01NHDNkcpwxW66XRuRFChLxe',
 'role': 'tool',
 'name': 'async_add',
 'content': '12'}

The result of 5 + 7 is 12.

id: chatcmpl-xxx
model: claude-sonnet-4-5-20250929
finish_reason: stop
usage: Usage(completion_tokens=18, prompt_tokens=731, total_tokens=749, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

chat.hist

[{'role': 'user', 'content': 'What is 2+2?'},
 Message(content='2+2 = 4', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]

Async Streaming Display

This is what our outputs look like with streaming results:

chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
    if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
    elif isinstance(o,dict): print(o)


🔧 async_add
{'tool_call_id': 'toolu_011RxwEK3HSc3VQwwsBZnXnV', 'role': 'tool', 'name': 'async_add', 'content': '12'}


The result of 5 + 7 is **12**.

We use this one quite a bit so we want to provide some utilities to better format these outputs:

Here’s a complete ModelResponse taken from the response stream:

resp = ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, prompt_tokens_details=None))
print(repr(resp))

ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, completion_tokens_details=None, prompt_tokens_details=None))

tc=resp.choices[0].message.tool_calls[0]
tc

ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function')

tr={'tool_call_id': 'toolu_018BGyenjiRkDQFU1jWP6qRo', 'role': 'tool','name': 'simple_add',
    'content': '15 is the answerrrr' +'r'*2000}

source

mk_tr_details

 mk_tr_details (tr, tc, mx=2000)

*Create

block for tool call as JSON*

mk_tr_details(tr,tc)

'\n\n<details class=\'tool-usage-details\'>\n\n```json\n{\n  "id": "toolu_018BGyenjiRkDQFU1jWP6qRo",\n  "call": {\n    "function": "simple_add",\n    "arguments": {\n      "a": "10",\n      "b": "5"\n    }\n  },\n  "result": "15 is the answerrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr<TRUNCATED>"\n}\n```\n\n</details>\n\n'

source

AsyncStreamFormatter

 AsyncStreamFormatter (include_usage=False, mx=2000)

Initialize self. See help(type(self)) for accurate signature.

stream_msg = ModelResponseStream([StreamingChoices(delta=Delta(content="Hello world!"))])
print(repr(AsyncStreamFormatter().format_item(stream_msg)))

'Hello world!'

reasoning_msg = ModelResponseStream([StreamingChoices(delta=Delta(reasoning_content="thinking..."))])
print(repr(AsyncStreamFormatter().format_item(reasoning_msg)))

'🧠'

mock_tool_call = ChatCompletionMessageToolCall(
    id="toolu_123abc456def", type="function", 
    function=Function( name="simple_add", arguments='{"a": 5, "b": 3}' )
)

mock_response = ModelResponse()
mock_response.choices = [type('Choice', (), {
    'message': type('Message', (), {
        'tool_calls': [mock_tool_call]
    })()
})()]

mock_tool_result = {
    'tool_call_id': 'toolu_123abc456def', 'role': 'tool', 
    'name': 'simple_add', 'content': '8'
}

fmt = AsyncStreamFormatter()
fmt.format_item(mock_response)
print(fmt.format_item(mock_tool_result))



<details class='tool-usage-details'>

```json
{
  "id": "toolu_123abc456def",
  "call": {
    "function": "simple_add",
    "arguments": {
      "a": "5",
      "b": "3"
    }
  },
  "result": "8"
}
```

</details>

In jupyter it’s nice to use this AsyncStreamFormatter in combination with the Markdown display:

source

adisplay_stream

 adisplay_stream (rs)

Use IPython.display to markdown display the response stream.

Streaming examples

Now we can demonstrate AsyncChat with stream=True!

Tool call

chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)

{
  "id": "toolu_011RxwEK3HSc3VQwwsBZnXnV",
  "call": {
    "function": "async_add",
    "arguments": {
      "a": "5",
      "b": "7"
    }
  },
  "result": "12"
}

The result of 5 + 7 is 12.

Thinking tool call

chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?",
                 think='l',stream=True)
_ = await adisplay_stream(res)

🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠

Use your language’s built-in sort

For 1000 random integers, use your language’s built-in sort function (e.g., Python’s sorted(), Java’s Arrays.sort(), C++’s std::sort()).

These implementations use highly optimized algorithms like: - Timsort (Python/Java) - Introsort (C++) - Dual-pivot Quicksort (Java primitives)

All are O(n log n) and will outperform hand-coded solutions for this dataset size.

If implementing yourself: Use Quicksort or Mergesort — both O(n log n) average case and efficient for this size.

Multiple tool calls

chat.hist[1]

chat.hist[2]

chat.hist[3]

chat.hist[4]

chat.hist[5]

Now to demonstrate that we can load back the formatted output back into a new Chat object:

chat5 = Chat(model,hist=fmt2hist(fmt.outp),tools=[simple_add, multiply, divide])
chat5('what did we just do?')

Search

chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the web and tell me very briefly about otters", stream=True)
_=await adisplay_stream(res)

Caching

a,b = random.randint(0,100), random.randint(0,100)
hist = [[f"What is {a}+{b}?\n" * 200], f"It's {a+b}", ['hi'], "Hello"]

chat = AsyncChat(model, cache=True, hist=hist)
rs = await chat('hi again', stream=True, stream_options={"include_usage": True})

async for o in rs: 
    if isinstance(o, ModelResponse): print(o.usage)

In this first api call we will see cache creation until the last user msg:

cache_read_toks = o.usage.cache_creation_input_tokens
test_eq(cache_read_toks > 1000, True)
test_eq(o.usage.cache_read_input_tokens, 0)

hist.extend([['hi again'], 'how may i help you?'])
chat = AsyncChat(model, cache=True, hist=hist)
rs = await chat('bye!', stream=True, stream_options={"include_usage": True})

async for o in rs:
    if isinstance(o, ModelResponse): print(o.usage)

The subsequent call should re-use the existing cache:

test_eq(o.usage.cache_read_input_tokens, cache_read_toks)

import nbdev; nbdev.nbdev_export()