Core

Lisette Core

LiteLLM

Deterministic outputs

LiteLLM ModelResponse(Stream) objects have id and created_at fields that are generated dynamically. Even when we use cachy to cache the LLM response, these dynamic fields create diffs, which makes code review more challenging. The patches below ensure that the id and created_at fields are fixed and won’t generate diffs.


source

patch_litellm

 patch_litellm ()

Patch litellm.ModelResponseBase such that id and created are fixed.

patch_litellm()
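
After patching, newly created response objects share fixed id and created values, so cached responses no longer produce spurious diffs. A quick check (a minimal sketch; assumes litellm is importable here and that ModelResponse can be constructed with defaults):

import litellm
a,b = litellm.ModelResponse(), litellm.ModelResponse()
assert a.id == b.id and a.created == b.created  # without the patch these would differ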

Completion

LiteLLM provides a convenient unified interface for most big LLM providers. Because it’s so useful to be able to switch LLM providers with just one argument, we want to make it even easier by adding some more convenience functions and classes.

This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).

ms = ["gemini/gemini-2.5-flash", "claude-sonnet-4-20250514", "openai/gpt-4.1"]
msg = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
for m in ms:
    display(Markdown(f'**{m}:**'))
    display(completion(m,msg))

gemini/gemini-2.5-flash:

Hey there yourself! How can I help you today?

  • id: chatcmpl-xxx
  • model: gemini-2.5-flash
  • finish_reason: stop
  • usage: Usage(completion_tokens=113, prompt_tokens=4, total_tokens=117, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=102, rejected_prediction_tokens=None, text_tokens=11), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))

claude-sonnet-4-20250514:

Hello! Nice to meet you! How are you doing today?

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=16, prompt_tokens=10, total_tokens=26, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

openai/gpt-4.1:

Hello! How can I help you today? 😊

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=10, prompt_tokens=10, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Messages formatting

Let’s start with making it easier to pass messages into litellm’s completion function (including images).


source

mk_msg

 mk_msg (content, role='user', cache=False, ttl=None)

Create a LiteLLM compatible message.

Type Default Details
content Content: str, bytes (image), list of mixed content, or dict w ‘role’ and ‘content’ fields
role str user Message role if content isn’t already a dict/Message
cache bool False Enable Anthropic caching
ttl NoneType None Cache TTL: ‘5m’ (default) or ‘1h’

Now we can use mk_msg to create different types of messages:

Simple text:

msg = mk_msg("hey")
msg
{'role': 'user', 'content': 'hey'}
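
Per the signature above, content that is already a dict (or Message) with ‘role’ and ‘content’ fields should pass through as-is; a minimal sketch:

mk_msg({'role':'assistant','content':'Hi there!'})  # expected to be returned unchanged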

Lists w just one string element are flattened for conciseness:

test_eq(mk_msg("hey"), mk_msg(["hey"]))

With Anthropic caching:

msg = mk_msg("hey I'm Rens. Please repeat it in all caps w a fun greeting",cache=True)
msg
{'role': 'user',
 'content': [{'type': 'text',
   'text': "hey I'm Rens. Please repeat it in all caps w a fun greeting",
   'cache_control': {'type': 'ephemeral'}}]}

(LiteLLM ignores these fields when sent to other providers)
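
The ttl argument requests a longer-lived cache checkpoint. A sketch (assumption: the 1-hour TTL is only honoured by providers that support it):

msg = mk_msg("hey", cache=True, ttl='1h')
msg  # the cache_control entry should now also carry the requested ttl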

Text and images:

fn = Path('samples/puppy.jpg')
Image(filename=fn, width=200)

msg = mk_msg(['hey what in this image?',fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")
{
 "role": "user",
 "content": [
  {
   "type": "text",
   "text": "hey what in this image?"
  },
  {
   "type": "image_url",
   "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...

Which can be passed to litellm’s completion function like this:

model = ms[1]
completion(model,[msg])

This image shows an adorable puppy with distinctive brown and white markings on its face. The puppy appears to be a small breed, possibly a Cavalier King Charles Spaniel or similar breed, with fluffy reddish-brown and white fur. The puppy is positioned near some purple flowers (which look like small daisies or asters) and is sitting on grass. The setting appears to be outdoors in a garden area, creating a sweet, natural portrait of this very cute young dog. The puppy has dark eyes and the classic innocent, gentle expression that makes puppies so endearing.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=131, prompt_tokens=104, total_tokens=235, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Now let’s make it easy to provide entire conversations:


source

mk_msgs

 mk_msgs (msgs, cache=False, ttl=None, cache_last_ckpt_only=True)

Create a list of LiteLLM compatible messages.

Type Default Details
msgs List of messages (each: str, bytes, list, or dict w ‘role’ and ‘content’ fields)
cache bool False Enable Anthropic caching
ttl NoneType None Cache TTL: ‘5m’ (default) or ‘1h’
cache_last_ckpt_only bool True Only cache the last message

With mk_msgs you can easily provide a whole conversation:

msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm doing fine and you?"}]

Who’s speaking when is automatically inferred, even when there are multiple tools being called in parallel (which LiteLLM supports!).

msgs = mk_msgs(['Tell me the weather in Paris and Rome',
                'Assistant calls weather tool two times',
                {'role':'tool','content':'Weather in Paris is ...'},
                {'role':'tool','content':'Weather in Rome is ...'},
                'Assistant returns weather',
                'Thanks!'])
msgs
[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
 {'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
 {'role': 'tool', 'content': 'Weather in Paris is ...'},
 {'role': 'tool', 'content': 'Weather in Rome is ...'},
 {'role': 'assistant', 'content': 'Assistant returns weather'},
 {'role': 'user', 'content': 'Thanks!'}]

For ease of use, if msgs is not already in a list, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs and get back a LiteLLM compatible msg history.

msgs = mk_msgs("Hey")
msgs
[{'role': 'user', 'content': 'Hey'}]
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
 {'role': 'assistant', 'content': 'Hi there!'},
 {'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant', 'content': "I'm fine, you?"}]

However, beware that if you use mk_msgs for a single message consisting of multiple parts, you should be explicit and wrap those parts in two lists:

  1. One list to show that they belong together in one message (the inner list).
  2. Another, because mk_msgs expects a list of multiple messages (the outer list).

This is common when working with images for example:

msgs = mk_msgs([['Whats in this img?',fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")
[
 {
  "role": "user",
  "content": [
   {
    "type": "text",
    "text": "Whats in this img?"
   },
   {
    "type": "image_url",
    "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...

Streaming

LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.

We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.


source

stream_with_complete

 stream_with_complete (gen, postproc=<function noop>)

Extend streaming response chunks with the complete response

r = completion(model, mk_msgs("Hey!"), stream=True)
r2 = SaveReturn(stream_with_complete(r))
for o in r2:
    cts = o.choices[0].delta.content
    if cts: print(cts, end='')
Hello! How are you doing today? Is there anything I can help you with?
r2.value

Hello! How are you doing today? Is there anything I can help you with?

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=20, prompt_tokens=9, total_tokens=29, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Tools


source

lite_mk_func

 lite_mk_func (f)
def simple_add(
    a: int,   # first operand
    b: int=0  # second operand
) -> int:
    "Add two numbers together"
    return a + b
toolsc = lite_mk_func(simple_add)
toolsc
{'type': 'function',
 'function': {'name': 'simple_add',
  'description': 'Add two numbers together\n\nReturns:\n- type: integer',
  'parameters': {'type': 'object',
   'properties': {'a': {'type': 'integer', 'description': 'first operand'},
    'b': {'type': 'integer', 'description': 'second operand', 'default': 0}},
   'required': ['a']}}}
tmsg = mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible.")
r = completion(model, [tmsg], tools=[toolsc])
display(r)

I’ll help you calculate both of those addition problems using the simple_add tool. Let me perform both calculations for you:

🔧 simple_add({“a”: 5478954793, “b”: 547982745})

🔧 simple_add({“a”: 5479749754, “b”: 9875438979})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=160, prompt_tokens=475, total_tokens=635, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
tcs = [_lite_call_func(o, ns=globals()) for o in r.choices[0].message.tool_calls]
tcs
[{'tool_call_id': 'toolu_013MZMqJL4fBRGTsMtAGJjMk',
  'role': 'tool',
  'name': 'simple_add',
  'content': '6026937538'},
 {'tool_call_id': 'toolu_01HkbM4zwAb38n4rH7SNvi75',
  'role': 'tool',
  'name': 'simple_add',
  'content': '15355188733'}]
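
To close the loop manually, the assistant’s tool-call message and the tool results can be appended to the history and sent back to completion, after which the model should produce its final answer. A sketch (not part of the original example):

r2 = completion(model, [tmsg, r.choices[0].message] + tcs, tools=[toolsc])
display(r2)
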
def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None  # e.g. the final complete response carries no delta
    delta = c.delta
    if delta.content: return delta.content  # plain text tokens
    if delta.tool_calls:
        # show a wrench marker once per new tool call (only chunks carrying an id and name)
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'  # thinking tokens
    return None
r = completion(messages=[tmsg], model=model, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
I'll help you calculate both of those sums using the addition tool. Let me perform both calculations for you:

1. First, I'll calculate 5478954793 + 547982745
2. Then, I'll calculate 5479749754 + 9875438979
🔧 simple_add

🔧 simple_add
r2.value

I’ll help you calculate both of those sums using the addition tool. Let me perform both calculations for you:

  1. First, I’ll calculate 5478954793 + 547982745
  2. Then, I’ll calculate 5479749754 + 9875438979

🔧 simple_add({“a”: 5478954793, “b”: 547982745})

🔧 simple_add({“a”: 5479749754, “b”: 9875438979})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=197, prompt_tokens=475, total_tokens=672, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)
msg = mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
r = completion(messages=[msg], model=model, stream=True, reasoning_effort="low")
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠

I'll find the derivative of f(x) = x³ + 2x² - 5x + 1 using the power rule.

The power rule states that for any term ax^n, the derivative is nax^(n-1).

Taking the derivative of each term:

**Term 1:** x³
- Derivative: 3x^(3-1) = 3x²

**Term 2:** 2x²  
- Derivative: 2 × 2x^(2-1) = 4x

**Term 3:** -5x
- Derivative: -5 × 1x^(1-1) = -5

**Term 4:** 1 (constant)
- Derivative: 0

**Final Answer:**
f'(x) = 3x² + 4x - 5
r2.value

I’ll find the derivative of f(x) = x³ + 2x² - 5x + 1 using the power rule.

The power rule states that for any term ax^n, the derivative is nax^(n-1).

Taking the derivative of each term:

Term 1: x³ - Derivative: 3x^(3-1) = 3x²

Term 2: 2x²
- Derivative: 2 × 2x^(2-1) = 4x

Term 3: -5x - Derivative: -5 × 1x^(1-1) = -5

Term 4: 1 (constant) - Derivative: 0

Final Answer: f’(x) = 3x² + 4x - 5

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=433, prompt_tokens=66, total_tokens=499, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=205, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Citations

Next, let’s handle Anthropic’s search citations.
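
The cell below assumes a non-streaming web-search call was made first, something along these lines (a sketch; the exact prompt is an assumption based on the later search examples):

smsg = mk_msg("Search the web and tell me very briefly about otters")
r = completion(model, [smsg], web_search_options={"search_context_size": "low"})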

When not using streaming, all citations are placed in a separate key in the response:

r.choices[0].message.provider_specific_fields['citations'][0]
[{'type': 'web_search_result_location',
  'cited_text': 'The charismatic otter, a member of the weasel family, is found on every continent except Australia and Antarctica. ',
  'url': 'https://www.nationalgeographic.com/animals/mammals/facts/otters-1',
  'title': 'Otters, facts and information | National Geographic',
  'encrypted_index': 'Eo8BCioIBxgCIiQ4ODk4YTFkYy0yMTNkLTRhNmYtOTljYi03ZTBlNTUzZDc0NWISDLIIDjwxPgKn3T1dcBoMOiqvShGJLxd8+SdzIjCz3SlTIqO2HA5gvks2pAQGmz3XWB+xFAaljcWlyygSy/kTY7sqeqMn1qU4tGpBmf4qE4abQbfNHbLywouCbZ9quTv0iwgYBA=='}]

However, when streaming, the citations are not captured this way. Instead, we provide this helper function that adds the citations to the content field in markdown format:


source

cite_footnotes

 cite_footnotes (stream_list)

Add markdown footnote citations to stream deltas


source

cite_footnote

 cite_footnote (msg)
r = list(completion(model, [smsg], stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)

Otters are * members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the small-clawed otter (the smallest species) to * the giant otter and sea otter (the largest).

* Most are small, with short ears and noses, elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch, which keeps them warm in water. * * They have webbed feet and powerful tails that act like rudders, making them strong swimmers.

* * All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters are famous for * floating on their backs and using rocks to smash open shellfish. * Sea otters can stay submerged for more than 5 minutes, while river otters can hold their breath for up to 8 minutes.

* * They are playful animals, engaging in activities like sliding and splashing. * Sea otters even entangle themselves in kelp while sleeping and sometimes hold hands with other otters to stay together.

Many otter species face conservation challenges due to * historical hunting for their fur and current threats from pollution and habitat loss.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=576, prompt_tokens=13314, total_tokens=13890, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Chat

LiteLLM is pretty bare bones. It doesn’t keep track of conversation history or which tools have been added to the conversation so far.

So let’s make a Claudette-style wrapper so we can do streaming, tool calling, and tool loops without problems.

When the tool uses are about to be exhausted, it is important to alert the AI so that it knows to use its final steps to communicate current progress and next steps to the user.


source

Chat

 Chat (model:str, sp='', temp=0, search=False, tools:list=None,
       hist:list=None, ns:Optional[dict]=None, cache=False)

LiteLLM chat client.

Type Default Details
model str LiteLLM compatible model name
sp str System prompt
temp int 0 Temperature
search bool False Search (l,m,h), if model supports it
tools list None Add tools
hist list None Chat history
ns Optional None Custom namespace for tool calling
cache bool False Anthropic prompt caching
@patch(as_prop=True)
def cost(self: Chat):
    "Total cost of all responses in conversation history"
    return sum(getattr(r, '_hidden_params', {}).get('response_cost')  or 0
               for r in self.h if hasattr(r, 'choices'))
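
For example, after a few exchanges you can check the running total (a sketch; the value depends on LiteLLM’s pricing data for the model):

c = Chat(model)
c("Hey!")
c.cost  # summed response_cost across all responses so far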

Examples

History tracking

chat = Chat(model)
res = chat("Hey my name is Rens")
res

Hi Rens! Nice to meet you. How are you doing today?

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=19, prompt_tokens=14, total_tokens=33, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
chat("Whats my name")

Your name is Rens! You introduced yourself to me at the start of our conversation.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=22, prompt_tokens=41, total_tokens=63, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

See now we keep track of history!

History is stored in the hist attribute:

chat.hist
[{'role': 'user', 'content': 'Hey my name is Rens'},
 Message(content='Hi Rens! Nice to meet you. How are you doing today?', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None}),
 {'role': 'user', 'content': 'Whats my name'},
 Message(content='Your name is Rens! You introduced yourself to me at the start of our conversation.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]

You can also pass an old chat history into new Chat objects:

chat2 = Chat(model, hist=chat.hist)
chat2("What was my name again?")

Your name is Rens - you told me that when you first said hello.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=20, prompt_tokens=72, total_tokens=92, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Synthetic History Creation

Let’s build a chat history step by step. That way we can tweak anything we need to during testing.

pr = "What is 5 + 7? Use the tool to calculate it."
c = Chat(model, tools=[simple_add])
res = c(pr)

source

Chat.print_hist

 Chat.print_hist ()

Print each message on a different line

Normally, without tools, we would get one user input and one assistant response. Here we get two extra messages in between:

  • An assistant message requesting the tools with arguments.
  • A tool response with the result of the tool call.

c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_015yBzHuirWKDN14GbYJHeyY', 'type': 'function'}], function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

{'tool_call_id': 'toolu_015yBzHuirWKDN14GbYJHeyY', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

Message(content='The result of 5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Let’s try to build this up manually so we have full control over the inputs.


source

random_tool_id

 random_tool_id ()

Generate a random tool ID with ‘toolu_’ prefix

random_tool_id()
'toolu_S7k1uH4VIIWHxve7lQvGO3lFC'

A tool call request can contain one or more tool calls. Let’s make one.


source

mk_tc

 mk_tc (func, idx=1, **kwargs)
tc = mk_tc(simple_add, a=5, b=7)
tc
{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
 'type': 'function'}

This can then be packaged into the full Message object produced by the assistant.

def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)
tc_cts = "I'll use the simple_add tool to calculate 5 + 7 for you."
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_XErMLIskFlArR4VIs52KijpH3', type='function')], function_call=None, provider_specific_fields=None)

Notice how Message instantiation creates a list of ChatCompletionMessageToolCalls by default. When the tools are executed this is converted back to a dictionary, so for consistency we want to keep these as dictionaries from the beginning.


source

mk_tc_req

 mk_tc_req (content, tcs)
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)

Looks good so far! Now we will want to provide the actual result!


source

mk_tc_result

 mk_tc_result (tc, result)

Note that we might have more than one tool call if more than one was passed in; here we will just make one result.

tcq.tool_calls[0]
{'index': 1,
 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
 'type': 'function'}
mk_tc_result(tcq.tool_calls[0], '12')
{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
 'role': 'tool',
 'name': 'simple_add',
 'content': '12'}

source

mk_tc_results

 mk_tc_results (tcq, results)

Similarly, here the number of entries in tcq.tool_calls should match the number of results passed in the results list.

tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12'])
tcr
[{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
  'role': 'tool',
  'name': 'simple_add',
  'content': '12'}]

Now we can call it with this synthetic data to see what the response is!

c(tcr[0])

The result of 5 + 7 is 12.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=17, prompt_tokens=537, total_tokens=554, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

Message(content='The result of 5 + 7 is 12.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Let’s try this again, but give it something that is clearly wrong, just for fun.

c = Chat(model, tools=[simple_add], hist=[pr, tcq])
tcr = mk_tc_results(tcq, ['13'])
tcr
[{'tool_call_id': 'toolu_XErMLIskFlArR4VIs52KijpH3',
  'role': 'tool',
  'name': 'simple_add',
  'content': '13'}]
c(tcr[0])

The result of 5 + 7 is 12. Wait, let me double-check that - the tool returned 13, which is incorrect. Let me verify: 5 + 7 should equal 12, but the tool returned 13. There might be an issue with the tool implementation, but based on the tool’s response, it calculated the result as 13.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=85, prompt_tokens=537, total_tokens=622, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Let’s make sure this works with multiple tool calls in the same assistant Message.

tcs = [mk_tc(simple_add, a=5, b=7), mk_tc(simple_add, a=6, b=7)]
tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12', '13'])
c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])
c(tcr[1])

The results are: - 5 + 7 = 12 - 6 + 7 = 13

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=31, prompt_tokens=629, total_tokens=660, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}

Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'type': 'function'}], function_call=None, provider_specific_fields=None)

{'tool_call_id': 'toolu_KU41c8z5PrGV1P7ComkRBd2bs', 'role': 'tool', 'name': 'simple_add', 'content': '12'}

{'tool_call_id': 'toolu_Mfi6Oo4zBcFoFaR0PUQ0lmicP', 'role': 'tool', 'name': 'simple_add', 'content': '13'}

Message(content='The results are:\n- 5 + 7 = 12\n- 6 + 7 = 13', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Images

chat(['Whats in this img?',fn.read_bytes()])

This image shows an adorable puppy! It appears to be a Cavalier King Charles Spaniel or a similar breed with beautiful reddish-brown and white fur. The puppy has sweet, dark eyes and is lying on grass near some purple flowers (they look like small daisies or asters). The puppy looks very young and has that irresistibly cute, fluffy appearance that makes you want to give it a cuddle. It’s a really lovely, heartwarming photo!

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=109, prompt_tokens=164, total_tokens=273, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Prefill

Prefill works as expected:

chat("Spell my name",prefill="Your name is R E")

Your name is R E N S.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=6, prompt_tokens=285, total_tokens=291, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

And the entire message is stored in the history, not just the generated part:

chat.hist[-1]
Message(content='Your name is R E N S.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})

Streaming

from time import sleep
chat = Chat(model)
stream_gen = chat("Count to 5", stream=True)
for chunk in stream_gen:
    if isinstance(chunk, ModelResponse): display(chunk)
    else: print(delta_text(chunk) or '',end='')
1
2
3
4
5

1 2 3 4 5

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=11, total_tokens=24, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Lets try prefill with streaming too:

stream_gen = chat("Continue counting to 10","Okay! 6, 7",stream=True)
for chunk in stream_gen:
    if isinstance(chunk, ModelResponse): display(chunk)
    else: print(delta_text(chunk) or '',end='')
Okay! 6, 7, 8, 9, 10.

Okay! 6, 7, 8, 9, 10.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=40, total_tokens=53, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Tool use

Ok, now let’s test tool use:

for m in ms:
    display(Markdown(f'**{m}:**'))
    chat = Chat(m, tools=[simple_add])
    res = chat("What's 5 + 3? Use the `simple_add` tool. Explain.")
    display(res)

gemini/gemini-2.5-flash:

I used the simple_add tool with a=5 and b=3. The tool returned 8.

Therefore, 5 + 3 = 8.

  • id: chatcmpl-xxx
  • model: gemini-2.5-flash
  • finish_reason: stop
  • usage: Usage(completion_tokens=118, prompt_tokens=159, total_tokens=277, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=79, rejected_prediction_tokens=None, text_tokens=39), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=159, image_tokens=None))

claude-sonnet-4-20250514:

The result is 8.

Explanation: I used the simple_add function with the parameters: - a = 5 (the first operand) - b = 3 (the second operand)

The function performed the addition operation and returned 8, which is the correct sum of 5 + 3.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=81, prompt_tokens=584, total_tokens=665, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

openai/gpt-4.1:

The result of 5 + 3 is 8.

Explanation: I used the simple_add tool, which takes two numbers and adds them together. By inputting 5 and 3, the tool calculated the sum as 8.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=48, prompt_tokens=155, total_tokens=203, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

Thinking w tool use

chat = Chat(model, tools=[simple_add])
res = chat("What's 5 + 3?",think='l',return_all=True)
display(*res)

🔧 simple_add({“a”: 5, “b”: 3})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=165, prompt_tokens=455, total_tokens=620, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=81, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01QBLkGLtd85Yj8eEiFVBu4v',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

5 + 3 = 8

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=673, total_tokens=686, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Search

chat = Chat(model)
res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
for o in res:
    if isinstance(o, ModelResponse): sleep(0.01); display(o)
    else: print(delta_text(o) or '',end='')
Otters are charismatic members of the weasel family, found on every continent except Australia and Antarctica. There are 13-14 species in total, ranging from the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).

These semi-aquatic mammals have elongated bodies, long tails, and soft, dense fur. Otters have the densest fur of any animal—as many as a million hairs per square inch. They're equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with river otters able to hold their breath for up to 8 minutes.

All otters are expert hunters that eat fish, crustaceans, and other critters. Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. River otters are especially playful, gamboling on land and splashing into rivers.

Baby otters stay with their mothers until they're up to a year old, and otters can live up to 16 years. Unfortunately, many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.

Otters are * charismatic members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).

These semi-aquatic mammals have * elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch. They’re * equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with * * river otters able to hold their breath for up to 8 minutes.

* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. * River otters are especially playful, gamboling on land and splashing into rivers.

* Baby otters stay with their mothers until they’re up to a year old, and * otters can live up to 16 years. Unfortunately, * many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=551, prompt_tokens=13314, total_tokens=13865, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None), prompt_tokens_details=None)

Multi tool calling

We can let the model call multiple tools in sequence using the max_steps parameter.

chat = Chat(model, tools=[simple_add])
res = chat("What's ((5 + 3)+7)+11? Work step by step", return_all=True, max_steps=5)
for r in res: display(r)

I’ll solve this step by step using the addition function.

First, let me calculate 5 + 3:

🔧 simple_add({“a”: 5, “b”: 3})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=96, prompt_tokens=434, total_tokens=530, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01PbJNpks9G6o9UYjRs5nd4b',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}

Now I’ll add 7 to that result (8 + 7):

🔧 simple_add({“a”: 8, “b”: 7})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=88, prompt_tokens=543, total_tokens=631, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01PAuvDwaBNgM2JbEwJwxov5',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

Finally, I’ll add 11 to that result (15 + 11):

🔧 simple_add({“a”: 15, “b”: 11})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=89, prompt_tokens=644, total_tokens=733, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01KeELPbBSpCePrk4c8EzbyB',
 'role': 'tool',
 'name': 'simple_add',
 'content': '26'}

So working step by step: - 5 + 3 = 8 - 8 + 7 = 15
- 15 + 11 = 26

Therefore, ((5 + 3) + 7) + 11 = 26

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=68, prompt_tokens=746, total_tokens=814, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Some models support parallel tool calling, i.e. sending multiple tool call requests in one conversation step.

def multiply(a: int, b: int) -> int:
    "Multiply two numbers"
    return a * b

chat = Chat('openai/gpt-4.1', tools=[simple_add, multiply])
res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
for r in res: display(r)

🔧 simple_add({“a”: 5, “b”: 3})

🔧 simple_add({“a”: 7, “b”: 2})

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=52, prompt_tokens=110, total_tokens=162, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_8ZH9yZH3N9TEmaIOmV6H81mU',
 'role': 'tool',
 'name': 'simple_add',
 'content': '8'}
{'tool_call_id': 'call_uDxT76wyUI8WY9ugRPwCnlb9',
 'role': 'tool',
 'name': 'simple_add',
 'content': '9'}

🔧 multiply({“a”:8,“b”:9})

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=17, prompt_tokens=178, total_tokens=195, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_Dm0jjJPQf2elRy4z7yBhOMf5',
 'role': 'tool',
 'name': 'multiply',
 'content': '72'}

(5 + 3) = 8 and (7 + 2) = 9. Multiplying them together: 8 × 9 = 72.

So, (5 + 3) * (7 + 2) = 72.

  • id: chatcmpl-xxx
  • model: gpt-4.1-2025-04-14
  • finish_reason: stop
  • usage: Usage(completion_tokens=55, prompt_tokens=203, total_tokens=258, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))

See it did the additions in one go!

We don’t want the model to keep running tools indefinitely. Let’s showcase how we can force the model to stop after a specified number of tool-call rounds:

def divide(a: int, b: int) -> float:
    "Divide two numbers"
    return a / b

chat = Chat(model, tools=[simple_add, multiply, divide])
res = chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.", 
           max_steps=3, return_all=True,
           final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)

I’ll calculate this step by step using the available functions, following the order of operations (parentheses first, then multiplication/division from left to right).

Step 1: Calculate (10 + 5)

🔧 simple_add({“a”: 10, “b”: 5})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=116, prompt_tokens=609, total_tokens=725, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_013mAGGxvDDy3ftqSS3u2qPh',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}

Step 2: Calculate (2 + 1)

🔧 simple_add({“a”: 2, “b”: 1})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=83, prompt_tokens=738, total_tokens=821, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_019KZ7aFBcAoHieWEFX6hYwA',
 'role': 'tool',
 'name': 'simple_add',
 'content': '3'}

Step 3: Multiply 15 * 3

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=22, prompt_tokens=848, total_tokens=870, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

Tool call exhaustion

pr = "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to calculate!"
c = Chat(model, tools=[simple_add])
res = c(pr, max_steps=2)
res

I was only able to complete the first calculation: 1 + 2 = 3.

To complete your request, I would need to make two additional tool calls: 1. Add 2 to the result (3 + 2 = 5) 2. Add 3 to that result (5 + 3 = 8)

The final answer should be 8, but I wasn’t able to use the tools for all the steps. Would you like me to continue with the remaining calculations if you provide more tool uses?

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=123, prompt_tokens=584, total_tokens=707, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
assert c.hist[-2]['content'] == _final_prompt

Async

If you want to use LiteLLM in a webapp, you probably want to use its async function acompletion. To make that easier, we implement our own AsyncChat to complement it. It follows the same implementation as Chat as much as possible:


source

astream_with_complete

 astream_with_complete (agen, postproc=<function noop>)

source

AsyncChat

 AsyncChat (model:str, sp='', temp=0, search=False, tools:list=None,
            hist:list=None, ns:Optional[dict]=None, cache=False)

LiteLLM chat client.

Type Default Details
model str LiteLLM compatible model name
sp str System prompt
temp int 0 Temperature
search bool False Search (l,m,h), if model supports it
tools list None Add tools
hist list None Chat history
ns Optional None Custom namespace for tool calling
cache bool False Anthropic prompt caching

Examples

Basic example

chat = AsyncChat(model)
await chat("What is 2+2?")

2 + 2 = 4

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=13, prompt_tokens=14, total_tokens=27, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)

With tool calls

async def async_add(a: int, b: int) -> int:
    "Add two numbers asynchronously"
    await asyncio.sleep(0.1)
    return a + b
chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", return_all=True)
async for r in res: display(r)

I’ll use the async_add tool to calculate 5 + 7 for you.

🔧 async_add({“a”: 5, “b”: 7})

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: tool_calls
  • usage: Usage(completion_tokens=91, prompt_tokens=424, total_tokens=515, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_01X1GDcGawNLq77xytxEGDBw',
 'role': 'tool',
 'name': 'async_add',
 'content': '12'}

The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.

  • id: chatcmpl-xxx
  • model: claude-sonnet-4-20250514
  • finish_reason: stop
  • usage: Usage(completion_tokens=30, prompt_tokens=568, total_tokens=598, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
chat.hist
[{'role': 'user', 'content': 'What is 2+2?'},
 Message(content='2 + 2 = 4', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'citations': None, 'thinking_blocks': None})]

Async Streaming Display

This is what our outputs look like with streaming results:

chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
    if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
    elif isinstance(o,dict): print(o)
I'll use the async_add tool to calculate 5 + 7 for you.
🔧 async_add
{'tool_call_id': 'toolu_01NTrwnCZYd6K7ZeNQtUyx9k', 'role': 'tool', 'name': 'async_add', 'content': '12'}


The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.

We use this one quite a bit so we want to provide some utilities to better format these outputs:


source

aformat_stream

 aformat_stream (rs)

Format the response stream for markdown display.


source

adisplay_stream

 adisplay_stream (rs)

Use IPython.display to markdown display the response stream.

Streaming examples

Now we can demonstrate AsyncChat with stream=True!

Tool call

chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
await adisplay_stream(res)
I’ll use the async_add tool to calculate 5 + 7 for you.

async_add({"a": 5, "b": 7}) - 12

The calculation is complete! Using the async_add tool, I calculated that 5 + 7 = 12.

Thinking tool call

chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?", think='l',stream=True)
await adisplay_stream(res)

🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠

For 1000 random integers, the most efficient approach is to use your programming language’s built-in sort function (like sort() in Python, Arrays.sort() in Java, or std::sort() in C++).

These implementations typically use highly optimized hybrid algorithms like: - Introsort (introspective sort) - starts with quicksort, switches to heapsort for worst-case scenarios - Timsort (Python) - optimized for real-world data patterns

If implementing from scratch, quicksort would be most efficient for random data at this size, with O(n log n) average performance.

For integers specifically, radix sort could also be very fast (O(kn) where k is the number of digits), but built-in functions are still usually the best choice due to their optimization and testing.

Multiple tool calls
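
The history inspected below assumes a prior streamed multi-step tool call, something like this (a sketch; the exact prompt and tool set are assumptions based on the history entries shown):

chat = AsyncChat(model, tools=[simple_add, multiply, divide])
res = await chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.", stream=True)
await adisplay_stream(res)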

chat.hist[1]
Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step using the available functions.\n\n**Step 1: Calculate (10 + 5)**", role='assistant', tool_calls=[{'function': {'arguments': '{"a": 10, "b": 5}', 'name': 'simple_add'}, 'id': 'toolu_017SE1tgPMBDtS5xLLMGLg9b', 'type': 'function'}], function_call=None, provider_specific_fields=None)
chat.hist[2]
{'tool_call_id': 'toolu_017SE1tgPMBDtS5xLLMGLg9b',
 'role': 'tool',
 'name': 'simple_add',
 'content': '15'}
chat.hist[3]
Message(content='**Step 2: Calculate (2 + 1)**', role='assistant', tool_calls=[{'function': {'arguments': '{"a": 2, "b": 1}', 'name': 'simple_add'}, 'id': 'toolu_013N4boUAcDB1X2h9aNNivug', 'type': 'function'}], function_call=None, provider_specific_fields=None)

Search

chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the web and tell me very briefly about otters", stream=True)
await adisplay_stream(res)

Otters are * charismatic members of the weasel family, found on every continent except Australia and Antarctica. * * There are 13-14 species in total, ranging from * the Asian small-clawed otter (smallest) to the giant otter and sea otter (largest).

These semi-aquatic mammals have * elongated bodies, long tails, and soft, dense fur. * Otters have the densest fur of any animal—as many as a million hairs per square inch. They’re * equipped with powerful webbed feet for swimming and seal-like abilities for holding breath underwater, with * * river otters able to hold their breath for up to 8 minutes.

* All otters are expert hunters that eat fish, crustaceans, and other critters. * Sea otters have an ingenious method to open shellfish by floating on their backs, placing a rock on their chest, then smashing mollusks against it. * River otters are especially playful, gamboling on land and splashing into rivers.

* Baby otters stay with their mothers until they’re up to a year old, and * otters can live up to 16 years. Unfortunately, * many species remain at risk from pollution and habitat loss after being extensively hunted for their fur.