Lisette
NB: If you are reading this in GitHub’s readme, we recommend you instead read the much more nicely formatted documentation version of this tutorial.
Lisette is a wrapper for the LiteLLM Python SDK, which provides unified access to 100+ LLM providers using the OpenAI API format.
LiteLLM provides a unified interface to access multiple LLMs, but it’s quite low level: it leaves the developer to do a lot of stuff manually. Lisette automates pretty much everything that can be automated, whilst providing full control. Amongst the features provided:
- A Chat class that creates stateful dialogs across any LiteLLM-supported model
- Convenient message creation utilities for text, images, and mixed content
- Simple and convenient support for tool calling with automatic execution
- Built-in support for web search capabilities (including citations for supporting models)
- Streaming responses with formatting
- Full async support with AsyncChat
- Prompt caching (for supporting models)
To use Lisette, you’ll need to set the appropriate API keys as environment variables for whichever LLM providers you want to use.
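For example, you can export the keys in your shell or set them from Python before creating a Chat. The key names below are the standard ones LiteLLM looks for (the values are placeholders):
import os
os.environ['OPENAI_API_KEY'] = 'sk-...'         # OpenAI
os.environ['ANTHROPIC_API_KEY'] = 'sk-ant-...'  # Anthropic
os.environ['GEMINI_API_KEY'] = '...'            # Google Gemini (AI Studio)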
Get started
LiteLLM will automatically be installed with Lisette, if you don’t already have it.
!pip install lisette -qq
Lisette only exports the symbols that are needed to use the library, so you can use import * to import them:
from lisette import *
Here’s a quick example showing how easy it is to switch between different LLM providers:
Chat
models = ["gemini/gemini-3-flash-preview", "claude-opus-4-6", "openai/gpt-4.1"]
for model in models:
    chat = Chat(model)
    res = chat("Please tell me about yourself in one brief sentence.")
    display(res)
I am a large language model, trained by Google.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=11, prompt_tokens=11, total_tokens=22, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=11, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=11, image_tokens=None), cache_read_input_tokens=None)
I’m Claude, an AI assistant made by Anthropic, designed to be helpful, harmless, and honest in conversations across a wide range of topics.
- id: chatcmpl-xxx
- model: claude-opus-4-6
- finish_reason: stop
- usage:
Usage(completion_tokens=35, prompt_tokens=17, total_tokens=52, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=35, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='global', speed=None)
I am an AI language model created by OpenAI, designed to assist with information, writing, and problem-solving tasks.
- id: chatcmpl-xxx
- model: gpt-4.1-2025-04-14
- finish_reason: stop
- usage:
Usage(completion_tokens=24, prompt_tokens=17, total_tokens=41, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
That’s it! Lisette handles all the provider-specific details automatically. Each model will respond in its own style, but the interface remains the same.
To extract just the text content from a response, use the contents() helper, which returns res.choices[0].message, and then read its .content attribute:
contents(res).content
'I am an AI language model created by OpenAI, designed to assist with information, writing, and problem-solving tasks.'
Message formatting
Multiple messages
Lisette accepts multiple messages in one go:
chat = Chat(models[0])
res = chat(['Hi! My favorite drink is coffee.', 'Hello!', "What's my favorite drink?"])
display(res)
Your favorite drink is coffee!
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=8, prompt_tokens=15, total_tokens=23, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=15, image_tokens=None), cache_read_input_tokens=None)
If you have a pre-existing message history, you can also pass it when you create the Chat object:
chat = Chat(models[0], hist=['Hi! My favorite drink is coffee.', 'Hello!'])
res = chat("What's my favorite drink?")
display(res)
Your favorite drink is coffee!
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=8, prompt_tokens=18, total_tokens=26, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=8, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=18, image_tokens=None), cache_read_input_tokens=None)
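Because Chat is stateful, the same object keeps accumulating the conversation, so follow-up turns are just further calls on it. A minimal sketch (the follow-up question is our own; output omitted):
res = chat('And what would go well with it?')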
Images
Lisette also makes it easy to include images in your prompts:
from pathlib import Path
from IPython.display import Image

fn = Path('samples/puppy.jpg')
img = fn.read_bytes()
Image(img)
All you have to do is read it in as bytes:
img[:20]
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00'
And you can pass it, along with your text prompt, straight to a Chat object:
chat = Chat(models[0])
chat([img, "What's in this image? Be brief."])
A Cavalier King Charles Spaniel puppy lying in the grass next to purple flowers.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=17, prompt_tokens=1091, total_tokens=1108, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=17, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=11, image_tokens=1080), cache_read_input_tokens=None)
Prefill
Some providers (e.g. Anthropic) support prefill, allowing you to specify how the assistant’s response should begin:
chat = Chat(models[0])
chat("Concisely, what's the meaning of life?", prefill="According to Douglas Adams,")The meaning of life is subjective and self-created.
Biologically, it is to propagate life; philosophically, it is to find or create purpose through connection, contribution, and the pursuit of what makes you feel most alive.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=55, prompt_tokens=13, total_tokens=68, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=55, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=13, image_tokens=None), cache_read_input_tokens=None)
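The example above used a Gemini model; prefill is honoured by providers such as Anthropic, so to see the response actually continue from the given prefix you can point the same call at models[1] (Claude). A minimal sketch (output omitted):
chat = Chat(models[1])
chat("Concisely, what's the meaning of life?", prefill="According to Douglas Adams,")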
Tools
Lisette makes it easy to give LLMs access to Python functions. Just define a function with type hints and a docstring:
def add_numbers(
    a: int, # First number to add
    b: int  # Second number to add
) -> int:
    "Add two numbers together"
    return a + b
Now pass the function to Chat and the model can use it automatically:
chat = Chat(models[0], tools=[add_numbers])
res = chat("What's 47 + 23? Use the tool.")
res
47 + 23 is 70.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=11, prompt_tokens=129, total_tokens=140, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=11, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=129, image_tokens=None), cache_read_input_tokens=None)
If you want to see all intermediate messages and outputs, you can pass return_all=True.
chat = Chat(models[0], tools=[add_numbers])
res = chat("What's 47 + 23 + 59? Use the tool.",max_steps=3,return_all=True)
display(*res)🔧 add_numbers({“a”: 47, “b”: 23})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=20, prompt_tokens=99, total_tokens=119, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=99, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_9yi0_kJITjqKXS80a6qUVQ',
'role': 'tool',
'name': 'add_numbers',
'content': '70'}
🔧 add_numbers({"b": 59, "a": 70})
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=20, prompt_tokens=133, total_tokens=153, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=133, image_tokens=None), cache_read_input_tokens=None)
{'tool_call_id': 'call_6xFns2epQ3i8ZcHlguLmYg',
'role': 'tool',
'name': 'add_numbers',
'content': '129'}
47 + 23 + 59 = 129.
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=16, prompt_tokens=168, total_tokens=184, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=16, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=168, image_tokens=None), cache_read_input_tokens=None)
It shows the intermediate tool calls and the tool results!
Web search
Some models support web search capabilities. Lisette makes this easy to use:
chat = Chat(models[0], search='l') # 'l'ow, 'm'edium, or 'h'igh search context
res = chat("Please tell me one fun fact about otters. Keep it brief")
res
Sea otters often hold hands while they sleep to keep from drifting apart in the water. These groups of resting otters are called “rafts.”
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=31, prompt_tokens=14, total_tokens=45, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=31, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=14, image_tokens=None), cache_read_input_tokens=None)
Some providers (like Anthropic) provide citations for their search results.
res.choices[0].message.provider_specific_fields
{'thought_signatures': ['EjQKMgG+Pvb78POvW+fyMUQX7rTpoltcIJbCLisGdH/ZV4FRN0DfkkgClNAm24aBvvTmdfb9']}
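As a hedged sketch of how you might pull those citations out of a search-enabled Anthropic response (the 'citations' key and exact layout are assumptions that can vary by provider and LiteLLM version):
msg = res.choices[0].message
fields = getattr(msg, 'provider_specific_fields', None) or {}
print(fields.get('citations', 'No citations in this response'))  # key name is an assumption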
Streaming
For real-time responses, use stream=True to get chunks as they’re generated rather than waiting for the complete response:
chat = Chat(models[0])
res_gen = chat("Concisely, what are the top 10 biggest animals?", stream=True)
from litellm import ModelResponse, ModelResponseStream
You can loop over the generator to get the partial responses:
for chunk in res_gen:
    if isinstance(chunk, ModelResponseStream): print(chunk.choices[0].delta.content, end='')
Ranked by maximum weight, here are the 10 largest animals on Earth:
1. **Blue Whale:** The largest animal ever known (up to 190 tons).
2. **North Pacific Right Whale:** Massive baleen whale (up to 120 tons).
3. **Southern Right Whale:** Heavily built filter feeder (up to 110 tons).
4. **Fin Whale:** The second-longest animal (up to 80 tons).
5. **Bowhead Whale:** Possesses the largest mouth of any animal (up to 75 tons).
6. **Sperm Whale:** The largest toothed predator (up to 60 tons).
7. **Humpback Whale:** Known for long pectoral fins (up to 45 tons).
8. **Sei Whale:** One of the fastest swimmers (up to 30 tons).
9. **Whale Shark:** The largest non-mammalian vertebrate (up to 21 tons).
10. **African Bush Elephant:** The largest living land animal (up to 11 tons).NoneNone
And the final chunk is the complete ModelResponse:
chunk
Ranked by maximum weight, here are the 10 largest animals on Earth:
- Blue Whale: The largest animal ever known (up to 190 tons).
- North Pacific Right Whale: Massive baleen whale (up to 120 tons).
- Southern Right Whale: Heavily built filter feeder (up to 110 tons).
- Fin Whale: The second-longest animal (up to 80 tons).
- Bowhead Whale: Possesses the largest mouth of any animal (up to 75 tons).
- Sperm Whale: The largest toothed predator (up to 60 tons).
- Humpback Whale: Known for long pectoral fins (up to 45 tons).
- Sei Whale: One of the fastest swimmers (up to 30 tons).
- Whale Shark: The largest non-mammalian vertebrate (up to 21 tons).
- African Bush Elephant: The largest living land animal (up to 11 tons).
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=244, prompt_tokens=15, total_tokens=259, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
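If you also want the streamed answer collected into a single string, you can accumulate the text deltas while printing them; a minimal sketch reusing the same pattern:
text = ''
for chunk in chat("Concisely, name three small animals.", stream=True):
    if isinstance(chunk, ModelResponseStream) and chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content
full_response = chunk  # the last item yielded is the complete ModelResponse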
Async
For web applications and concurrent operations, like in FastHTML, we recommend using AsyncChat:
chat = AsyncChat(models[0])
await chat("Hi there")
Hello! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-3-flash-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=9, prompt_tokens=3, total_tokens=12, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=9, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=3, image_tokens=None), cache_read_input_tokens=None)
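Since each AsyncChat call is awaitable, you can also fan out several independent chats concurrently, e.g. with asyncio.gather (a sketch; in a notebook you can await it at the top level as above):
import asyncio
chats = [AsyncChat(m) for m in models]
results = await asyncio.gather(*(c('Say hi in five words or fewer.') for c in chats))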
To wrap up, we’ll show an example of async + streaming + tool calling + search:
chat = AsyncChat(models[1], search='l', tools=[add_numbers])
res = await chat("""\
Search the web for the avg weight, in kgs, of male African and Asian elephants. Then add the two.
Keep your replies ultra concise! Don't search the web more than once please.
""", max_steps=4, stream=True)
_ = await adisplay_stream(res) # this is a convenience function to make async streaming look great in notebooks!
Based on the search results, here are good average figures:
Now let me add them:
add_numbers(a=5000, b=5000)
{
"id": "toolu_GEbUJMF8QnmjxmEvSCaGcw",
"call": {
"function": "add_numbers",
"arguments": {
"a": "5000",
"b": "5000"
}
},
"result": "10000"
}
Here are the averages from the sources:
Sum: 5,000 + 5,000 = 10,000 kg ✅
Next steps
Ready to dive deeper?
- Check out the rest of the documentation.
- Visit the GitHub repository to contribute or report issues.
- Join our Discord community!