# Usage


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

``` python
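import time, importlib, httpx, litellm
from fastcore.utils import *  # patch, ifnone, dict2obj, L
from fastcore.test import *
from lisette.core import Chat, AsyncChat, patch_litellm
from lisette.usage import Usage, LisetteUsageLogger
from cachy import enable_cachy, disable_cachy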
```

## Lisette Usage Logger

``` python
importlib.reload(litellm); # to re-run the notebook without kernel restart
```

``` python
# litellm._turn_on_debug()
```

``` python
patch_litellm()
```

------------------------------------------------------------------------

<a
href="https://github.com/AnswerDotAI/lisette/blob/main/lisette/usage.py#L16"
target="_blank" style="float:right; font-size:smaller">source</a>

### Usage

``` python

def Usage(
    args:VAR_POSITIONAL, kwargs:VAR_KEYWORD
):

```

*Initialize self. See help(type(self)) for accurate signature.*

Anthropic reports web search request counts directly in
`usage.server_tool_use.web_search_requests`, billed at $10 per 1,000
searches
([pricing](https://docs.claude.com/en/docs/about-claude/pricing)).
Gemini returns the queries it ran in `groundingMetadata.webSearchQueries`,
and each query counts as a separate billable use. Gemini includes 5,000
free prompts per month, then charges $14 per 1,000 search queries
(coming soon)
([pricing](https://ai.google.dev/gemini-api/docs/pricing), [grounding
docs](https://ai.google.dev/gemini-api/docs/google-search)).
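
As a rough illustration of the two reporting styles described above, the
sketch below counts web searches from plain dictionaries shaped like the
fields named in the paragraph. The helper name `count_web_searches` and the
dict layout are assumptions made for this example; this is not lisette's
`search_count` implementation.

``` python
def count_web_searches(usage=None, grounding_metadata=None):
    "Hypothetical helper: count billable web searches from provider-style dicts."
    # Anthropic-style: an explicit counter under usage.server_tool_use
    stu = (usage or {}).get('server_tool_use') or {}
    if stu.get('web_search_requests') is not None: return stu['web_search_requests']
    # Gemini-style: one billable use per query listed in webSearchQueries
    return len((grounding_metadata or {}).get('webSearchQueries') or [])

anthropic_usage = {'server_tool_use': {'web_search_requests': 1}}
gemini_grounding = {'webSearchQueries': ['weather NYC', 'NYC forecast today']}

print(count_web_searches(usage=anthropic_usage))                # 1
print(count_web_searches(grounding_metadata=gemini_grounding))  # 2
print(count_web_searches())                                     # 0
```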

------------------------------------------------------------------------

<a
href="https://github.com/AnswerDotAI/lisette/blob/main/lisette/usage.py#L19"
target="_blank" style="float:right; font-size:smaller">source</a>

### search_count

``` python

def search_count(
    r
):

```

The precomputed response cost is available in
`kwargs['response_cost']`, according to the [litellm
docs](https://docs.litellm.ai/docs/observability/custom_callback#whats-available-in-kwargs).
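
A minimal sketch of how a success callback might read that value. The class
below only mirrors the
`log_success_event(kwargs, response_obj, start_time, end_time)` hook shape of
litellm's `CustomLogger` without importing it; the `CostRecorder` name and
the fake `kwargs` payloads are made up for illustration, and treating a
missing cost as `0.0` is a choice of this example.

``` python
class CostRecorder:
    "Hypothetical stand-in for a litellm `CustomLogger` subclass."
    def __init__(self): self.costs = []

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # Handle a missing or None `response_cost` defensively by recording 0.0
        self.costs.append(kwargs.get('response_cost') or 0.0)

rec = CostRecorder()
# Fake callback payloads; in real use litellm would pass these in
rec.log_success_event({'response_cost': 0.000207}, None, None, None)
rec.log_success_event({}, None, None, None)
print(rec.costs)  # [0.000207, 0.0]
```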

------------------------------------------------------------------------

<a
href="https://github.com/AnswerDotAI/lisette/blob/main/lisette/usage.py#L27"
target="_blank" style="float:right; font-size:smaller">source</a>

### LisetteUsageLogger

``` python

def LisetteUsageLogger(
    db_path
):

```

*Args:*

- `turn_off_message_logging`: bool. If True, message logging is turned
  off, and the message and response are redacted from
  `StandardLoggingPayload`.
- `message_logging`: bool. Deprecated; use `turn_off_message_logging`
  instead.

## Cost Utils

``` python
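class PrefixDict(dict):
    "A dict whose `[]` lookup falls back to the first stored key that is a prefix of the requested key"
    def __getitem__(self, key):
        # An exact match wins
        if key in self: return super().__getitem__(key)
        # Otherwise use the first stored key that prefixes `key`,
        # e.g. 'claude-sonnet-4-5' matches 'claude-sonnet-4-5-20250929'
        for k in self:
            if key.startswith(k): return super().__getitem__(k)
        raise KeyError(key)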
```
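
A short usage sketch for the class above, repeated here so the block runs on
its own. It shows why a dated model name such as
`claude-sonnet-4-5-20250929` resolves to the entry stored under
`claude-sonnet-4-5`; the stored value is an arbitrary placeholder.

``` python
class PrefixDict(dict):
    def __getitem__(self, key):
        if key in self: return super().__getitem__(key)
        for k in self:
            if key.startswith(k): return super().__getitem__(k)
        raise KeyError(key)

prices = PrefixDict({'claude-sonnet-4-5': 'sonnet prices'})
print(prices['claude-sonnet-4-5'])           # exact match
print(prices['claude-sonnet-4-5-20250929'])  # prefix match, same value
try: prices['gemini-3-flash-preview']
except KeyError as e: print('missing:', e)   # no stored key is a prefix
print(prices.get('claude-sonnet-4-5-20250929'))  # None
```

Only `[]` lookups use the prefix fallback: `dict.get` and `in` bypass
`__getitem__`, so they still require an exact key.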

``` python
# Prices in USD: token prices are per token, `web_search_prc` is per search request
model_prices = PrefixDict({'claude-sonnet-4-5':
    dict(input_prc = 3/1e6, cache_write_prc = 3.75/1e6, cache_read_prc = 0.3/1e6, output_prc = 15/1e6, web_search_prc = 10/1e3)
})
```

Simplified cost utils to demonstrate the total cost calculation (use
`Usage.response_cost` in production):

``` python
@patch(as_prop=True)
def inp_cost(self:Usage):         return model_prices[self.model]['input_prc'] * (self.prompt_tokens - self.cache_read_tokens)
@patch(as_prop=True)
def cache_write_cost(self:Usage): return model_prices[self.model]['cache_write_prc'] * self.cache_creation_tokens
@patch(as_prop=True)
def cache_read_cost(self:Usage):  return model_prices[self.model]['cache_read_prc'] * self.cache_read_tokens
@patch(as_prop=True)
def out_cost(self:Usage):         return model_prices[self.model]['output_prc'] * self.completion_tokens
@patch(as_prop=True)
def web_cost(self:Usage):         return model_prices[self.model]['web_search_prc'] * ifnone(self.web_search_requests, 0)
@patch(as_prop=True)
def cost(self:Usage):             return self.inp_cost + self.cache_write_cost + self.cache_read_cost + self.out_cost + self.web_cost
```

litellm also ships a mapping of model prices, which it uses to
calculate `response_cost`:

``` python
model_pricing = dict2obj(httpx.get(litellm.model_cost_map_url).json())
```

``` python
# model_pricing['claude-sonnet-4-5']
```

``` python
# model_pricing['gemini-3-pro-preview']
```

## Examples

``` python
from tempfile import NamedTemporaryFile
tf = NamedTemporaryFile(suffix='.db')
```

``` python
@patch
def user_id_fn(self:LisetteUsageLogger): return 'user-123'  # attribute every logged call to a fixed user
logger = LisetteUsageLogger(tf.name)  # usage rows are written to the temporary database created above
litellm.callbacks = [logger]
```

``` python
slc = ','.join('id model user_id prompt_tokens completion_tokens total_tokens cached_tokens cache_creation_tokens cache_read_tokens web_search_requests response_cost'.split())
```

``` python
# litellm.set_verbose = True
```

A simple example:

``` python
chat = Chat('claude-sonnet-4-5-20250929')
r = chat("What is 2+2?")
```

``` python
time.sleep(0.1) # wait for callback db write
```

``` python
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=1, timestamp=UNSET, model='claude-sonnet-4-5-20250929', user_id='user-123', prompt_tokens=14, completion_tokens=11, total_tokens=25, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=0, response_cost=0.000207)

Our calculated cost matches litellm’s `response_cost`. In some cases the
custom calculation is the better choice, as we’ll see in the remainder
of this notebook:

``` python
test_eq(u.cost, u.response_cost)
```
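
To see why the two numbers agree, the arithmetic for the row above can be
reproduced by hand from its token counts and the `claude-sonnet-4-5` prices
in `model_prices`. This is a standalone check with values copied from the
output above; it does not call lisette or litellm.

``` python
import math

input_prc, output_prc = 3/1e6, 15/1e6      # USD per token, as in `model_prices`
prompt_tokens, completion_tokens = 14, 11  # from the logged row (no cache or web search use)

inp_cost = input_prc * prompt_tokens       # 14 tokens at $3 per million, about 0.000042
out_cost = output_prc * completion_tokens  # 11 tokens at $15 per million, about 0.000165
cost = inp_cost + out_cost
print(round(cost, 6))  # 0.000207
```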

Now, let’s test with streaming:

``` python
chat = Chat('claude-sonnet-4-5')
res = chat("Count from 1 to 5", stream=True)
for o in res: pass
```

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=2, timestamp=UNSET, model='claude-sonnet-4-5', user_id='user-123', prompt_tokens=15, completion_tokens=17, total_tokens=32, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=0, response_cost=0.00030000000000000003)

``` python
test_eq(u.cost, u.response_cost)
```

Streaming logged successfully. Let’s also verify async chat calls are
logged properly.

``` python
chat_async = AsyncChat('claude-sonnet-4-5-20250929')
await chat_async("What is 3+3?")
```

3 + 3 = 6

<details>

- id: `chatcmpl-xxx`
- model: `claude-sonnet-4-5-20250929`
- finish_reason: `stop`
- usage:
  `Usage(completion_tokens=13, prompt_tokens=14, total_tokens=27, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=13, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0, inference_geo='not_available', speed=None)`

</details>

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=3, timestamp=UNSET, model='claude-sonnet-4-5-20250929', user_id='user-123', prompt_tokens=14, completion_tokens=13, total_tokens=27, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=0, response_cost=0.00023700000000000001)

``` python
test_eq(u.cost, u.response_cost)
```

Finally, let’s test async streaming to ensure all API patterns are
covered.

``` python
res = await chat_async("Count from 10 to 15", stream=True)
async for o in res: pass
print(o)
```

    ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5-20250929', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='10, 11, 12, 13, 14, 15', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=20, prompt_tokens=38, total_tokens=58, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None))

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=4, timestamp=UNSET, model='claude-sonnet-4-5-20250929', user_id='user-123', prompt_tokens=38, completion_tokens=20, total_tokens=58, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=0, response_cost=0.00041400000000000003)

``` python
test_eq(u.cost, u.response_cost)
```

### Search

Now let’s run a prompt with web search:

``` python
flash = 'gemini/gemini-3-flash-preview'
chat = Chat(flash)
chat("What is the weather like in NYC? Search web.", search="m")
```

As of Thursday evening, March 5, 2026, the weather in New York City is
currently **chilly with light rain**.

### **Current Conditions (8:27 PM EST)**

- **Temperature:** 39°F (4°C)
- **Feels Like:** 33°F (1°C)
- **Conditions:** Light rain and patchy fog
- **Humidity:** 92%
- **Wind:** Northeast at 10–15 mph

### **Tonight’s Forecast**

Rain is expected to continue throughout the night with a near 100%
chance of precipitation. Temperatures will remain steady, dropping
slightly to a low of around **37°F (3°C)**.

### **Looking Ahead (Friday, March 6)**

- **Daytime:** Mostly cloudy and cooler. The rain is expected to taper
  off, leaving only a 10% chance of precipitation during the day.
- **High:** 40°F (4°C)
- **Low:** 36°F (2°C)

If you are heading out tonight, a waterproof jacket and umbrella are
highly recommended as steady rain and breezy conditions will persist.

<details>

- id: `chatcmpl-xxx`
- model: `gemini-3-flash-preview`
- finish_reason: `stop`
- usage:
  `Usage(completion_tokens=267, prompt_tokens=12, total_tokens=279, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=None, rejected_prediction_tokens=None, text_tokens=267, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=12, image_tokens=None), cache_read_input_tokens=None)`

</details>

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=5, timestamp=UNSET, model='gemini-3-flash-preview', user_id='user-123', prompt_tokens=12, completion_tokens=267, total_tokens=279, cached_tokens=None, cache_creation_tokens=None, cache_read_tokens=None, web_search_requests=2, response_cost=0.0008070000000000001)

``` python
assert u.web_search_requests
```

``` python
chat = Chat('claude-sonnet-4-5-20250929')
r = chat("What is the weather like in NYC? Search web.", search="m")
```

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=6, timestamp=UNSET, model='claude-sonnet-4-5-20250929', user_id='user-123', prompt_tokens=9121, completion_tokens=224, total_tokens=9345, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=1, response_cost=0.030723)

``` python
assert u.web_search_requests
```

<div>

> **Important**
>
> Litellm’s `response_cost` doesn’t take web search request cost into
> account!

</div>

This is a case where the custom calculation is the better choice, since
it also includes the web search request cost:

``` python
test_eq(u.cost, u.response_cost + u.web_search_requests * model_prices[u.model]['web_search_prc'])
```
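
The same check can be done by hand for the Claude search row above: the
logged `response_cost` covers only tokens, and the custom total adds one
search at $10 per 1,000. Values are copied from the logged output; this
standalone sketch does not call lisette or litellm.

``` python
import math

input_prc, output_prc, web_search_prc = 3/1e6, 15/1e6, 10/1e3  # USD, as in `model_prices`
prompt_tokens, completion_tokens, web_search_requests = 9121, 224, 1

token_cost = input_prc * prompt_tokens + output_prc * completion_tokens
web_cost = web_search_prc * web_search_requests
print(round(token_cost, 6))             # 0.030723, the logged `response_cost`
print(round(token_cost + web_cost, 6))  # 0.040723, including the web search
```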

### Search with streaming

Web search with streaming:

<div>

> **Important**
>
> Gemini web search requests are part of `prompt_tokens_details`, which
> is only included when `stream=True` if
> `stream_options={"include_usage": True}` is also set.
>
> There is currently a bug with Gemini web search request counts (see the
> [issue](https://github.com/BerriAI/litellm/issues/17919) and
> [PR](https://github.com/BerriAI/litellm/pull/17921)). We are waiting
> for the litellm 1.80.11 PyPI release.

</div>

``` python
chat = Chat(flash)
res = chat("What is the weather like in NYC? Search web.", search="m", stream=True, stream_options={"include_usage": True})
for o in res: pass
# print(o)
```

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=7, timestamp=UNSET, model='gemini-3-flash-preview', user_id='user-123', prompt_tokens=12, completion_tokens=307, total_tokens=319, cached_tokens=None, cache_creation_tokens=None, cache_read_tokens=None, web_search_requests=1, response_cost=0.035927)

<div>

> **Important**
>
> Anthropic web search requests are available in
> `usage.server_tool_use.web_search_requests`.

</div>

``` python
chat = Chat('claude-sonnet-4-5')
res = chat("What is the weather like in NYC now? Search web", search="m", stream=True, stream_options={"include_usage": True})
for o in res: pass
# print(o)
```

``` python
time.sleep(0.1)
u = logger.usage(select=slc)[-1]; u
```

    Usage(id=9, timestamp=UNSET, model='claude-sonnet-4-5', user_id='user-123', prompt_tokens=9121, completion_tokens=216, total_tokens=9337, cached_tokens=0, cache_creation_tokens=0, cache_read_tokens=0, web_search_requests=1, response_cost=0.030603)

``` python
test_eq(u.cost, u.response_cost + u.web_search_requests * model_prices[u.model]['web_search_prc'])
```

``` python
test_eq(len(logger.usage()), 8)
```

------------------------------------------------------------------------

<a
href="https://github.com/AnswerDotAI/lisette/blob/main/lisette/usage.py#L55"
target="_blank" style="float:right; font-size:smaller">source</a>

### Usage.total_cost

``` python

def total_cost(
    sc:float=0.01
):

```
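
The signature alone does not say what `sc` means. Assuming it is a
per-search cost in USD (0.01 would match Anthropic's $10 per 1,000
searches), a total that adds search costs on top of `response_cost` could
look like the standalone sketch below. The `Row` type and this free-standing
`total_cost` function are hypothetical and are not lisette's implementation.

``` python
from dataclasses import dataclass

@dataclass
class Row:
    "Hypothetical stand-in for a logged usage row."
    response_cost: float
    web_search_requests: int = 0

def total_cost(rows, sc=0.01):
    # Assumption: `sc` is the cost of one web search in USD
    return sum(r.response_cost + sc * (r.web_search_requests or 0) for r in rows)

# Two rows with values copied from the logged outputs (ids 1 and 6)
rows = [Row(0.000207), Row(0.030723, web_search_requests=1)]
print(round(total_cost(rows), 6))  # 0.04093
```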

``` python
L(logger.usage()).attrgot('response_cost').sum()
```

    0.09923900000000001

``` python
disable_cachy()
```

A simple Gemini example (it requires a minimum prompt length and two
runs before `cached_tokens` is reported):

``` python
# #| notest
# chat = Chat('gemini/gemini-2.5-flash')
# chat("What is 2+2?"* 500)
# time.sleep(5)
# chat("What is 2+2?"* 500)
```

``` python
# #| notest
# time.sleep(0.1) # wait for callback db write
# u = logger.usage(select=slc)[-1];u
```

``` python
# #| notest
# test_eq(len(logger.usage()), 10)
# test_eq(logger.usage()[-1].cached_tokens > 3000, True)
```

``` python
tf.close()
```
