LiteLLM
1. Initial Setup
import nora
import litellm

# Set API key and optional API URL
nora_api_key = "your-nora-api-key"
api_url = "https://noraobservabilityback-production.up.railway.app/v1"

nora.init(
    api_key=nora_api_key,
    environment="lite-llm-stream",
    api_url=api_url,
)
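For real deployments, the key is better read from the environment than hardcoded. A minimal sketch, assuming the key is stored in an environment variable; the name NORA_API_KEY is illustrative, not something the SDK reads on its own:

import os

import nora

# "NORA_API_KEY" is an assumed variable name, not an SDK convention
nora_api_key = os.environ["NORA_API_KEY"]

nora.init(
    api_key=nora_api_key,
    environment="lite-llm-stream",
    api_url="https://noraobservabilityback-production.up.railway.app/v1",
)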
2. Extracting Streaming Deltas
LiteLLM streaming chunks can arrive in different shapes (dict-like payloads or response objects), depending on the provider. Use a small helper to extract the text content from either form:
def _extract_delta(chunk):
    # Dict-style chunk
    try:
        return chunk["choices"][0]["delta"].get("content")
    except Exception:
        pass
    # Object-style chunk (attribute access)
    try:
        return chunk.choices[0].delta.content
    except Exception:
        return None
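A quick sanity check of the helper against both chunk shapes (the sample payloads below are illustrative stand-ins, not real LiteLLM chunks):

from types import SimpleNamespace

# Dict-style chunk, as raw provider payloads sometimes appear
dict_chunk = {"choices": [{"delta": {"content": "Hello"}}]}

# Object-style chunk, mimicking an attribute-based response object
obj_chunk = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content=" world"))]
)

print(_extract_delta(dict_chunk))  # "Hello"
print(_extract_delta(obj_chunk))   # " world"
print(_extract_delta({}))          # None (unrecognized shape)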
3. Single Streaming Example
Wrap the LiteLLM streaming call inside a Nora Trace Group and collect the deltas to form the full response:
with nora.trace_group(name="litellm_stream_chat"):
stream = litellm.completion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Count from 1 to 5"}],
stream=True,
max_tokens=50,
)
parts = []
for chunk in stream:
delta = _extract_delta(chunk)
if delta:
parts.append(delta)
result = "".join(parts)
print("Streaming result:", result)
Key point: Node-level operations (the streaming chunks) are captured by Nora automatically; only the Trace Group is required around the streaming execution.
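If this pattern recurs, it can be factored into a small wrapper; a minimal sketch (stream_chat is a hypothetical helper, not part of the Nora or LiteLLM APIs):

def stream_chat(prompt, model="gpt-3.5-turbo", max_tokens=50):
    # Hypothetical convenience wrapper: one Trace Group per call,
    # returning the assembled streamed text
    with nora.trace_group(name="litellm_stream_chat"):
        stream = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
            max_tokens=max_tokens,
        )
        parts = []
        for chunk in stream:
            delta = _extract_delta(chunk)
            if delta:
                parts.append(delta)
        return "".join(parts)

print(stream_chat("Count from 1 to 5"))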
4. Multiple Streaming Calls
Multiple streaming calls can be executed sequentially within the same Trace Group:
outputs = []

with nora.trace_group(name="litellm_stream_multi"):
    for i in range(2):
        stream = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Say number {i}"}],
            stream=True,
            max_tokens=20,
        )

        # Assemble each stream's text separately
        content = ""
        for chunk in stream:
            delta = _extract_delta(chunk)
            if delta:
                content += delta
        outputs.append(content)

print("All streaming outputs:", outputs)
Each stream is tracked separately but belongs to the same Trace Group.
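For async applications, LiteLLM provides litellm.acompletion, which also supports stream=True. A sketch, assuming nora.trace_group can wrap async code the same way it wraps synchronous code (not verified against the SDK):

import asyncio

async def stream_chat_async(prompt):
    # Assumption: nora.trace_group works unchanged inside async code
    with nora.trace_group(name="litellm_stream_async"):
        stream = await litellm.acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
            max_tokens=50,
        )
        parts = []
        async for chunk in stream:
            delta = _extract_delta(chunk)
            if delta:
                parts.append(delta)
        return "".join(parts)

print(asyncio.run(stream_chat_async("Say hello")))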