Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, VertexAI, TogetherAI, and more (100+ LLMs).
LiteLLM manages:
- Translating inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- Consistent output: text responses will always be available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) (Router)
- Setting budgets & rate limits per project, API key, and model (OpenAI Proxy Server)
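To make the retry/fallback and consistent-output points concrete, here is a minimal sketch in plain Python. It is not LiteLLM's Router: `complete_with_fallbacks`, `flaky_azure`, and `healthy_openai` are hypothetical names, and each "deployment" is just a function returning an OpenAI-format response dict. Because every response has the same shape, the caller reads the text from the same path no matter which deployment answered.

```python
from typing import Callable, Dict, List

# Hypothetical type: a "deployment" is any callable that takes OpenAI-format
# messages and returns an OpenAI-format response dict (not a LiteLLM API).
Deployment = Callable[[List[Dict[str, str]]], dict]


def complete_with_fallbacks(deployments: List[Deployment], messages: List[Dict[str, str]]) -> str:
    """Try each deployment in order and return the first successful text reply."""
    errors = []
    for deployment in deployments:
        try:
            response = deployment(messages)
        except Exception as exc:  # any failure moves on to the next deployment
            errors.append(exc)
            continue
        # Same access path for every provider, since responses share one shape.
        return response["choices"][0]["message"]["content"]
    raise RuntimeError(f"all {len(deployments)} deployments failed: {errors}")


def flaky_azure(messages):
    # Hypothetical deployment that always fails.
    raise TimeoutError("azure deployment timed out")


def healthy_openai(messages):
    # Hypothetical deployment that returns a canned OpenAI-format response.
    return {"choices": [{"message": {"role": "assistant", "content": "Hello from the fallback!"}}]}


print(complete_with_fallbacks([flaky_azure, healthy_openai], [{"role": "user", "content": "Hi"}]))
# Hello from the fallback!
```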
Jump to OpenAI Proxy Docs
Jump to Supported LLM Providers
🚨 Stable Release: v1.34.1
Support for more providers: if a provider or LLM platform is missing, raise a feature request.
Usage (Docs)
Important
LiteLLM v1.0.0 now requires `openai>=1.0.0`. See the migration guide in the docs.
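A quick way to check the `openai>=1.0.0` requirement is to compare the installed version string. The sketch below uses only the standard library (`importlib.metadata`); `version_tuple` and `meets_openai_requirement` are hypothetical helpers, and the comparison is deliberately naive (numeric X.Y.Z only, pre-release tags ignored), so treat `pip` as the authority.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple:
    """Naive '1.3.5' -> (1, 3, 5); keeps only the leading digits of each part."""
    nums = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)  # treat '1.0' as '1.0.0'
    return tuple(nums)


def meets_openai_requirement(installed: str, minimum: str = "1.0.0") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("openai")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("openai is not installed")
elif meets_openai_requirement(installed):
    print(f"openai {installed} satisfies openai>=1.0.0")
else:
    print(f"openai {installed} is too old for LiteLLM v1.0.0+; upgrade it")
```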
```shell
pip install litellm
```
```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
```
Call any model supported by a provider with `model=<provider_name>/<model_name>`. Some providers have provider-specific details, so refer to the provider docs for more information.
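The `<provider_name>/<model_name>` convention can be illustrated with a small parser. `split_model` is a hypothetical helper, not LiteLLM's own provider detection (which, as the example above shows, also resolves some bare names such as `gpt-3.5-turbo` and `command-nightly`). It splits on the first slash only, because model names such as `bigcode/starcoder` contain slashes themselves.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' on the first slash only.

    Model names may contain slashes themselves (e.g. 'bigcode/starcoder'),
    so only the first segment is treated as the provider prefix.
    """
    if "/" not in model:
        return None, model  # no explicit provider prefix, e.g. "gpt-3.5-turbo"
    provider, name = model.split("/", 1)
    return provider, name


print(split_model("huggingface/bigcode/starcoder"))  # ('huggingface', 'bigcode/starcoder')
print(split_model("gpt-3.5-turbo"))                  # (None, 'gpt-3.5-turbo')
```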
Async (Docs)
```python
from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
```
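Because `acompletion` is awaitable, several requests can run concurrently with `asyncio.gather`. The sketch below swaps in a hypothetical `fake_acompletion` coroutine so it runs offline; it assumes the real call is made with the same `model`/`messages` keywords shown above and that the text sits at `['choices'][0]['message']['content']`.

```python
import asyncio


# Hypothetical stand-in for `acompletion`: returns an OpenAI-format dict after
# a short delay instead of calling a real provider.
async def fake_acompletion(model: str, messages: list) -> dict:
    await asyncio.sleep(0.01)
    reply = f"{model} says hi to: {messages[-1]['content']}"
    return {"choices": [{"message": {"role": "assistant", "content": reply}}]}


async def ask_many(models: list, prompt: str) -> list:
    messages = [{"content": prompt, "role": "user"}]
    # Start every request at once; gather returns results in input order.
    responses = await asyncio.gather(*(fake_acompletion(m, messages) for m in models))
    return [r["choices"][0]["message"]["content"] for r in responses]


print(asyncio.run(ask_many(["gpt-3.5-turbo", "command-nightly"], "Hello, how are you?")))
```

With a real provider, `fake_acompletion` would be replaced by `acompletion`; `gather` keeps the results aligned with the order of `models`.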
Streaming (Docs)
LiteLLM supports streaming the model response back: pass `stream=True` to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
```python
from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude 2 (needs ANTHROPIC_API_KEY set)
response = completion(model="claude-2", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
```
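Printing each delta is fine for display, but often you also want the full reply once the stream ends. This sketch joins the deltas using the same `part.choices[0].delta.content` access path as above; `make_chunk` and `collect_stream` are hypothetical helpers, and the chunks are `SimpleNamespace` stand-ins rather than real LiteLLM stream objects.

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a stand-in for a streamed chunk exposing `.choices[0].delta.content`."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream) -> str:
    """Concatenate streamed deltas into the full reply text."""
    pieces = []
    for part in stream:
        # A chunk may carry no content (None), hence the `or ""`.
        pieces.append(part.choices[0].delta.content or "")
    return "".join(pieces)


fake_stream = iter([make_chunk("Hel"), make_chunk("lo"), make_chunk("!"), make_chunk(None)])
print(collect_stream(fake_stream))  # Hello!
```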
Logging Observability (Docs)
LiteLLM exposes predefined callbacks to send data to Lunary, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, and Slack.
```python
import os

import litellm
from litellm import completion

## set env variables for logging tools
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks: log input/output to lunary, langfuse, and athina
litellm.success_callback = ["lunary", "langfuse", "athina"]

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```
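Besides the named integrations, LiteLLM's callback docs describe passing your own function in `litellm.success_callback`. The sketch below assumes the four-argument signature `(kwargs, completion_response, start_time, end_time)` with `datetime` timestamps; check this against the docs for your installed version. `log_latency` and `call_log` are hypothetical names, and the function is invoked by hand with made-up values so the example runs without an API key.

```python
from datetime import datetime, timedelta

call_log = []


# Assumed callback signature (kwargs, completion_response, start_time, end_time);
# verify it against the LiteLLM callback docs for your version.
def log_latency(kwargs, completion_response, start_time, end_time):
    """Record the model name and wall-clock latency of each successful call."""
    call_log.append({
        "model": kwargs.get("model"),
        "latency_s": (end_time - start_time).total_seconds(),
    })


# Simulated invocation with made-up values, standing in for what LiteLLM would pass.
start = datetime(2024, 1, 1, 12, 0, 0)
log_latency({"model": "gpt-3.5-turbo"}, {"choices": []}, start, start + timedelta(seconds=1.5))
print(call_log)  # [{'model': 'gpt-3.5-turbo', 'latency_s': 1.5}]
```

To register it for real, you would set `litellm.success_callback = [log_latency]` before calling `completion`.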
OpenAI Proxy - (Docs)
Set Budgets & Rate limits across multiple projects
The proxy provides:
📖 Proxy Endpoints - Swagger Docs
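The budget and rate-limit idea can be pictured with a small in-memory tracker. This is a hypothetical sketch (`KeyLimits` is not part of LiteLLM) that enforces a USD budget and a requests-per-minute cap per API key with a 60-second sliding window; the proxy's actual enforcement is configured through its own settings and is not reproduced here.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class KeyLimits:
    """Hypothetical per-key budget and requests-per-minute tracker (in memory only)."""

    def __init__(self, max_budget_usd: float, rpm_limit: int):
        self.max_budget_usd = max_budget_usd
        self.rpm_limit = rpm_limit
        self.spend = defaultdict(float)   # key -> total USD spent so far
        self.recent = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.recent[key]
        # Drop request timestamps that are 60 seconds old or older (sliding window).
        while window and now - window[0] >= 60:
            window.popleft()
        if self.spend[key] >= self.max_budget_usd or len(window) >= self.rpm_limit:
            return False
        window.append(now)
        return True

    def record_cost(self, key: str, cost_usd: float) -> None:
        self.spend[key] += cost_usd


limits = KeyLimits(max_budget_usd=1.0, rpm_limit=2)
print(limits.allow("sk-team-a", now=0.0))   # True
print(limits.allow("sk-team-a", now=1.0))   # True
print(limits.allow("sk-team-a", now=2.0))   # False: 2 requests already in the last minute
print(limits.allow("sk-team-a", now=61.0))  # True: earlier requests aged out of the window
```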
Quick Start Proxy - CLI
```shell
pip install 'litellm[proxy]'
```
Step 1: Start litellm proxy
```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000
```
Step 2: Make ChatCompletions Request to Proxy
```python
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")  # set proxy to base_url

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "this is a test request, write a short poem"}
    ],
)
print(response)
```
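The proxy speaks the OpenAI wire format, so any HTTP client can call it. As a standard-library sketch, the function below builds the same kind of POST request that the `openai` client sends when `base_url` points at the proxy (the SDK appends `/chat/completions` to the base URL). `build_chat_request` and `PROXY_BASE` are hypothetical names; the request is only constructed, and the commented-out lines show how it would be sent once the proxy is running.

```python
import json
import urllib.request

PROXY_BASE = "http://0.0.0.0:4000"  # address printed by `litellm --model ...`


def build_chat_request(model: str, prompt: str, api_key: str = "anything") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request aimed at the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{PROXY_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("gpt-3.5-turbo", "this is a test request, write a short poem")
print(req.get_method(), req.full_url)  # POST http://0.0.0.0:4000/chat/completions
# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```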
Proxy Key Management (Docs)
UI on `/ui` on your proxy server
Set budgets and rate limits across multiple projects with `POST /key/generate`
Request
```shell
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m", "metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'
```
Expected Response
```json
{
  "key": "sk-kdEXbIqZRwEeEiHwdg7sFA",
  "expires": "2023-11-19T01:38:25.838000+00:00"
}
```
`key` is the Bearer token; `expires` is a datetime.
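The returned `key` is what later requests send as `Authorization: Bearer <key>`, and `expires` is an ISO 8601 timestamp with a UTC offset. The sketch below parses a body of the shape shown above (values copied from the sample) and checks whether the key has expired; `parse_key_response` and `is_expired` are hypothetical helpers.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple

# Response body in the shape shown above (sample values copied from it).
raw = '{"key": "sk-kdEXbIqZRwEeEiHwdg7sFA", "expires": "2023-11-19T01:38:25.838000+00:00"}'


def parse_key_response(text: str) -> Tuple[str, datetime]:
    data = json.loads(text)
    # fromisoformat keeps the +00:00 offset, so the result is timezone-aware.
    expires = datetime.fromisoformat(data["expires"])
    return data["key"], expires


def is_expired(expires: datetime, now: Optional[datetime] = None) -> bool:
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires


key, expires = parse_key_response(raw)
auth_header = {"Authorization": f"Bearer {key}"}  # header for later proxy requests
print(auth_header["Authorization"], "expired:", is_expired(expires))
```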
Supported Providers (Docs)
Provider | Completion | Streaming | Async Completion | Async Streaming | Async Embedding | Async Image Generation |
---|---|---|---|---|---|---|
openai | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
azure | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
aws - sagemaker | ✅ | ✅ | ✅ | ✅ | ✅ | |
aws - bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | |
google - vertex_ai [Gemini] | ✅ | ✅ | ✅ | ✅ | | |
google - palm | ✅ | ✅ | ✅ | ✅ | | |
google AI Studio - gemini | ✅ | ✅ | | | | |
mistral ai api | ✅ | ✅ | ✅ | ✅ | ✅ | |
cloudflare AI Workers | ✅ | ✅ | ✅ | ✅ | | |
cohere | ✅ | ✅ | ✅ | ✅ | ✅ | |
anthropic | ✅ | ✅ | ✅ | ✅ | | |
huggingface | ✅ | ✅ | ✅ | ✅ | ✅ | |
replicate | ✅ | ✅ | ✅ | ✅ | | |
together_ai | ✅ | ✅ | ✅ | ✅ | | |
openrouter | ✅ | ✅ | ✅ | ✅ | | |
ai21 | ✅ | ✅ | ✅ | ✅ | | |
baseten | ✅ | ✅ | ✅ | ✅ | | |
vllm | ✅ | ✅ | ✅ | ✅ | | |
nlp_cloud | ✅ | ✅ | ✅ | ✅ | | |
aleph alpha | ✅ | ✅ | ✅ | ✅ | | |
petals | ✅ | ✅ | ✅ | ✅ | | |
ollama | ✅ | ✅ | ✅ | ✅ | | |
deepinfra | ✅ | ✅ | ✅ | ✅ | | |
perplexity-ai | ✅ | ✅ | ✅ | ✅ | | |
Groq AI | ✅ | ✅ | ✅ | ✅ | | |
anyscale | ✅ | ✅ | ✅ | ✅ | | |
voyage ai | ✅ | | | | | |
xinference [Xorbits Inference] | ✅ | | | | | |