Client Libraries
Plano provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code - just point it to Plano’s gateway endpoints.
Supported Clients
OpenAI SDK - Full compatibility with OpenAI’s official client
Anthropic SDK - Native support for Anthropic’s client library
cURL - Direct HTTP requests for any programming language
Custom HTTP Clients - Any HTTP client that supports REST APIs
Gateway Endpoints
Plano exposes three main endpoints:
| Endpoint | Purpose |
|---|---|
| /v1/chat/completions | OpenAI-compatible chat completions (LLM Gateway) |
| /v1/responses | OpenAI Responses API with conversational state management (LLM Gateway) |
| /v1/messages | Anthropic-compatible messages (LLM Gateway) |
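As a quick orientation, the sketch below shows how each SDK’s base_url lines up with these endpoints; the host and port match the local examples used throughout this page:

from openai import OpenAI
import anthropic

# OpenAI SDK: the /v1 suffix means requests go to /v1/chat/completions and /v1/responses
openai_client = OpenAI(api_key="test-key", base_url="http://127.0.0.1:12000/v1")

# Anthropic SDK: no /v1 suffix; the SDK appends /v1/messages itself
anthropic_client = anthropic.Anthropic(api_key="test-key", base_url="http://127.0.0.1:12000")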
OpenAI (Python) SDK
The OpenAI SDK works with any provider through Plano’s OpenAI-compatible endpoint.
Installation:
pip install openai
Basic Usage:
from openai import OpenAI

# Point to Plano's LLM Gateway
client = OpenAI(
    api_key="test-key",  # Can be any value for local testing
    base_url="http://127.0.0.1:12000/v1"
)

# Use any model configured in your arch_config.yaml
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # Or use a model alias like "fast-model"
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?"
        }
    ]
)

print(completion.choices[0].message.content)
Streaming Responses:
from openai import OpenAI

client = OpenAI(
    api_key="test-key",
    base_url="http://127.0.0.1:12000/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Tell me a short story"
        }
    ],
    stream=True
)

# Print streaming chunks as they arrive
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Using with Non-OpenAI Models:
The OpenAI SDK can be used with any provider configured in Plano:
# Using Claude model through OpenAI SDK
completion = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing briefly"
        }
    ]
)

# Using Ollama model through OpenAI SDK
completion = client.chat.completions.create(
    model="llama3.1",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "What's the capital of France?"
        }
    ]
)
OpenAI Responses API (Conversational State)
The OpenAI Responses API (/v1/responses) enables multi-turn conversations with automatic state management. Plano handles conversation history for you, so you don’t need to manually include previous messages in each request.
See Conversational State for detailed configuration and storage backend options.
Installation:
pip install openai
Basic Multi-Turn Conversation:
from openai import OpenAI

# Point to Plano's LLM Gateway
client = OpenAI(
    api_key="test-key",
    base_url="http://127.0.0.1:12000/v1"
)

# First turn - creates a new conversation
response = client.responses.create(
    model="gpt-4o-mini",
    input="My name is Alice"
)

# Extract the response ID for conversation continuity
response_id = response.id
print(f"Assistant: {response.output_text}")

# Second turn - continues the conversation
# Plano automatically retrieves and merges previous context
response = client.responses.create(
    model="gpt-4o-mini",
    input="What's my name?",
    previous_response_id=response_id  # Reference the previous turn
)

print(f"Assistant: {response.output_text}")
# Output: "Your name is Alice"
Using with Any Provider:
The Responses API works with any LLM provider configured in Plano:
# Multi-turn conversation with Claude
response = client.responses.create(
    model="claude-3-5-sonnet-20241022",
    input="Let's discuss quantum physics"
)
response_id = response.id

# Continue the conversation - Plano manages state regardless of provider
response = client.responses.create(
    model="claude-3-5-sonnet-20241022",
    input="Tell me more about entanglement",
    previous_response_id=response_id
)
Key Benefits:
Reduced payload size: No need to send full conversation history in each request (see the sketch after this list)
Provider flexibility: Use any configured LLM provider with state management
Automatic context merging: Plano handles conversation continuity behind the scenes
Production-ready storage: Configure PostgreSQL or memory storage based on your needs
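To see the reduced payload size and automatic context merging in practice, compare the second turn of the same conversation with and without state management (a minimal sketch; client and response_id come from the example above):

# Without state management: every turn resends the full conversation history
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Nice to meet you, Alice!"},
        {"role": "user", "content": "What's my name?"}
    ]
)

# With the Responses API: send only the new turn plus a reference to the previous response
client.responses.create(
    model="gpt-4o-mini",
    input="What's my name?",
    previous_response_id=response_id
)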
Anthropic (Python) SDK
The Anthropic SDK works with any provider through Plano’s Anthropic-compatible endpoint.
Installation:
pip install anthropic
Basic Usage:
import anthropic

# Point to Plano's LLM Gateway
client = anthropic.Anthropic(
    api_key="test-key",  # Can be any value for local testing
    base_url="http://127.0.0.1:12000"
)

# Use any model configured in your arch_config.yaml
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Hello, please respond briefly!"
        }
    ]
)

print(message.content[0].text)
Streaming Responses:
import anthropic

client = anthropic.Anthropic(
    api_key="test-key",
    base_url="http://127.0.0.1:12000"
)

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Tell me about artificial intelligence"
        }
    ]
) as stream:
    # Collect text deltas as they arrive
    for text in stream.text_stream:
        print(text, end="")

    # Get final assembled message
    final_message = stream.get_final_message()
    final_text = "".join(block.text for block in final_message.content if block.type == "text")
Using with Non-Anthropic Models:
The Anthropic SDK can be used with any provider configured in Plano:
# Using OpenAI model through Anthropic SDK
message = client.messages.create(
    model="gpt-4o-mini",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Explain machine learning in simple terms"
        }
    ]
)

# Using Ollama model through Anthropic SDK
message = client.messages.create(
    model="llama3.1",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "What is Python programming?"
        }
    ]
)
cURL Examples
For direct HTTP requests or integration with any programming language:
OpenAI-Compatible Endpoint:
# Basic request
curl -X POST http://127.0.0.1:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-key" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 50
  }'

# Using model aliases
curl -X POST http://127.0.0.1:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast-model",
    "messages": [
      {"role": "user", "content": "Summarize this text..."}
    ],
    "max_tokens": 100
  }'

# Streaming request
curl -X POST http://127.0.0.1:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true,
    "max_tokens": 200
  }'
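The /v1/responses endpoint can be called directly as well. The sketch below uses the standard Responses API request shape, with "input" for the new turn and an optional "previous_response_id" to continue a conversation:

# Responses API request (conversational state)
curl -X POST http://127.0.0.1:12000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer test-key" \
  -d '{
    "model": "gpt-4o-mini",
    "input": "Hello!"
  }'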
Anthropic-Compatible Endpoint:
# Basic request
curl -X POST http://127.0.0.1:12000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: test-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 50,
    "messages": [
      {"role": "user", "content": "Hello Claude!"}
    ]
  }'
Cross-Client Compatibility
One of Plano’s key features is cross-client compatibility. You can:
Use OpenAI SDK with Claude Models:
# OpenAI client calling Claude model
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",  # Claude model
    messages=[{"role": "user", "content": "Hello"}]
)
Use Anthropic SDK with OpenAI Models:
# Anthropic client calling OpenAI model
import anthropic

client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

response = client.messages.create(
    model="gpt-4o-mini",  # OpenAI model
    max_tokens=50,
    messages=[{"role": "user", "content": "Hello"}]
)
Mix and Match with Model Aliases:
# Same code works with different underlying models
def ask_question(client, question):
    return client.chat.completions.create(
        model="reasoning-model",  # Alias could point to any provider
        messages=[{"role": "user", "content": question}]
    )

# Works regardless of what "reasoning-model" actually points to
openai_client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
response = ask_question(openai_client, "Solve this math problem...")
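The same alias can be used from the Anthropic SDK, assuming it is defined in your arch_config.yaml and resolves the same way on the Anthropic-compatible endpoint:

import anthropic

anthropic_client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

response = anthropic_client.messages.create(
    model="reasoning-model",  # Same alias, different client library
    max_tokens=200,
    messages=[{"role": "user", "content": "Solve this math problem..."}]
)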
Error Handling
OpenAI SDK Error Handling:
from openai import OpenAI
import openai

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

try:
    completion = client.chat.completions.create(
        model="nonexistent-model",
        messages=[{"role": "user", "content": "Hello"}]
    )
except openai.NotFoundError as e:
    print(f"Model not found: {e}")
except openai.APIError as e:
    print(f"API error: {e}")
Anthropic SDK Error Handling:
import anthropic

client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

try:
    message = client.messages.create(
        model="nonexistent-model",
        max_tokens=50,
        messages=[{"role": "user", "content": "Hello"}]
    )
except anthropic.NotFoundError as e:
    print(f"Model not found: {e}")
except anthropic.APIError as e:
    print(f"API error: {e}")
Best Practices
Use Model Aliases: Instead of hardcoding provider-specific model names, use semantic aliases:
# Good - uses semantic alias
model = "fast-model"
# Less ideal - hardcoded provider model
model = "openai/gpt-4o-mini"
Environment-Based Configuration: Use different model aliases for different environments:
import os

# Development uses cheaper/faster models
model = os.getenv("MODEL_ALIAS", "dev.chat.v1")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello"}]
)
Graceful Fallbacks: Implement fallback logic for better reliability:
def chat_with_fallback(client, messages, primary_model="smart-model", fallback_model="fast-model"):
    try:
        return client.chat.completions.create(model=primary_model, messages=messages)
    except Exception as e:
        print(f"Primary model failed, trying fallback: {e}")
        return client.chat.completions.create(model=fallback_model, messages=messages)
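A brief usage sketch for the helper above, assuming "smart-model" and "fast-model" are aliases defined in your arch_config.yaml:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

reply = chat_with_fallback(
    client,
    [{"role": "user", "content": "Draft a short release announcement"}]
)
print(reply.choices[0].message.content)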
See Also
Supported Providers & Configuration - Configure your providers and see available models
Model Aliases - Create semantic model names
LLM Routing - Intelligent routing capabilities