LLM Routing
With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for operationalizing multiple models. Plano provides three distinct routing approaches to meet different use cases: model-based routing, alias-based routing, and preference-aligned routing. Matching each request with the most suitable model from your available LLM fleet enables optimal performance, cost efficiency, and response quality.
Note
For details on supported model providers, configuration options, and client libraries, see LLM Providers.
Routing Methods
Model-based routing
Model-based (direct) routing lets you specify exact provider and model combinations using the format `provider/model-name`:

- Use provider-specific names like `openai/gpt-5.2` or `anthropic/claude-sonnet-4-5`
- Provides full control and transparency over which model handles each request
- Ideal for production workloads where you want predictable routing behavior
Configuration
Configure your LLM providers with specific provider/model names:
listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-5
    access_key: $OPENAI_API_KEY
  - model: anthropic/claude-sonnet-4-5
    access_key: $ANTHROPIC_API_KEY
Client usage
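The examples below use the standard OpenAI Python client pointed at the Plano egress listener. A minimal setup sketch; the base URL host and path and the placeholder API key are assumptions, while the port matches the egress_traffic listener above:

from openai import OpenAI

# Point the standard OpenAI client at the Plano egress listener.
# Host and path are assumptions; the port comes from the listener config above.
client = OpenAI(
    base_url="http://localhost:12000/v1",
    api_key="unused",  # placeholder; real provider keys live in the gateway config
)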
Clients specify exact models:
# Direct provider/model specification
response = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello!"}]
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a story"}]
)
Alias-based routing
Alias-based routing lets you create semantic model names that decouple your application from specific providers:
- Use meaningful names like `fast-model`, `reasoning-model`, or `plano.summarize.v1` (see Model Aliases)
- Maps semantic names to underlying provider models for easier experimentation and provider switching
- Ideal for applications that want abstraction from specific model names while maintaining control
Configuration
Configure semantic aliases that map to underlying models:
listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
  - model: openai/gpt-5
    access_key: $OPENAI_API_KEY
  - model: anthropic/claude-sonnet-4-5
    access_key: $ANTHROPIC_API_KEY

model_aliases:
  # Model aliases - friendly names that map to actual provider names
  fast-model:
    target: gpt-5.2
  reasoning-model:
    target: gpt-5
  creative-model:
    target: claude-sonnet-4-5
Client usage
Clients use semantic names:
# Using semantic aliases
response = client.chat.completions.create(
    model="fast-model",  # resolves to openai/gpt-5.2 via the alias mapping
    messages=[{"role": "user", "content": "Quick summary please"}]
)

response = client.chat.completions.create(
    model="reasoning-model",  # resolves to openai/gpt-5
    messages=[{"role": "user", "content": "Solve this complex problem"}]
)
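A practical consequence: switching providers becomes a configuration-only change. As an illustrative sketch, repointing `fast-model` at a different model requires no client updates:

model_aliases:
  fast-model:
    # Previously gpt-5.2; clients keep requesting "fast-model" unchanged
    target: claude-sonnet-4-5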
Preference-aligned routing (Arch-Router)
Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
- Domain: the high-level topic of the request (e.g., legal, healthcare, programming).
- Action: what the user wants to do (e.g., summarize, generate code, translate).
- Routing preferences: your mapping from (domain, action) to preferred models.
Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
Configuration
To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:
listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-5
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: code understanding
        description: understand and explain existing code snippets, functions, or libraries
      - name: complex reasoning
        description: deep analysis, mathematical problem solving, and logical reasoning
  - model: anthropic/claude-sonnet-4-5
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: creative writing
        description: creative content generation, storytelling, and writing assistance
      - name: code generation
        description: generating new code snippets, functions, or boilerplate based on user prompts
Client usage
Clients can let the router decide or still specify aliases:
# Let Arch-Router choose based on content
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
    # No model specified - the router analyzes the prompt and chooses claude-sonnet-4-5
)
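To verify which model actually served a routed request, you can inspect the model field on the OpenAI-compatible response object. A small sketch, assuming the gateway passes the upstream model name through:

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational"}]
)
# With the preferences above, "complex reasoning" should route this to openai/gpt-5
print(response.model)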
Arch-Router
The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Addressing Traditional Routing Limitations:
- Human Preference Alignment: Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences through domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
- Flexible Model Integration: New models can be added for routing without retraining or architectural modifications, enabling dynamic adaptation to an evolving model landscape.
- Preference-Encoded Routing: Domain-action mappings provide a practical mechanism for encoding user preferences, offering transparent and controllable routing decisions that can be customized for specific use cases.
To support effective routing, Arch-Router introduces two key concepts:
- Domain: the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
- Action: the specific type of operation the user wants performed (e.g., summarization, code generation, booking an appointment, translation).
Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
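Conceptually, the final selection step reduces to matching the inferred route against your configured preferences. The toy sketch below illustrates only that idea; the real Arch-Router is a 1.5B LLM that performs the inference itself, and the route names and mapping shown here are hypothetical:

# Toy illustration of preference-aligned selection. The real Arch-Router
# infers the route from the full conversation; here it is passed in directly.
ROUTING_PREFERENCES = {
    "code understanding": "openai/gpt-5",
    "complex reasoning": "openai/gpt-5",
    "creative writing": "anthropic/claude-sonnet-4-5",
    "code generation": "anthropic/claude-sonnet-4-5",
}
DEFAULT_MODEL = "openai/gpt-5.2"  # used when no route matches confidently

def select_model(inferred_route: str | None) -> str:
    # Unmatched or low-confidence routes fall back to the default provider
    return ROUTING_PREFERENCES.get(inferred_route, DEFAULT_MODEL)

assert select_model("creative writing") == "anthropic/claude-sonnet-4-5"
assert select_model(None) == "openai/gpt-5.2"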
In summary, Arch-Router demonstrates:
- Structured Preference Routing: Aligns prompt requests with model strengths using explicit domain-action mappings.
- Transparent and Controllable: Makes routing decisions transparent and configurable, empowering users to customize system behavior.
- Flexible and Adaptive: Supports evolving user needs, model updates, and new domains/actions without retraining the router.
- Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.
Combining Routing Methods
You can combine static model selection with dynamic routing preferences for maximum flexibility:
llm_providers:
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-5
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: deep analysis and complex problem solving
  - model: anthropic/claude-sonnet-4-5
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: creative_tasks
        description: creative writing and content generation

model_aliases:
  # Model aliases - friendly names that map to actual provider names
  fast-model:
    target: gpt-5.2
  reasoning-model:
    target: gpt-5
  # Aliases that can also participate in dynamic routing
  creative-model:
    target: claude-sonnet-4-5
This configuration allows clients to:
- Use direct model selection: `model="fast-model"`
- Let the router decide: no model specified; the router analyzes the content and picks a model from the routing preferences
Example Use Cases
Here are common scenarios where Arch-Router excels:
Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.
Content Processing Workflows: Classify requests as summarization (“summarize this document”), translation (“translate to Spanish”), or analysis (“what are the key themes”), enabling targeted model selection.
Multi-Domain Applications: Accurately identify whether requests fall into legal, healthcare, technical, or general domains, even when the subject matter isn’t explicitly stated in the prompt.
Conversational Routing: Track conversation context to identify when topics shift between domains or when the type of assistance needed changes mid-conversation.
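Because Arch-Router evaluates the conversation history on every request, a mid-conversation topic shift can change which model serves the next turn. An illustrative sketch, with assumed routing outcomes noted in the comments:

# Turn 1: a code-understanding question
messages = [
    {"role": "user", "content": "Explain what this regex matches: ^\\d{3}-\\d{4}$"}
]
first = client.chat.completions.create(messages=messages)  # e.g., routes to gpt-5

# Turn 2: the topic shifts to creative writing; the router re-evaluates
messages += [
    {"role": "assistant", "content": first.choices[0].message.content},
    {"role": "user", "content": "Now write a short poem about phone numbers"},
]
second = client.chat.completions.create(messages=messages)  # e.g., claude-sonnet-4-5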
Best practices
💡 Consistent Naming: Route names should align with their descriptions.

❌ Bad: `{"name": "math", "description": "handle solving quadratic equations"}`
✅ Good: `{"name": "quadratic_equation", "description": "solving quadratic equations"}`

💡 Clear Usage Descriptions: Make your route names and descriptions specific and unambiguous, and minimize overlap between routes. The router performs better when it can clearly distinguish between different types of requests.

❌ Bad: `{"name": "math", "description": "anything closely related to mathematics"}`
✅ Good: `{"name": "math", "description": "solving, explaining math problems, concepts"}`

💡 Noun-Centric Descriptors: Preference-based routers perform better with noun-centric descriptors, as they offer more stable and semantically rich signals for matching.

💡 Domain Inclusion: For the best user experience, always include a domain route. It gives the router a fallback when the action cannot be confidently inferred.
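Following the guidance above, a hypothetical preference list that pairs a broad domain route with narrower action routes might look like this:

routing_preferences:
  # Domain route: the fallback when no specific action matches confidently
  - name: math
    description: solving, explaining math problems, concepts
  # Action route: a narrower task within the domain
  - name: quadratic_equation
    description: solving quadratic equations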
Unsupported Features
The following features are not supported by the Arch-Router model:
- Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.
- Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
- System prompt dependency: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.