Smart Router

The smart router classifies each request's complexity and routes it to the most appropriate model based on a routing profile you specify. This saves cost on simple requests while ensuring complex ones get capable models.

The router is purely rule-based: it is deterministic and makes no external calls.

How to use it

Set `model` to a routing profile name instead of a specific model ID:

{
  "model": "auto",
  "messages": [{"role": "user", "content": "Explain recursion."}]
}

Profile	Aliases	Behavior
`auto`	`balanced`, `default`	Balanced cost and quality. Recommended default.
`eco`	`cheap`, `budget`	Cheapest capable model per tier.
`premium`	`best`, `quality`	Best available model regardless of cost.
`free`	`oss`, `open`	Free NVIDIA Nemotron models, tiered by complexity (Gemini 3.1 Flash-Lite fallback).
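As a minimal sketch, a request that uses one of these profiles could be built in Python as follows. The host in `BASE_URL` is a placeholder, the `/v1/chat/completions` path is an assumption (this page only names `GET /v1/models`), and `build_chat_request` is a hypothetical helper. Handling of the 402 payment challenge for paid models is not shown.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the base URL of your deployment.
BASE_URL = "https://router.example.com"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat request whose `model` is a profile, alias, or full ID."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Assumption: OpenAI-style chat path; not confirmed by this page.
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("eco", "Explain recursion.")
# Passing `request` to urllib.request.urlopen would send it.
```

Only the `model` value changes between profiles; the rest of the body stays the same.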

You can also use model aliases as shortcuts to specific models:

Alias	Resolves to
`gpt5`	`openai/gpt-5.2`
`sonnet`	`anthropic/claude-sonnet-4-6`
`opus`	`anthropic/claude-opus-4-8`
`gemini`	`google/gemini-3.1-pro`
`flash`	`google/gemini-2.5-flash`
`grok`	`xai/grok-4-fast-reasoning`
`deepseek`	`deepseek/deepseek-chat`
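The two tables above suggest a simple lookup order for the `model` field. The sketch below is one way to express it, not the router's actual code: `resolve_model_field` is a hypothetical name, and exact, case-sensitive matching is an assumption.

```python
# Profile names and their aliases, copied from the profile table.
PROFILE_ALIASES = {
    "auto": "auto", "balanced": "auto", "default": "auto",
    "eco": "eco", "cheap": "eco", "budget": "eco",
    "premium": "premium", "best": "premium", "quality": "premium",
    "free": "free", "oss": "free", "open": "free",
}

# Model shortcuts, copied from the alias table.
MODEL_ALIASES = {
    "gpt5": "openai/gpt-5.2",
    "sonnet": "anthropic/claude-sonnet-4-6",
    "opus": "anthropic/claude-opus-4-8",
    "gemini": "google/gemini-3.1-pro",
    "flash": "google/gemini-2.5-flash",
    "grok": "xai/grok-4-fast-reasoning",
    "deepseek": "deepseek/deepseek-chat",
}


def resolve_model_field(model: str) -> tuple[str, str]:
    """Return ("profile", canonical_profile) or ("model", full_model_id)."""
    if model in PROFILE_ALIASES:
        return ("profile", PROFILE_ALIASES[model])
    if model in MODEL_ALIASES:
        return ("model", MODEL_ALIASES[model])
    # Anything else is treated as a full model ID and passed through unchanged.
    return ("model", model)
```

Only the `"profile"` result goes through complexity scoring; a `"model"` result names a model directly.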

Complexity tiers

The router classifies each request into one of four tiers based on a weighted score:

Tier	Score range	Description
`Simple`	score < 0.0	Short, factual, conversational
`Medium`	0.0 ≤ score < 0.2	Moderate complexity, some technical content
`Complex`	0.2 ≤ score < 0.4	Multi-step, technical, or domain-specific
`Reasoning`	score ≥ 0.4	Deep reasoning, proofs, multi-question
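The tier table can be read as a chain of threshold checks. This is a sketch with a hypothetical function name; it assumes a score exactly on a boundary belongs to the higher tier, which matches the `score < 0.0` and `score ≥ 0.4` rows.

```python
def classify_tier(score: float) -> str:
    """Map a weighted complexity score to one of the four tiers."""
    if score < 0.0:
        return "Simple"
    if score < 0.2:
        return "Medium"
    if score < 0.4:
        return "Complex"
    return "Reasoning"
```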

The 15 scoring dimensions

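The scorer evaluates each request across 15 weighted dimensions. Most dimensions inspect user message content; conversation depth and tool usage look at the request as a whole. A higher score means a more complex request.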

#	Dimension	Weight	What it checks
1	Token count	0.08	Short messages score lower; long messages score higher
2	Code presence	0.15	Backticks, code keywords (`fn`, `class`, `async`, etc.)
3	Reasoning markers	0.18	Words like "prove", "analyze", "step by step", "explain why"
4	Technical terms	0.10	"algorithm", "kubernetes", "distributed", "concurrent"
5	Creative markers	0.05	"story", "poem", "brainstorm", "narrative"
6	Simple indicators	0.02	"hello", "what is", "translate" — negative signal
7	Multi-step patterns	0.12	"first", "then", "next", "step 1", numbered lists
8	Question complexity	0.05	Number of `?` marks — more questions = more complex
9	Agentic task markers	0.04	"read file", "deploy", "run command", "install"
10	Math/logic	0.06	Equations, operators, "formula", "calculate"
11	Language complexity	0.04	Average word length as a vocabulary proxy
12	Conversation depth	0.03	Number of messages in context
13	Tool usage	0.04	`+0.8` if request includes tool definitions
14	Output format complexity	0.02	"json", "csv", "xml", "structured"
15	Domain specificity	0.02	"medical", "legal", "clinical", "regulatory"

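The weights sum to 1.0. Each dimension's score is multiplied by its weight, and the sum of those products is the final score, which maps to a complexity tier. Negative signals such as simple indicators can pull the score below 0.0, into the Simple tier.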
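To make the weighted sum concrete, the sketch below copies the weights from the table and combines them with per-dimension signal values. The dictionary keys and the `weighted_score` helper are hypothetical, and how the router derives each signal from the text is not specified here, so the signals are supplied by the caller.

```python
# Weights copied from the dimension table; the keys are illustrative names.
WEIGHTS = {
    "token_count": 0.08,
    "code_presence": 0.15,
    "reasoning_markers": 0.18,
    "technical_terms": 0.10,
    "creative_markers": 0.05,
    "simple_indicators": 0.02,
    "multi_step_patterns": 0.12,
    "question_complexity": 0.05,
    "agentic_task_markers": 0.04,
    "math_logic": 0.06,
    "language_complexity": 0.04,
    "conversation_depth": 0.03,
    "tool_usage": 0.04,
    "output_format_complexity": 0.02,
    "domain_specificity": 0.02,
}


def weighted_score(signals: dict[str, float]) -> float:
    """Sum weight * signal over all dimensions; missing dimensions count as 0."""
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


# Assumed signal values for illustration: strong reasoning markers, some
# multi-step structure, and one simple indicator acting as a negative signal.
example = weighted_score({
    "reasoning_markers": 1.0,
    "multi_step_patterns": 0.5,
    "simple_indicators": -1.0,
})
```

Under these assumed signals the sum is 0.18 + 0.06 - 0.02 = 0.22, which falls in the Complex range. Tool definitions alone would contribute 0.04 × 0.8 = 0.032, if `+0.8` is read as that dimension's signal value.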

Routing table

Each (Profile, Tier) pair maps to a specific model:

Tier	eco	auto	premium	free
Simple	`deepseek/deepseek-chat`	`google/gemini-2.5-flash`	`openai/gpt-4o`	`nvidia/nvidia/llama-3.1-nemotron-nano-8b-v1`
Medium	`google/gemini-2.5-flash-lite`	`xai/grok-code-fast-1`	`anthropic/claude-sonnet-4-6`	`nvidia/nvidia/llama-3.3-nemotron-super-49b-v1`
Complex	`deepseek/deepseek-chat`	`google/gemini-3.1-pro`	`anthropic/claude-opus-4-8`	`nvidia/nvidia/llama-3.1-nemotron-ultra-253b-v1`
Reasoning	`deepseek/deepseek-reasoner`	`xai/grok-4-fast-reasoning`	`openai/o3`	`nvidia/nvidia/llama-3.1-nemotron-ultra-253b-v1`
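The routing table amounts to a two-key lookup. A minimal sketch, with the model IDs copied from the table and `route` as a hypothetical helper name:

```python
# (tier -> profile -> model ID), copied from the routing table above.
ROUTING_TABLE = {
    "Simple": {
        "eco": "deepseek/deepseek-chat",
        "auto": "google/gemini-2.5-flash",
        "premium": "openai/gpt-4o",
        "free": "nvidia/nvidia/llama-3.1-nemotron-nano-8b-v1",
    },
    "Medium": {
        "eco": "google/gemini-2.5-flash-lite",
        "auto": "xai/grok-code-fast-1",
        "premium": "anthropic/claude-sonnet-4-6",
        "free": "nvidia/nvidia/llama-3.3-nemotron-super-49b-v1",
    },
    "Complex": {
        "eco": "deepseek/deepseek-chat",
        "auto": "google/gemini-3.1-pro",
        "premium": "anthropic/claude-opus-4-8",
        "free": "nvidia/nvidia/llama-3.1-nemotron-ultra-253b-v1",
    },
    "Reasoning": {
        "eco": "deepseek/deepseek-reasoner",
        "auto": "xai/grok-4-fast-reasoning",
        "premium": "openai/o3",
        "free": "nvidia/nvidia/llama-3.1-nemotron-ultra-253b-v1",
    },
}


def route(profile: str, tier: str) -> str:
    """Return the model ID for a canonical profile name and a tier."""
    return ROUTING_TABLE[tier][profile]
```

The lookup expects a canonical profile name (`eco`, `auto`, `premium`, `free`), so profile aliases would need to be resolved first.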

The `free` profile vs. the free tier: the `free` profile is a routing choice. It sends every tier to a zero-cost NVIDIA Nemotron model, sized by complexity (Complex and Reasoning share the same model). Separately, any request whose estimated cost is zero, which is what routing to a zero-cost model produces, skips the 402 payment challenge entirely, so no wallet or signature is needed. Free-tier traffic is subject to per-IP and aggregate rate limits. See Free tier for details.

Example: same prompt, different profiles

Prompt: "Hello!"
Tier: Simple (short, matches simple indicator keywords)

  eco     → deepseek/deepseek-chat       ($0.28/M input)
  auto    → google/gemini-2.5-flash      ($0.30/M input)
  premium → openai/gpt-4o                ($2.50/M input)

Prompt: "Prove step by step that quicksort has O(n log n) average complexity.
         Analyze edge cases and compare with mergesort."
Tier: Reasoning (reasoning markers: "prove", "step by step", "analyze", "compare")

  eco     → deepseek/deepseek-reasoner   ($0.28/M input)
  auto    → xai/grok-4-fast-reasoning    ($0.20/M input)
  premium → openai/o3                    ($2.00/M input)
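To compare profiles in dollar terms, the per-million input prices quoted above can be scaled by token count. This sketch covers input tokens only, since output prices are not listed on this page; `input_cost_usd` is a hypothetical helper.

```python
# Input prices in USD per million tokens, as quoted in the examples above.
INPUT_PRICE_PER_MILLION = {
    "deepseek/deepseek-chat": 0.28,
    "google/gemini-2.5-flash": 0.30,
    "openai/gpt-4o": 2.50,
    "deepseek/deepseek-reasoner": 0.28,
    "xai/grok-4-fast-reasoning": 0.20,
    "openai/o3": 2.00,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Estimate the input-side cost of a request in USD."""
    return INPUT_PRICE_PER_MILLION[model] * input_tokens / 1_000_000
```

For the Simple-tier example, `premium` costs 2.50 / 0.28, or roughly nine times as much per input token as `eco`.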

Note

The router's keyword dimensions score user message content only, but the conversation-depth dimension counts every message — system and assistant turns included — so a long history can nudge a request across a tier boundary.
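A small worked example of that nudge, assuming the conversation-depth signal is normalized to the range 0 to 1 (the actual normalization is not documented here). With a weight of 0.03, depth can then add at most 0.03 to the score, which is enough to cross a boundary only when the score is already close to it. `crosses_into_complex` is a hypothetical helper.

```python
DEPTH_WEIGHT = 0.03       # weight of the conversation-depth dimension
COMPLEX_THRESHOLD = 0.2   # lower bound of the Complex tier


def crosses_into_complex(score_without_depth: float, depth_signal: float) -> bool:
    """True if the depth contribution lifts a score from below 0.2 to 0.2 or above."""
    score_with_depth = score_without_depth + DEPTH_WEIGHT * depth_signal
    return score_without_depth < COMPLEX_THRESHOLD <= score_with_depth
```

A request scoring 0.18 before depth is counted would cross into Complex with a full depth signal (0.18 + 0.03 = 0.21), while one scoring 0.10 would stay in Medium.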

Bypassing the router

To use a specific model, pass its full ID directly:

{
  "model": "anthropic/claude-opus-4-8",
  "messages": [...]
}

Or use a short alias:

{
  "model": "opus",
  "messages": [...]
}

See `GET /v1/models` for the full list of model IDs.