Local-first AI model routing for serious agents. One endpoint. Any provider. The router figures out the rest.
---
name: sage-router
version: 4.153.0
description: Local-first AI model routing for serious agents. One endpoint. Any provider. The router figures out the rest.
env:
- SAGE_ROUTER_HOME (required: path to sage-router repo)
- SAGE_ROUTER_DISABLED_PROVIDERS (optional: comma-separated provider names to suppress)
- SAGE_ROUTER_DISABLED_MODELS (optional: comma-separated model IDs or provider/model keys to suppress)
- SAGE_ROUTER_OLLAMA_TIMEOUT_SECONDS (optional, default 120)
- SAGE_ROUTER_OLLAMA_AUTO_PULL_PATTERNS (optional, default :cloud)
- OPENCLAW_GATEWAY_TOKEN (optional: token for OpenClaw gateway agent bridge)
- SAGE_ROUTER_OPENCLAW_TIMEOUT_SECONDS (optional, default 90)
---
# Sage Router
HTTP server on `:8788` that routes chat requests to the optimal provider based on intent classification.
## Endpoints
- `POST /v1/chat/completions` — OpenAI-compatible; routes automatically
- `POST /v1/messages` — Anthropic Messages API compatible; translates to/from OpenAI format internally
- `GET /health` — Provider status, model lists, routing debug
Any Anthropic-compatible tool (Cursor, Aider, Claude Code, Zed, Continue, OpenHands) can point at `http://localhost:8788` as the API base URL. Both streaming and non-streaming are supported.
## Active Providers
Providers are discovered from `~/.openclaw/openclaw.json` at startup.
Rules:
- skips the router's own `sage-router` provider entry to avoid recursion
- resolves `${ENV_VAR}` values for `baseUrl` and `apiKey`
- includes OpenClaw gateway `openai-codex` as a virtual provider when the auth profile exists
- routes `openai-codex` through the OpenClaw gateway by default so it follows the active OpenClaw/Codex auth source; set `SAGE_ROUTER_OPENAI_CODEX_DIRECT_RESPONSES=1` only to force direct Codex Responses calls
- recognizes Google Gemini providers from `generativelanguage.googleapis.com`
- auto-discovers Google models when the provider exists but `models` is empty in `openclaw.json`
- normalizes `anthropic` or Anthropic-hosted `anthropic-messages` providers onto the local Dario proxy at `localhost:3456`
- starts the Dario user service when Anthropic compatibility is needed and the service is not already running; in Docker, the image bundles `@askalf/dario` and autostarts `dario proxy` when credentials are mounted at `/root/.dario`
- supports temporary provider suppression via `SAGE_ROUTER_DISABLED_PROVIDERS=name1,name2`
`GET /health` shows:
- `configured`: all discovered providers
- `providers`: reachable providers with model lists
- `disabled`: providers suppressed by env
## Routing Logic
The router does **not** perform mid-stream switching. Once a request is sent to a provider, the full response is returned or the attempt fails. If it fails, the next candidate in the chain is tried sequentially. There is no partial-output fallback or streaming handoff between providers.
Flow:
- detect intent from the latest user message
- estimate complexity from prompt length
- score every reachable (provider, model) pair globally — not per-provider — from `openclaw.json`
- in `local-first`, operate as local-strict: reject centralized Internet API providers and only allow local/LAN/Tailnet endpoints plus approved decentralized providers such as Darkbloom, with Ollama `:cloud` models excluded
- for `GENERAL`, blend static heuristics with persisted empirical latency stats by provider and model
- rank candidates by API type, model-name hints, complexity, and measured latency
- attempt the top `SAGE_ROUTER_MAX_PROVIDER_ATTEMPTS` candidates in order
- `sage-router` provider (the router itself, model `auto`) is scored as a low-priority recursive fallback, never preferred
Intent scoring is generic, for example:
- code and analysis strongly favor Anthropic/OpenAI-style reasoning models
- general/realtime requests prefer fast direct providers first
- general traffic learns from real successful request latency over time, with light exploration for cold providers/models
- complex prompts boost larger reasoning models and penalize mini/haiku-class models
Intent is detected by keyword matching on the latest user message. Complexity is estimated by word count.
## API
- `GET /health` — JSON with reachable providers, configured providers, and disabled providers
- `POST /v1/chat/completions` — OpenAI-compatible; routes automatically
## Notes
- `openai-codex` is kept as an optional bridge, not a required first hop.
- Anthropic compatibility is provided through Dario, so `anthropic` can stay in `openclaw.json` while routing locally through `dario`.
- The repo `systemd` unit is template-style and expects local machine values in `~/.config/sage-router/sage-router.env`.
- Empirical latency memory is persisted at `~/.cache/sage-router/latency-stats.json` by default.
- When the OpenClaw gateway model-set path is unhealthy, the helper falls back to running without provider/model overrides instead of failing hard.
- If any provider starts misbehaving, suppress it with `SAGE_ROUTER_DISABLED_PROVIDERS` instead of editing the router.
- GitHub workflows now include CI syntax checks and CodeQL analysis for Python + JavaScript.
- See `BRANCH_PROTECTION.md` for the exact required-check setup on GitHub.
- `provider-profiles.json` includes a `grok-sso` template for the OpenClaw xAI auth plugin's local SuperGrok-backed proxy.
## Install
Install the user service from the repo copy:
```bash
mkdir -p ~/.config/systemd/user ~/.config/sage-router
cp systemd/sage-router.service ~/.config/systemd/user/sage-router.service
cp systemd/sage-router.env.example ~/.config/sage-router/sage-router.env
# edit ~/.config/sage-router/sage-router.env for your machine
systemctl --user daemon-reload
systemctl --user enable --now sage-router.service
```
Notes:
- the repo unit is now env-driven and does not hardcode your home path, Node version, or workspace location
- set `SAGE_ROUTER_HOME` to the actual repo path on your machine
- optionally set `SAGE_ROUTER_PATH_PREFIX` if your Python, Node, or Dario bins are not already on PATH
If an Anthropic provider is detected and Dario is not installed yet, install Dario first:
- GitHub: https://github.com/askalf/dario
## Service
```bash
systemctl --user status sage-router
systemctl --user restart sage-router
journalctl --user -u sage-router -f # live logs
```
## Docker production notes
- Docker image includes Node, Python, Sage Router, and `@askalf/dario`.
- Mount host Dario credentials as `~/.dario:/root/.dario` for Anthropic-compatible Claude routing.
- Enable llama.cpp classifier sidecar with `docker compose --profile classifier up -d` and `SAGE_ROUTER_INTENT_CLASSIFIER_ENABLED=1`.
- Production classifier flags: `SAGE_ROUTER_INTENT_CLASSIFIER_PROVIDER=llamacpp`, `SAGE_ROUTER_INTENT_CLASSIFIER_BASE_URL=http://llamacpp-classifier:8080`, `SAGE_ROUTER_INTENT_CLASSIFIER_MODEL=classifier`.
## Router profiles
Sage Router supports named routing profiles in `router-profiles.json` next to `router.py`.
Request a profile with any of:
- `model: "sage-router/<profile>"`
- `model: "<profile>"`
- top-level `profile`, `routerProfile`, or `sageRouterProfile`
Profile fields currently supported:
- `route`: `fast`, `balanced`, `best`, `local-first`, `realtime`
- `thinking`: `low`, `medium`, `high`
- capability/quality flags: `requiresQuality`, `requiresReasoning`, `requiresTools`, `frontierLargeOnly`, `frontierOrReasoningTools`, `suppressIntermediateToolText`, `qualitySensitive`, `reasoning`, `tools`, `preferTools`, `json`, `vision`, `document`, `longContext`
- constraints: `allowProviders`, `denyProviders`, `allowModels`, `denyModels`, `minParamsB`
Current profiles:
- `frontier`: default high-quality frontier routing profile. Forces best/high, quality-sensitive routing, suppresses tool-call narration, and blocks weak/tiny filler models without hard-pinning a brittle frontier allowlist.
- `frontier-large`: strict frontier-large-only routing.
- `fast-local`: low-latency local-first routing.
- `coding-max`: high-thinking code route with weak model exclusions.
don't have the plugin yet? install it then click "run inline in claude" again.