Chapter 5: Model Management
The model is the agent's brain. Choose the right one, and Hermes Agent works faster and smarter. This chapter helps you understand Hermes Agent's model system and pick the best option for your needs.
Core Concepts
Hermes Agent's model configuration is unified — all settings are stored in ~/.hermes/config.yaml, the single source of truth.
Important Change
The legacy OPENAI_BASE_URL and LLM_MODEL environment variables are deprecated. If you still have old settings in ~/.hermes/.env, they won't be read. Please use hermes model or edit config.yaml directly.
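To see whether stale settings from the old scheme are still lurking in your environment, a quick check like the following can help (the helper name is ours, not part of Hermes Agent):

```python
# Deprecated variables that Hermes Agent no longer reads
LEGACY_VARS = ("OPENAI_BASE_URL", "LLM_MODEL")

def find_legacy_settings(env):
    """Return the deprecated variables still set in an environment mapping."""
    return [name for name in LEGACY_VARS if env.get(name)]
```

Run it against `os.environ` (or a parsed copy of `~/.hermes/.env`) and migrate anything it reports into config.yaml.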
There are two ways to switch models:
# Interactive switching (recommended)
hermes model
# Direct specification
/model <provider>/<model>
Major Providers Overview
| Provider | Provider ID | Strengths | Free Tier | Recommended Model |
|---|---|---|---|---|
| Nous Portal | nous | Official Hermes, works out of the box | ✅ Yes | hermes-4 |
| OpenRouter | openrouter | 200+ model marketplace | ✅ Limited | Various flagships |
| Zhipu GLM | glm | Direct connection from China, no proxy needed | ✅ Yes | glm-4-plus |
| Kimi / Moonshot | kimi | Long context, Chinese provider | ✅ Yes | moonshot-v1 |
| MiniMax | minimax | Chinese provider, multimodal | ✅ Yes | minimax-01 |
| Anthropic | anthropic | Claude series | ❌ | claude-sonnet-4 |
| OpenAI | openai | GPT series | ❌ | gpt-4o |
| GitHub Copilot | copilot | Subscription-based, multi-model | Within subscription | gpt-5, claude |
| Hugging Face | huggingface | 20+ open-source models | ✅ $0.1/month | Various open-source |
| Local Models | Custom | Ollama/vLLM etc. | Free | Qwen2.5, Llama3 |
Recommended for Users in China
First choice: Zhipu GLM (glm) — direct connection, no proxy needed, generous free tier. Second choice: Kimi. If you need maximum power, use a proxy with OpenRouter.
Configure Providers
Option 1: Interactive Configuration (Recommended)
hermes model
The wizard lists all available providers — use arrow keys to select, then enter your API Key.
Option 2: Edit config.yaml
# ~/.hermes/config.yaml
model:
  default: glm-4-plus
provider:
  default: glm
API Keys are still stored in ~/.hermes/.env:
# Zhipu GLM
GLM_API_KEY=***
# OpenRouter
OPENROUTER_API_KEY=your_o..._key
# Anthropic
ANTHROPIC_API_KEY=your_a..._key
# Kimi
KIMI_API_KEY=***
Key Provider Configuration Details
Zhipu GLM (Recommended for Users in China)
# 1. Get API Key: https://open.bigmodel.cn/
# 2. Configure
hermes model --provider glm
# 3. Enter API Key
Auto-Detection
The GLM provider automatically probes multiple endpoints (global, China, programming variant) to find the one that accepts your API Key. No need to manually set GLM_BASE_URL.
Hermes automatically handles GLM's 429 rate limiting, but it's recommended to avoid making too many concurrent requests.
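The rate-limit handling described above typically amounts to retrying with exponential backoff. A minimal sketch of that pattern (not Hermes's actual implementation — `request_fn` is a stand-in for whatever makes the HTTP call):

```python
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn(), retrying on HTTP 429 with exponential backoff.

    request_fn must return a (status, body) tuple.
    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    """
    for attempt in range(max_retries):
        status, body = request_fn()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    return status, body  # still rate-limited after all retries
```

Keeping concurrency low, as recommended, reduces how often this retry path is hit at all.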
OpenRouter
# 1. Get API Key: https://openrouter.ai/keys
# 2. Configure
hermes model --provider openrouter
# 3. Select a model (e.g., anthropic/claude-sonnet-4)
OpenRouter provides 200+ models, with prices ranging from free to premium.
GitHub Copilot
hermes model --provider copilot
Authentication methods (by priority):
- gh auth token (requires GitHub Copilot subscription)
- OAuth device code login (wizard guides you automatically)
Note
Classic Personal Access Tokens (ghp_*) are not supported. If gh auth token returns a ghp_* token, use OAuth login via hermes model instead.
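Since the note above rules out one token type by prefix, a pre-flight check is easy to sketch. The prefix convention is GitHub's (`ghp_` for classic PATs, `gho_` for OAuth tokens); the function name is ours:

```python
def is_supported_copilot_token(token):
    """Reject classic personal access tokens (ghp_*).

    Other token types, such as OAuth tokens (commonly gho_*),
    pass this check.
    """
    return not token.startswith("ghp_")
```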
Anthropic (Claude)
hermes model --provider anthropic
# or shorthand
hermes model --provider claude
Supports three authentication methods: API Key, OAuth, and Claude Code credentials.
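When several credential sources are available, a client has to pick one. The order below is purely illustrative — an assumption, not Hermes's documented behavior:

```python
def pick_anthropic_auth(api_key=None, oauth_token=None, claude_code_creds=None):
    """Assumed priority (illustrative only): explicit API key first,
    then OAuth, then reusing existing Claude Code credentials."""
    if api_key:
        return ("api_key", api_key)
    if oauth_token:
        return ("oauth", oauth_token)
    if claude_code_creds:
        return ("claude_code", claude_code_creds)
    return None  # no credentials configured
```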
Local Models
Don't want to depend on cloud services? Run models locally.
Ollama (Simplest)
# 1. Install Ollama: https://ollama.ai
# 2. Pull a model
ollama pull qwen2.5:14b
# 3. Configure Hermes
hermes model --provider custom --base-url http://localhost:11434/v1
Context Window
Ollama defaults to only 4k context. When the Agent uses tools, system prompts + tool definitions alone can fill this up. We recommend at least 16k-32k:
# Set the default context length at startup
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# Or create a Modelfile to set num_ctx
FROM qwen2.5:14b
PARAMETER num_ctx 32768
vLLM (GPU High Performance)
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-14B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 32768
# Configure Hermes
hermes model --provider custom --base-url http://localhost:8000/v1
llama.cpp (CPU / Apple Silicon)
# Start llama-server
./llama-server -m model.gguf -c 32768 --jinja --port 8080
# Configure Hermes
hermes model --provider custom --base-url http://localhost:8080/v1
--jinja is Required
Without --jinja, llama-server completely ignores the tools parameter. Your model will try to write JSON tool calls in text, but Hermes won't recognize them — you'll see raw JSON output as messages.
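The failure mode above is easy to spot: an assistant message whose body is nothing but a JSON object with "name" and "arguments" keys. A small heuristic detector (our own sketch, not part of Hermes) illustrates what to look for:

```python
import json

def textual_tool_call(content):
    """Heuristic: detect a tool call the server failed to parse and
    emitted as plain JSON text, e.g. {"name": ..., "arguments": ...}.

    Returns the parsed dict if content looks like one, else None.
    """
    try:
        obj = json.loads(content.strip())
    except (json.JSONDecodeError, AttributeError):
        return None
    if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
        return obj
    return None
```

If you see this pattern in your session, restart llama-server with --jinja.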
Switching Models During a Session
In any conversation, you can temporarily switch with the /model command:
/model openrouter/anthropic/claude-sonnet-4
/model glm/glm-4-plus
/model custom # Auto-query local endpoint
Changes are persisted to config.yaml and survive restarts.
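Note how `/model openrouter/anthropic/claude-sonnet-4` mixes slashes: the provider is separated from the model at the first slash only, because model IDs themselves may contain slashes. A minimal sketch of that parsing rule (the function is ours, for illustration):

```python
def parse_model_arg(arg):
    """Split a '/model' argument into (provider, model) at the FIRST
    slash only, since model IDs may contain slashes themselves
    (e.g. anthropic/claude-sonnet-4 on OpenRouter)."""
    provider, _, model = arg.partition("/")
    return provider, (model or None)  # bare provider -> model is None
```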
Auxiliary Models
Even if you've selected Nous Portal or another provider, some tools (vision, web summarization, MoA) still need an auxiliary model. Gemini Flash is used by default (via OpenRouter).
Simply set OPENROUTER_API_KEY to automatically enable these tools.
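The gating described above boils down to a presence check on the key. A sketch of that logic — the exact model ID is an assumption on our part, based on "Gemini Flash via OpenRouter":

```python
def auxiliary_model(env):
    """Return the auxiliary model to use, or None when the OpenRouter
    key that enables it is missing. The model ID is assumed, not
    confirmed — check your installation's defaults."""
    if env.get("OPENROUTER_API_KEY"):
        return "google/gemini-flash-1.5"
    return None
```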
Model Selection Guide (Updated April 2026)
| Use Case | Recommended Model | Reason |
|---|---|---|
| Daily chat | GLM-5.1 | Direct connection from China, great value ($0.95/M) |
| Programming | Claude Sonnet 4.6 | Strong code understanding, 1M context |
| Complex reasoning | GPT-5.4 Pro / Claude Opus 4.6 | Strongest overall capability |
| Privacy-sensitive | Qwen3 (local) | Data never leaves your machine |
| Budget-conscious | GPT-5.4 Nano / DeepSeek V3.2 | Extremely affordable ($0.2-0.26/M) |
| Long document processing | Gemini 2.5 Pro / Qwen3.6 Plus | 1M ultra-long context |
| Experimentation | OpenRouter | 200+ models to try freely |
📊 For a complete model comparison and pricing overview, see Appendix E: Model Selection Guide.
Further Reading
- Official docs: AI Providers
- Model routing and fallback: Provider Routing