
Chapter 5: Model Management

The model is the cuttlefish's brain. Choose the right brain, and the cuttlefish works faster and smarter. This chapter helps you understand Hermes Agent's model system and pick the best option for your needs.

Core Concepts

Hermes Agent's model configuration is unified — all settings are stored in ~/.hermes/config.yaml, the single source of truth.

Important Change

The legacy OPENAI_BASE_URL and LLM_MODEL environment variables are deprecated. If you still have old settings in ~/.hermes/.env, they won't be read. Please use hermes model or edit config.yaml directly.
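A quick grep tells you whether a legacy env file still sets the deprecated variables. This is our own sketch, not a Hermes command, and the demo runs on a throwaway file rather than the real `~/.hermes/.env`:

```shell
# find_deprecated FILE: list deprecated Hermes variables still set in an env file
find_deprecated() {
  grep -nE '^(OPENAI_BASE_URL|LLM_MODEL)=' "$1" 2>/dev/null || true
}

# Demo on a throwaway file; on your machine, point it at ~/.hermes/.env
printf 'LLM_MODEL=gpt-4o\nGLM_API_KEY=xxx\n' > /tmp/legacy.env
find_deprecated /tmp/legacy.env
```

Any hit means the value is being silently ignored; move it into config.yaml via hermes model.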

There are two ways to switch models:

```bash
# Interactive switching (recommended)
hermes model

# Direct specification
/model <provider>/<model>
```

Major Providers Overview

| Provider | Provider ID | Strengths | Free Tier | Recommended Model |
| --- | --- | --- | --- | --- |
| Nous Portal | `nous` | Official Hermes, works out of the box | ✅ Yes | hermes-4 |
| OpenRouter | `openrouter` | 200+ model marketplace | ✅ Limited | Various flagships |
| Zhipu GLM | `glm` | Direct connection from China, no proxy needed | ✅ Yes | glm-4-plus |
| Kimi / Moonshot | `kimi` | Long context, Chinese provider | ✅ Yes | moonshot-v1 |
| MiniMax | `minimax` | Chinese provider, multimodal | ✅ Yes | minimax-01 |
| Anthropic | `anthropic` | Claude series | No | claude-sonnet-4 |
| OpenAI | `openai` | GPT series | No | gpt-4o |
| GitHub Copilot | `copilot` | Subscription-based, multi-model | Within subscription | gpt-5, claude |
| Hugging Face | `huggingface` | 20+ open-source models | ✅ $0.1/month | Various open-source |
| Local Models | Custom | Ollama/vLLM etc. | Free | Qwen2.5, Llama3 |

Recommended for Users in China

First choice: Zhipu GLM (glm) — direct connection, no proxy needed, generous free tier. Second choice: Kimi. If you need maximum power, use a proxy with OpenRouter.


Configure Providers

Option 1: Interactive Wizard (Recommended)

```bash
hermes model
```

The wizard lists all available providers — use arrow keys to select one, then enter your API Key.

Option 2: Edit config.yaml

```yaml
# ~/.hermes/config.yaml
model:
  default: glm-4-plus
provider:
  default: glm
```

API Keys are still stored in ~/.hermes/.env:

```bash
# Zhipu GLM
GLM_API_KEY=***

# OpenRouter
OPENROUTER_API_KEY=your_o..._key

# Anthropic
ANTHROPIC_API_KEY=your_a..._key

# Kimi
KIMI_API_KEY=***
```
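If you script your setup instead of using the wizard, a small helper keeps the env file free of duplicate entries. set_env_key is our sketch, not a Hermes command, and the demo writes to a throwaway file rather than the real ~/.hermes/.env:

```shell
# set_env_key FILE KEY VALUE: add or replace KEY=VALUE in an env file
set_env_key() {
  file=$1 key=$2 value=$3
  touch "$file"
  grep -v "^${key}=" "$file" > "$file.tmp" || true   # drop any existing line for KEY
  printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Demo on a throwaway file instead of the real ~/.hermes/.env
set_env_key /tmp/demo.env GLM_API_KEY first
set_env_key /tmp/demo.env GLM_API_KEY second   # replaces, does not duplicate
cat /tmp/demo.env
```

Running it twice with the same key leaves exactly one line, so re-running your setup script stays safe.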

Key Provider Configuration Details

Zhipu GLM

```bash
# 1. Get an API Key: https://open.bigmodel.cn/
# 2. Configure
hermes model --provider glm
# 3. Enter your API Key
```

Auto-Detection

The GLM provider automatically probes multiple endpoints (global, China, programming variant) to find the one that accepts your API Key. No need to manually set GLM_BASE_URL.

Hermes automatically handles GLM's 429 rate limiting, but it's recommended to avoid making too many concurrent requests.
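Hermes handles this inside the Agent, but if you call the GLM endpoint from your own scripts, an exponential backoff loop is the usual pattern. This is a sketch; the flaky function merely simulates a call that fails twice before succeeding:

```shell
# retry CMD...: run CMD, retrying with exponential backoff on failure (e.g. HTTP 429)
retry() {
  delay=1
  for attempt in 1 2 3 4 5; do
    "$@" && return 0
    echo "attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

# Demo: a fake API call that fails twice before succeeding
calls=0
flaky() { calls=$((calls + 1)); [ "$calls" -ge 3 ]; }
retry flaky && echo "succeeded after $calls calls"
```

In real use you would wrap your curl invocation of the GLM API in place of flaky.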

OpenRouter

```bash
# 1. Get API Key: https://openrouter.ai/keys
# 2. Configure
hermes model --provider openrouter
# 3. Select a model (e.g., anthropic/claude-sonnet-4)
```

OpenRouter provides 200+ models, with prices ranging from free to premium.

GitHub Copilot

```bash
hermes model --provider copilot
```

Authentication methods (by priority):

  1. gh auth token (requires GitHub Copilot subscription)
  2. OAuth device code login (wizard guides you automatically)

Note

Personal Access Tokens with the `ghp_*` prefix are not supported. If `gh auth token` returns a `ghp_*` token, use OAuth login via `hermes model` instead.
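You can tell the token types apart by prefix: ghp_ marks a classic Personal Access Token, while gho_ marks an OAuth token. A sketch (classify_token is our illustration, not a Hermes command):

```shell
# classify_token TOKEN: is this token usable for Copilot auth in Hermes?
classify_token() {
  case $1 in
    ghp_*) echo "classic PAT - not supported; use OAuth via 'hermes model'" ;;
    gho_*) echo "OAuth token - supported" ;;
    *)     echo "unknown token type" ;;
  esac
}

# Demo with a literal; in practice feed it: classify_token "$(gh auth token)"
classify_token "ghp_example123"
```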

Anthropic (Claude)

```bash
hermes model --provider anthropic
# or shorthand
hermes model --provider claude
```

Supports three authentication methods: API Key, OAuth, and Claude Code credentials.


Local Models

Don't want to depend on cloud services? Run models locally.

Ollama (Simplest)

```bash
# 1. Install Ollama: https://ollama.ai
# 2. Pull a model
ollama pull qwen2.5:14b

# 3. Configure Hermes
hermes model --provider custom --base-url http://localhost:11434/v1
```

Context Window

Ollama defaults to only a 4k context window. When the Agent uses tools, the system prompt plus tool definitions alone can fill it. We recommend at least 16k-32k:

```bash
# Set the context length when starting the server (Ollama 0.5.13+)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Or bake num_ctx into a model variant via a Modelfile
cat > Modelfile <<'EOF'
FROM qwen2.5:14b
PARAMETER num_ctx 32768
EOF
ollama create qwen2.5-32k -f Modelfile
```

vLLM (GPU High Performance)

```bash
# Start an OpenAI-compatible vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-14B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 32768

# Configure Hermes
hermes model --provider custom --base-url http://localhost:8000/v1
```

llama.cpp (CPU / Apple Silicon)

```bash
# Start llama-server
./llama-server -m model.gguf -c 32768 --jinja --port 8080

# Configure Hermes
hermes model --provider custom --base-url http://localhost:8080/v1
```

--jinja is Required

Without --jinja, llama-server completely ignores the tools parameter. Your model will try to write JSON tool calls in text, but Hermes won't recognize them — you'll see raw JSON output as messages.


Switching Models During a Session

In any conversation, you can temporarily switch with the /model command:

```
/model openrouter/anthropic/claude-sonnet-4
/model glm/glm-4-plus
/model custom                    # Auto-query local endpoint
```

Changes are persisted to config.yaml and survive restarts.
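For example, after /model glm/glm-4-plus, the config.yaml shown earlier in this chapter would be rewritten as:

```yaml
# ~/.hermes/config.yaml (updated automatically by /model)
model:
  default: glm-4-plus
provider:
  default: glm
```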


Auxiliary Models

Even if you've selected Nous Portal or another provider, some tools (vision, web summarization, MoA) still need an auxiliary model. Gemini Flash is used by default (via OpenRouter).

Simply set OPENROUTER_API_KEY to automatically enable these tools.


Model Selection Guide (Updated April 2026)

| Use Case | Recommended Model | Reason |
| --- | --- | --- |
| Daily chat | GLM-5.1 | Direct connection from China, great value ($0.95/M) |
| Programming | Claude Sonnet 4.6 | Strong code understanding, 1M context |
| Complex reasoning | GPT-5.4 Pro / Claude Opus 4.6 | Strongest overall capability |
| Privacy-sensitive | Qwen3 (local) | Data never leaves your machine |
| Budget-conscious | GPT-5.4 Nano / DeepSeek V3.2 | Extremely affordable ($0.2-0.26/M) |
| Long document processing | Gemini 2.5 Pro / Qwen3.6 Plus | 1M ultra-long context |
| Experimentation | OpenRouter | 200+ models to try freely |

📊 For a complete model comparison and pricing overview, see Appendix E: Model Selection Guide.



Released under CC BY-NC-SA 4.0 | GitHub