Chapter 5: Model Management
The model is the agent's brain. Choose the right one, and Hermes Agent works faster and smarter. This chapter helps you understand Hermes Agent's model system and pick the best option for your needs.
Core Concepts
Hermes Agent's model configuration is unified — all settings are stored in ~/.hermes/config.yaml, the single source of truth.
Important Change
The legacy OPENAI_BASE_URL and LLM_MODEL environment variables are deprecated. If you still have old settings in ~/.hermes/.env, they won't be read. Please use hermes model or edit config.yaml directly.
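To see whether stale settings from the old scheme are still lurking in your environment, a quick check like the following can help (the helper name is ours, not part of Hermes Agent):

```python
# Deprecated variables that Hermes Agent no longer reads
LEGACY_VARS = ("OPENAI_BASE_URL", "LLM_MODEL")

def find_legacy_settings(env):
    """Return the deprecated variables still set in an environment mapping."""
    return [name for name in LEGACY_VARS if env.get(name)]
```

Run it against `os.environ` (or a parsed copy of `~/.hermes/.env`) and migrate anything it reports into config.yaml.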
There are two ways to switch models:
# Interactive switching (recommended)
hermes model
# Direct specification
/model <provider>/<model>
Major Providers Overview
| Provider | Provider ID | Strengths | Free Tier | Recommended Model |
|---|---|---|---|---|
| Nous Portal | nous | Official Hermes, works out of the box | ✅ Yes | hermes-4 |
| OpenRouter | openrouter | 200+ model marketplace | ✅ Limited | Various flagships |
| Zhipu GLM | glm | Direct connection from China, no proxy needed | ✅ Yes | glm-4-plus |
| Kimi / Moonshot | kimi | Long context, Chinese provider | ✅ Yes | moonshot-v1 |
| MiniMax | minimax | Chinese provider, multimodal | ✅ Yes | minimax-01 |
| Anthropic | anthropic | Claude series | ❌ | claude-sonnet-4 |
| OpenAI | openai | GPT series | ❌ | gpt-4o |
| GitHub Copilot | copilot | Subscription-based, multi-model | Within subscription | gpt-5, claude |
| Hugging Face | huggingface | 20+ open-source models | ✅ $0.1/month | Various open-source |
| Local Models | Custom | Ollama/vLLM etc. | Free | Qwen2.5, Llama3 |
Recommended for Users in China
First choice: Zhipu GLM (glm) — direct connection, no proxy needed, generous free tier. Second choice: Kimi. If you need maximum power, use a proxy with OpenRouter.
Configure Providers
Option 1: Interactive Configuration (Recommended)
hermes model
The wizard lists all available providers — use arrow keys to select, then enter your API Key.
Option 2: Edit config.yaml
# ~/.hermes/config.yaml
model:
  default: glm-4-plus
provider:
  default: glm
API Keys are still stored in ~/.hermes/.env:
# Zhipu GLM
GLM_API_KEY=***
# OpenRouter
OPENROUTER_API_KEY=your_o..._key
# Anthropic
ANTHROPIC_API_KEY=your_a..._key
# Kimi
KIMI_API_KEY=***
Key Provider Configuration Details
Zhipu GLM (Recommended for Users in China)
# 1. Get API Key: https://open.bigmodel.cn/
# 2. Configure
hermes model --provider glm
# 3. Enter API Key
Auto-Detection
The GLM provider automatically probes multiple endpoints (global, China, programming variant) to find the one that accepts your API Key. No need to manually set GLM_BASE_URL.
Hermes automatically handles GLM's 429 rate limiting, but it's recommended to avoid making too many concurrent requests.
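The rate-limit handling described above typically amounts to retrying with exponential backoff. A minimal sketch of that pattern (not Hermes's actual implementation — `request_fn` is a stand-in for whatever makes the HTTP call):

```python
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn(), retrying on HTTP 429 with exponential backoff.

    request_fn must return a (status, body) tuple.
    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    """
    for attempt in range(max_retries):
        status, body = request_fn()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    return status, body  # still rate-limited after all retries
```

Keeping concurrency low, as recommended, reduces how often this retry path is hit at all.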
OpenRouter
# 1. Get API Key: https://openrouter.ai/keys
# 2. Configure
hermes model --provider openrouter
# 3. Select a model (e.g., anthropic/claude-sonnet-4)
OpenRouter provides 200+ models, with prices ranging from free to premium.
GitHub Copilot
hermes model --provider copilot
Authentication methods (by priority):
- gh auth token (requires GitHub Copilot subscription)
- OAuth device code login (wizard guides you automatically)
Note
Classic Personal Access Tokens (ghp_*) are not supported. If gh auth token returns a ghp_* token, use OAuth login via hermes model instead.
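Since the note above rules out one token type by prefix, a pre-flight check is easy to sketch. The prefix convention is GitHub's (`ghp_` for classic PATs, `gho_` for OAuth tokens); the function name is ours:

```python
def is_supported_copilot_token(token):
    """Reject classic personal access tokens (ghp_*).

    Other token types, such as OAuth tokens (commonly gho_*),
    pass this check.
    """
    return not token.startswith("ghp_")
```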
Anthropic (Claude)
hermes model --provider anthropic
# or shorthand
hermes model --provider claude
Supports three authentication methods: API Key, OAuth, and Claude Code credentials.
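When several credential sources are available, a client has to pick one. The order below is purely illustrative — an assumption, not Hermes's documented behavior:

```python
def pick_anthropic_auth(api_key=None, oauth_token=None, claude_code_creds=None):
    """Assumed priority (illustrative only): explicit API key first,
    then OAuth, then reusing existing Claude Code credentials."""
    if api_key:
        return ("api_key", api_key)
    if oauth_token:
        return ("oauth", oauth_token)
    if claude_code_creds:
        return ("claude_code", claude_code_creds)
    return None  # no credentials configured
```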
Local Models
Don't want to depend on cloud services? Run models locally.
Ollama (Simplest)
# 1. Install Ollama: https://ollama.ai
# 2. Pull a model
ollama pull qwen2.5:14b
# 3. Configure Hermes
hermes model --provider custom --base-url http://localhost:11434/v1
Context Window
Ollama defaults to only 4k context. When the Agent uses tools, system prompts + tool definitions alone can fill this up. We recommend at least 16k-32k:
# Set the default context length at startup
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# Or create a Modelfile to set num_ctx
FROM qwen2.5:14b
PARAMETER num_ctx 32768
vLLM (GPU High Performance)
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-14B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 32768
# Configure Hermes
hermes model --provider custom --base-url http://localhost:8000/v1
llama.cpp (CPU / Apple Silicon)
# Start llama-server
./llama-server -m model.gguf -c 32768 --jinja --port 8080
# Configure Hermes
hermes model --provider custom --base-url http://localhost:8080/v1
--jinja is Required
Without --jinja, llama-server completely ignores the tools parameter. Your model will try to write JSON tool calls in text, but Hermes won't recognize them — you'll see raw JSON output as messages.
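The failure mode above is easy to spot: an assistant message whose body is nothing but a JSON object with "name" and "arguments" keys. A small heuristic detector (our own sketch, not part of Hermes) illustrates what to look for:

```python
import json

def textual_tool_call(content):
    """Heuristic: detect a tool call the server failed to parse and
    emitted as plain JSON text, e.g. {"name": ..., "arguments": ...}.

    Returns the parsed dict if content looks like one, else None.
    """
    try:
        obj = json.loads(content.strip())
    except (json.JSONDecodeError, AttributeError):
        return None
    if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
        return obj
    return None
```

If you see this pattern in your session, restart llama-server with --jinja.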
Switching Models During a Session
In any conversation, you can temporarily switch with the /model command:
/model openrouter/anthropic/claude-sonnet-4
/model glm/glm-4-plus
/model custom # Auto-query local endpoint
Changes are persisted to config.yaml and survive restarts.
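Note how `/model openrouter/anthropic/claude-sonnet-4` mixes slashes: the provider is separated from the model at the first slash only, because model IDs themselves may contain slashes. A minimal sketch of that parsing rule (the function is ours, for illustration):

```python
def parse_model_arg(arg):
    """Split a '/model' argument into (provider, model) at the FIRST
    slash only, since model IDs may contain slashes themselves
    (e.g. anthropic/claude-sonnet-4 on OpenRouter)."""
    provider, _, model = arg.partition("/")
    return provider, (model or None)  # bare provider -> model is None
```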
Auxiliary Models
Even if you've selected Nous Portal or another provider, some tools (vision, web summarization, MoA) still need an auxiliary model. Gemini Flash is used by default (via OpenRouter).
Simply set OPENROUTER_API_KEY to automatically enable these tools.
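The gating described above boils down to a presence check on the key. A sketch of that logic — the exact model ID is an assumption on our part, based on "Gemini Flash via OpenRouter":

```python
def auxiliary_model(env):
    """Return the auxiliary model to use, or None when the OpenRouter
    key that enables it is missing. The model ID is assumed, not
    confirmed — check your installation's defaults."""
    if env.get("OPENROUTER_API_KEY"):
        return "google/gemini-flash-1.5"
    return None
```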
Model Selection Guide (Updated April 2026)
| Use Case | Recommended Model | Reason |
|---|---|---|
| Daily chat | GLM-5.1 | Direct connection from China, great value ($0.95/M) |
| Programming | Claude Sonnet 4.6 | Strong code understanding, 1M context |
| Complex reasoning | GPT-5.4 Pro / Claude Opus 4.6 | Strongest overall capability |
| Privacy-sensitive | Qwen3 (local) | Data never leaves your machine |
| Budget-conscious | GPT-5.4 Nano / DeepSeek V3.2 | Extremely affordable ($0.2-0.26/M) |
| Long document processing | Gemini 2.5 Pro / Qwen3.6 Plus | 1M ultra-long context |
| Experimentation | OpenRouter | 200+ models to try freely |
📊 For a complete model comparison and pricing overview, see Appendix E: Model Selection Guide.
Further Reading
- Official docs: AI Providers
- Model routing and fallback: Provider Routing