Mistral Vibe

Mistral Vibe is Mistral AI’s open-source (Apache 2.0) command-line interface for AI-assisted software development. Like Claude Code, it provides agentic coding capabilities — reading files, executing commands, and making changes to your codebase directly from the terminal. Unlike Claude Code, Vibe can be pointed at any OpenAI-compatible backend, including self-hosted models via vLLM, LiteLLM, or Open-WebUI.

What is Mistral Vibe?

Mistral Vibe is a terminal-based AI coding assistant that can:

Read and understand your entire codebase
Execute shell commands via a stateful bash session
Create, edit, and delete files with search-and-replace precision
Run grep (ripgrep) for content search across your project
Delegate tasks to subagents for parallel work
Connect to MCP servers for extended capabilities (git, filesystem, web fetch)
Use custom skills and agents for specialized workflows

Built-in Tools

Vibe ships with these tools out of the box — no plugins required:

Tool	Description	Claude Code Equivalent
`read_file`	Read file contents	`Read`
`write_file`	Create or overwrite files	`Write`
`search_replace`	Diff-patch style editing	`Edit`
`grep`	ripgrep content search	`Grep`
`bash`	Stateful shell session	`Bash`
`task`	Spawn subagent for parallel work	`Agent`
`ask_user_question`	Prompt for user input	`AskUserQuestion`
`webfetch`	Fetch URL content	`WebFetch`
`websearch`	Search the web	`WebSearch`

Why Vibe? (vs. Alternatives)

vs. Claude Code

Feature	Claude Code	Mistral Vibe
License	Proprietary	Apache 2.0 (open source)
Model	Claude Opus/Sonnet (cloud only)	Any OpenAI-compatible endpoint
Self-hosted	No	Yes (vLLM, LiteLLM, Ollama, etc.)
Cost	$20-200/month subscription	Free (self-host) or pay-per-token
Context	1M tokens	Depends on model (128K+ with Devstral)
Plugins	Marketplace + community	Skills + MCP servers
Subagents	Yes (Agent tool)	Yes (task tool + TOML agents)
Project rules	CLAUDE.md (hierarchical)	AGENTS.md (project root only)
SWE-bench	~79.6% (Sonnet 4.6)	72.2% (Devstral 2 123B)

Verdict: Claude Code is more polished and has higher benchmark scores. Vibe wins on openness, self-hosting, and cost control. If your organization can’t send code to Anthropic’s cloud, Vibe with a self-hosted model is the answer. If you can, consider using both — Claude Code for complex tasks, Vibe for routine work on your own hardware.

Mistral’s model lineup has evolved significantly. The hosted API now offers purpose-built models for different workflows, including the Devstral 2 family:

devstral-medium-latest ($0.40/$2.00 per M tokens) - Frontier agentic coding (SWE-bench 72.2%), best default for Vibe’s multi-file tool-use loop
devstral-small-latest ($0.10/$0.30 per M tokens) - Cheap fast fanout for parallel research tasks
codestral-latest ($0.30/$0.90 per M tokens) - FIM / inline completion (not agent loop)
mistral-large-latest ($0.50/$1.50 per M tokens) - Frontier generalist for hard reasoning
magistral-medium-latest ($2.00/$5.00 per M tokens) - Reasoning-tuned for architecture/planning

These models are available through both the official Mistral API and self-hosted setups. The alias re-mapping in LiteLLM lets you route Claude Code’s internal subagent spawns to the most appropriate model tier automatically.

vs. Gemini CLI

Feature	Gemini CLI	Mistral Vibe
Context	2M+ tokens	Model-dependent (128K-256K typical)
Self-hosted	No (Google cloud only)	Yes
Subagents	Yes (`.gemini/agents/`)	Yes (`.vibe/agents/`)
MCP support	Limited	Full (stdio + HTTP transports)
License	Proprietary	Apache 2.0

For more on Gemini CLI, see the Gemini Code guide.

vs. Le Chat (chat.mistral.ai — $14.99/month web UI)

chat.mistral.ai (“Le Chat”) is Mistral’s hosted web product — the same role claude.ai plays for Anthropic or chatgpt.com plays for OpenAI. Le Chat Pro is $14.99/month (commonly rounded to “$15/mo”), notably cheaper than Claude Pro or ChatGPT Plus at $20/mo. Pro gives you:

Unlimited chat with Mistral Large, Codestral, Pixtral, and other flagship Mistral models
Document upload and analysis, web search, image generation (Flux), Canvas (in-browser code editor)
Custom agents and a project workspace
Voice mode in supported regions

	Le Chat Pro (web, $15/mo)	Mistral Vibe (CLI)
Cost	$14.99/mo flat	Free (self-host) or pay-per-token (Mistral API)
Interface	Browser	Terminal + your editor
Reads your repo	Upload files manually	Yes — full filesystem access
Runs shell commands	No (Canvas sandbox only)	Yes — real shell on your machine
Edits files in place	No (copy/paste out)	Yes
MCP / subagents / skills	No	Yes
Best for	Q&A, brainstorming, one-off snippets	Multi-file refactors, agentic edits, automation

They complement each other rather than compete. Pay $15/mo for Le Chat if you want a polished web UI for chat and brainstorming, and keep Vibe CLI for the actual coding work where you need the model to act on your repo. Many people run both.

Installation

Skip everything below if you just want to try it. With a Mistral API key from console.mistral.ai in hand:

curl -LsSf https://mistral.ai/vibe/install.sh | bash      # 1. install
mkdir -p ~/.vibe && echo 'MISTRAL_API_KEY=<your-mistral-key>' > ~/.vibe/.env   # 2. key (replace the placeholder)
printf 'active_model = "mistral-medium-3.5"\n\n[provider]\napi_key = "${MISTRAL_API_KEY}"\n' > ~/.vibe/config.toml  # 3. minimal config
vibe -p "reply with exactly: pong" --max-turns 1 --output text   # 4. smoke test → pong

If you get pong, you’re done — start a real session with vibe inside any project directory. Read on for self-hosting, MCP servers, skills, and orchestration patterns.

macOS / Linux

curl -LsSf https://mistral.ai/vibe/install.sh | bash

Or via uv (the fast Python package manager):

uv tool install mistral-vibe

Or via pip:

pip install mistral-vibe

Prerequisite: Python 3.12+. Check with python3 --version.

After installation, verify:

vibe --version

PATH Setup

If you get command not found: vibe, add ~/.local/bin to your PATH (the example targets zsh; for bash, write to ~/.bashrc instead):

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Configuration

Vibe’s configuration lives in TOML files:

Global: ~/.vibe/config.toml — applies to all projects
Per-project: .vibe/config.toml in the project root — overrides global
API keys: ~/.vibe/.env — loaded automatically

Vibe works against any OpenAI-compatible chat-completions endpoint. The two paths most users pick:

	Option A — Official Mistral API	Option B — Your own self-hosted backend
Endpoint	`https://api.mistral.ai/v1`	Your Open-WebUI / LiteLLM / vLLM URL
Auth	Mistral API key from console.mistral.ai	API key you generate on your gateway
Cost	Pay-per-token (La Plateforme billing)	Flat: your hardware + electricity (or GPU rental)
Models	All Mistral models (medium, large, codestral, etc.)	Whatever you serve (e.g. Devstral Small 2, Devstral 2 123B, Qwen Coder)
Network	Public internet → Mistral	Your LAN / VPN / Tailscale → your GPU host
Best for	Quickest start — works immediately with a credit card	Operators with existing GPU infrastructure or strict data-residency needs

You can keep both configured side-by-side and switch via the active_model setting or --agent.

Option A — Official Mistral API

The fastest way to start — Mistral hosts the models, you pay per token. Pricing lives at mistral.ai/pricing. The Devstral 2 family now provides purpose-built models for different workflows.

As of Vibe 2.10.1, mistral is a built-in provider, so you do not need the explicit [[providers]] and [[models]] blocks for the official API. The whole ~/.vibe/config.toml can be as short as this:

active_model = "devstral-medium-latest"

[provider]
api_key = "${MISTRAL_API_KEY}"

The verbose template below is still useful when you want to pin model aliases, override pricing for cost tracking, or run multiple providers side-by-side. Start minimal; add structure only when you need it.

Recommended model tiers:

devstral-medium-latest - Best default for agentic coding (replaces mistral-medium-3.5)
devstral-small-latest - Cheap parallel research ($0.10/$0.30 per M tokens)
mistral-large-latest - Hard reasoning tasks ($0.50/$1.50 per M tokens)
magistral-medium-latest - Architecture/planning ($2.00/$5.00 per M tokens)

Get a Mistral API key:
- Sign up at console.mistral.ai
- Workspace → API Keys → Create new key
- The key has no fixed prefix (it is an opaque token)
Create ~/.vibe/.env:
```
MISTRAL_API_KEY=<your-mistral-key>
```
Create ~/.vibe/config.toml (verbose form — useful for multi-provider setups):

active_model = "mistral-medium-3.5"
enable_telemetry = false
auto_compact_threshold = 200000
api_timeout = 720.0

[[providers]]
name = "mistral"
api_base = "https://api.mistral.ai/v1"
api_key_env_var = "MISTRAL_API_KEY"
api_style = "openai"
backend = "mistral"

[[models]]
name = "mistral-medium-latest"
provider = "mistral"
alias = "mistral-medium-3.5"
temperature = 0.2
auto_compact_threshold = 200000
thinking = "off"
# Pricing as of 2026-05 — verify at mistral.ai/pricing before relying on these for budget tracking
input_price = 0.4   # $/MTok
output_price = 2.0  # $/MTok

[[models]]
name = "codestral-latest"
provider = "mistral"
alias = "codestral"
temperature = 0.2
input_price = 0.3   # $/MTok, matches the codestral-latest price listed in this guide
output_price = 0.9  # $/MTok

Smoke test:

vibe -p "reply with exactly: pong" --max-turns 1 --output text
# → pong

The official API has no implicit cap — a runaway agentic loop can spend real money. Set a workspace spending limit at console.mistral.ai → Billing → Limits, and always pass --max-price and --max-turns in programmatic mode:

vibe -p "..." --max-price 0.50 --max-turns 25
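
To pick a sensible `--max-price`, it helps to estimate what a run could cost. The sketch below applies the per-million-token prices quoted in this guide for devstral-medium-latest ($0.40 in, $2.00 out) to hypothetical per-turn token counts. The 20,000 input and 1,000 output tokens per turn are illustrative assumptions, not measurements, and `estimate_cost_usd` is a made-up helper name.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Cost in dollars; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical token usage for one agent turn (assumption for illustration).
per_turn = estimate_cost_usd(20_000, 1_000, input_price=0.40, output_price=2.00)

# Worst case if the run uses every turn allowed by --max-turns 25.
worst_case = per_turn * 25

print(f"per turn: ${per_turn:.4f}, 25 turns: ${worst_case:.2f}")
```

Under these assumptions a 25-turn run stays below the $0.50 cap used in the command above; heavier context per turn raises the estimate proportionally.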

Option B — Your own self-hosted backend

Point Vibe at a model you run yourself — typically Devstral Small 2 (Apache 2.0, fits on one RTX 4090), Devstral 2 123B (higher quality, 2× H100/A100 class), or any other tool-calling model served via vLLM behind an Open-WebUI or LiteLLM gateway. If you don’t have the gateway stood up yet, jump down to Self-Hosted Backend Setup first, then come back here for the client config.

Get an API key from your gateway:
- Generate one in your Open-WebUI / LiteLLM admin UI, or use vLLM’s bearer token if you serve it directly.
- If you serve vLLM without auth, any non-empty string works (api_key_env_var is still required — Vibe checks the variable exists).
Create ~/.vibe/.env:
```
LOCAL_API_KEY=<your-api-key>
```
Create ~/.vibe/config.toml:

active_model = "local"
enable_telemetry = false

# Context & UI
auto_compact_threshold = 200000
context_warnings = true
api_timeout = 720.0

# Project context injection
include_commit_signature = true
include_project_context = true

[project_context]
default_commit_count = 3
timeout_seconds = 2.0

# ── Provider ──────────────────────────────────
[[providers]]
name = "local"
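# Tool calls can fail through an Open-WebUI proxy (see Known Issues); prefer a LiteLLM or vLLM URL if you hit that.
api_base = "https://your-openwebui-instance.example.com/api"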
api_key_env_var = "LOCAL_API_KEY"
api_style = "openai"
backend = "generic"

# ── Model ─────────────────────────────────────
[[models]]
name = "devstral-small-2"   # must match a model_name in your gateway / served-model-name
provider = "local"
alias = "local"
temperature = 0.2
auto_compact_threshold = 200000
thinking = "off"
input_price = 0.0
output_price = 0.0

# If your gateway exposes a different model, change `name` to match.
# Common examples: "devstral-small-2", "devstral-2-123b", "qwen-coder-30b".
# A wrong `name` produces: 400 Bad Request — Invalid model name.

# ── Tool Permissions ──────────────────────────
[tools.bash]
permission = "ask"
default_timeout = 120
max_output_bytes = 8000
allowlist = [
  "git", "ls", "cat", "echo", "pwd", "which", "python", "python3",
  "pip", "uv", "docker", "docker compose", "curl", "jq", "rg", "grep",
  "npm", "npx", "node", "make", "cargo", "go",
]
denylist = ["rm -rf /", "dd", "mkfs"]
sensitive_patterns = ["sudo"]

[tools.read_file]
permission = "always"

[tools.grep]
permission = "always"

[tools.search_replace]
permission = "ask"

[tools.write_file]
permission = "ask"
max_write_bytes = 64000
create_parent_dirs = true
sensitive_patterns = ["**/.env", "**/.env.*"]

[tools.webfetch]
permission = "ask"
default_timeout = 30
max_content_bytes = 60000

[tools.task]
permission = "always"

# ── Session Logging ───────────────────────────
[session_logging]
enabled = true
save_dir = "~/.vibe/logs"
session_prefix = "session"
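
Because a wrong `name` in `[[models]]` only surfaces as a 400 at request time, it can be worth checking it against what the gateway actually serves. OpenAI-compatible servers return their model list as a JSON object with a `data` array of entries that each carry an `id`. The sketch below checks a configured name against such a payload; the sample response is hand-written for illustration and the function names are hypothetical.

```python
import json

# Hand-written sample of an OpenAI-compatible model-list response (illustrative).
SAMPLE_MODELS_RESPONSE = """
{"object": "list",
 "data": [{"id": "devstral-small-2", "object": "model"},
          {"id": "qwen-coder-30b", "object": "model"}]}
"""


def served_model_ids(models_response: str) -> list[str]:
    """Extract the model ids from a model-list response body."""
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]


def is_served(configured_name: str, models_response: str) -> bool:
    """True if the [[models]] name matches an id the gateway reports."""
    return configured_name in served_model_ids(models_response)


print(served_model_ids(SAMPLE_MODELS_RESPONSE))
print(is_served("devstral-small-2", SAMPLE_MODELS_RESPONSE))
print(is_served("devstral-2-123b", SAMPLE_MODELS_RESPONSE))
```

In practice the response body would come from the gateway's model-list endpoint rather than a string literal.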

Configuration Reference

Most top-level config keys can also be set via an environment variable with the VIBE_ prefix; keys without an entry in the Env Variable column are config-file only:

Config Key	Env Variable	Default	Description
`active_model`	`VIBE_ACTIVE_MODEL`	—	Model alias to use
`auto_compact_threshold`	`VIBE_AUTO_COMPACT_THRESHOLD`	200000	Token count before auto-compaction
`api_timeout`	`VIBE_API_TIMEOUT`	720.0	HTTP timeout in seconds
`context_warnings`	`VIBE_CONTEXT_WARNINGS`	false	Warn when approaching context limit
`vim_keybindings`	`VIBE_VIM_KEYBINDINGS`	false	Enable vim bindings in TUI
`autocopy_to_clipboard`	—	false	Copy last response to clipboard
`enable_auto_update`	—	true	Auto-update Vibe
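
The naming pattern in the table is mechanical: upper-case the key and add the `VIBE_` prefix. The sketch below shows that mapping plus a lookup that prefers the environment value. That precedence is an assumption, as are the helper names; note also that values read from the environment arrive as strings.

```python
def env_var_for(config_key: str) -> str:
    """Map a config key to its VIBE_-prefixed environment variable name."""
    return "VIBE_" + config_key.upper()


def resolve(config_key: str, file_config: dict, environ: dict):
    """Return the environment value if set, else the config-file value.

    Assumption: environment variables take precedence over config.toml.
    """
    return environ.get(env_var_for(config_key), file_config.get(config_key))


# Plain dicts stand in for the parsed config file and os.environ.
file_config = {"active_model": "local", "api_timeout": 720.0}
environ = {"VIBE_ACTIVE_MODEL": "devstral-medium-latest"}

print(env_var_for("auto_compact_threshold"))
print(resolve("active_model", file_config, environ))
print(resolve("api_timeout", file_config, environ))
```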

Tool Permission Levels

Permission	Behavior
`"always"`	Tool runs without confirmation
`"ask"`	Prompts for approval each time
`"never"`	Tool is disabled

Best practice: Set read_file and grep to "always" (read-only operations), and task as well if you want subagents to spawn without a prompt. Keep bash, write_file, and search_replace at "ask" to prevent unintended changes.
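
The three permission levels amount to a small decision rule, sketched below in Python. This mirrors the table only; it is not Vibe's implementation, and `may_run` and its `confirm` callback are hypothetical names.

```python
from typing import Callable


def may_run(permission: str, confirm: Callable[[], bool]) -> bool:
    """Decide whether a tool call proceeds under a given permission level."""
    if permission == "always":
        return True            # runs without confirmation
    if permission == "never":
        return False           # tool is disabled
    if permission == "ask":
        return confirm()       # defer to the user's answer
    raise ValueError(f"unknown permission level: {permission!r}")


# Stand-in for an interactive prompt that the user declines.
user_declines = lambda: False

print(may_run("always", user_declines))
print(may_run("ask", user_declines))
print(may_run("never", user_declines))
```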

MCP Servers

MCP (Model Context Protocol) servers extend Vibe with additional tools. Servers that use the stdio transport run as local subprocesses; servers that use the HTTP transport are reached over the network.

Prerequisites

# Node.js ≥18 (for filesystem MCP)
node --version

# uv (for Python MCP servers)
which uvx || curl -LsSf https://astral.sh/uv/install.sh | sh

Recommended MCP Servers

Add these to your config.toml:

# ── MCP Servers ───────────────────────────────

# Filesystem — directory tree, glob search, move, metadata
[[mcp_servers]]
name = "fs"
transport = "stdio"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/your/projects"]
startup_timeout_sec = 15
tool_timeout_sec = 60
sampling_enabled = false

# Git — status, diff, log, branch, commit, checkout
[[mcp_servers]]
name = "git"
transport = "stdio"
command = "uvx"
args = ["mcp-server-git"]
startup_timeout_sec = 15
tool_timeout_sec = 60
sampling_enabled = false

# Web fetch — URL to Markdown
[[mcp_servers]]
name = "web"
transport = "stdio"
command = "uvx"
args = ["mcp-server-fetch"]
startup_timeout_sec = 15
tool_timeout_sec = 30
sampling_enabled = false

MCP Capability Matrix

Vibe Need	Built-in Tool	MCP Server	MCP Tools Added
Read files	`read_file`	`fs` (richer)	`read_text_file`, `read_multiple_files`, `read_media_file`
Write files	`write_file`	`fs`	`write_file`, `edit_file`
Find files by name	—	`fs`	`search_files`, `directory_tree`
Search file content	`grep` (ripgrep)	—	Already covered
Shell commands	`bash` (stateful)	—	Already covered
Git operations	—	`git`	`git_status`, `git_diff`, `git_log`, `git_commit`, `git_add`, `git_branch`, etc.
Fetch URLs	`webfetch`	`web`	`fetch` (HTML → Markdown)

sampling_enabled

When true (the default), an MCP server can call back into the LLM during tool execution. Set to false for simple servers (filesystem, git, fetch) to save tokens and reduce latency.

Skills & Agents

Skills (Slash Commands & Ambient Context)

Skills are Markdown files that provide instructions, rules, or workflows. They live at:

Project-local: .vibe/skills/<name>/SKILL.md (committed to repo)
User-global: ~/.vibe/skills/<name>/SKILL.md

Skill Types

Type	Frontmatter	Behavior
Slash command	`user-invocable: true`	Triggered by typing `/skill-name`
Ambient context	`user-invocable: false`	Loaded automatically as background context

Example: Code Review Skill

Create ~/.vibe/skills/code-review/SKILL.md:

---
name: code-review
description: Structured code review on current diff
allowed-tools: read_file bash grep
user-invocable: true
---

# Code Review

Review the current git diff for:
1. Security vulnerabilities (injection, secrets, auth bypass)
2. Correctness (logic errors, edge cases, error handling)
3. Operations (config changes shipped? rollback plan?)
4. Style (conventional commits, no drive-by changes)

Process:
1. Run `git diff --staged` or `git diff HEAD~1`
2. Read each changed file in full context
3. Report by severity: CRITICAL / WARNING / NOTE

Invoke it with /code-review in a Vibe session.

Example: Ambient Safety Rules

Create .vibe/skills/safety-rules/SKILL.md in your project root:

---
name: safety-rules
description: Core safety rules for infrastructure operations
allowed-tools: read_file bash grep
user-invocable: false
---

# Safety Rules
- rsync --delete: ALWAYS --dry-run first
- .env files: backup before replacing
- Never force push to main without approval
- Verify database schema before writing queries
- Fail fast on missing env vars

This loads automatically whenever Vibe runs in that project directory.
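
The difference between the two skill types comes down to one frontmatter field. As a rough illustration, the Python sketch below reads the `key: value` lines between the `---` markers and classifies a skill. It is a deliberately naive parser, not a YAML implementation, and treating a missing `user-invocable` field as ambient is an assumption.

```python
SKILL_TEXT = """\
---
name: safety-rules
description: Core safety rules for infrastructure operations
allowed-tools: read_file bash grep
user-invocable: false
---

# Safety Rules
- rsync --delete: ALWAYS --dry-run first
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Collect simple key: value pairs from the leading --- block."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the Markdown body follows
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def skill_kind(text: str) -> str:
    """Classify a skill by its user-invocable flag (missing counts as ambient)."""
    flag = parse_frontmatter(text).get("user-invocable", "false")
    return "slash command" if flag.lower() == "true" else "ambient context"


print(parse_frontmatter(SKILL_TEXT)["name"])
print(skill_kind(SKILL_TEXT))
```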

Custom Agents

Agents are TOML files that override Vibe’s config for specific use cases. They live at ~/.vibe/agents/<name>.toml.

Example: Infrastructure Agent

display_name = "Infrastructure"
description = "Docker, SSH, and server management with careful permissions"
safety = "destructive"
agent_type = "agent"

auto_approve = false
enabled_tools = ["read_file", "grep", "bash", "write_file", "search_replace", "task"]

Example: Read-Only Agent

display_name = "Read Only"
description = "Safe exploration — read files, search, run safe commands"
safety = "safe"
agent_type = "agent"

auto_approve = true
# Note: bash can still modify files; remove it for a strictly read-only agent.
enabled_tools = ["read_file", "grep", "bash"]

Save these as ~/.vibe/agents/infra.toml and ~/.vibe/agents/readonly.toml, then select one with vibe --agent infra or vibe --agent readonly.

Built-in Agents

Vibe ships with several agents:

Agent	Mode	Description
`default`	Standard	Normal interactive mode
`plan`	Plan-first	Requires plan approval before execution
`accept-edits`	Edit-focused	Auto-approves file edits
`auto-approve`	Autonomous	Approves all tool calls
`explore`	Subagent	Read-only exploration subagent
`lean`	Installable	Uses Leanstral model with `thinking = "high"`

AGENTS.md — Your Project’s AI Rules File

AGENTS.md is Vibe’s equivalent of Claude Code’s CLAUDE.md. Place it at the root of your project and Vibe reads it automatically as context.

Template

# Project Name

description: Brief description of the project

## Safety Rules
- rsync --delete: ALWAYS --dry-run first
- .env files: backup before replacing/removing
- Never force push to main without approval
- Verify database schema before writing queries

## Stack
- Runtime: [your runtime]
- Database: [your database]
- Frontend: [your frontend]

## Conventions
- Conventional commits: type(scope): description
- Timezone: America/New_York (never assume UTC)
- Fail fast on missing env vars

## Common Commands
npm run dev        # Local development
./deploy.sh        # Production deploy (always --dry-run first)

Comparison: CLAUDE.md vs AGENTS.md

Feature	CLAUDE.md (Claude Code)	AGENTS.md (Vibe)
Location	`~/.claude/CLAUDE.md` (global) + project root	Project root only
Hierarchical	Yes (parent dirs cascade)	No (root only)
Format	Freeform Markdown	YAML-flavored Markdown
Domain rules	`~/.claude/rules/*.md` with glob matching	`.vibe/skills/` (ambient, no globs)
Per-file scoping	Glob patterns in frontmatter	Not supported
Supplement	—	`.vibe/skills/` with `user-invocable: false`

Self-Hosted Backend Setup

This section covers the server side — vLLM, Open-WebUI, LiteLLM, Docker Compose. Once your gateway is running, point Vibe at it using the Option B — Your own self-hosted backend client config above.

Architecture

A production multi-model setup uses a gateway pattern:

Vibe CLI / Open-WebUI (browser)
  ↓
LiteLLM Gateway (routes by model name)
  ├── mistral-medium → vLLM (GPU 6-7, FP8, TP=2)   ← IrregularChat default coding model
  ├── irregularchat  → vLLM (GPU 0, Gemma 4 31B)
  └── other-model    → vLLM (GPU N)

Or for direct access (simpler, recommended for tool calling with Vibe CLI):

Vibe (your machine) → vLLM (direct) → GPU(s)

vLLM Launch Command

For Devstral models, the first two flags below are required for tool calling to work; the other two fit the 123B model onto your GPUs:

vllm serve mistralai/Devstral-2-123B-Instruct-2512 \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2 \
  --quantization fp8 \
  --port 8080

Flag	Purpose
`--tool-call-parser mistral`	Without this, vLLM rejects tool call schemas with Pydantic validation errors
`--enable-auto-tool-choice`	Lets the model decide when to use tools
`--tensor-parallel-size N`	Split model across N GPUs (123B needs 2+ GPUs)
`--quantization fp8`	Fits 123B on 2x GPUs with ~178GB VRAM
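
The GPU counts follow from simple arithmetic on weight size. FP8 stores one byte per parameter and BF16 stores two, so the sketch below estimates weight memory and the per-GPU share under tensor parallelism. It counts weights only; KV cache, activations, and runtime overhead come on top, so treat the numbers as a lower bound. The helper names are hypothetical.

```python
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Weight size in GB (10^9 bytes): billions of params times bytes per param."""
    return params_billions * BYTES_PER_PARAM[dtype]


def per_gpu_weight_gb(params_billions: float, dtype: str,
                      tensor_parallel_size: int) -> float:
    """Share of the weights each GPU holds when split evenly across GPUs."""
    return weight_memory_gb(params_billions, dtype) / tensor_parallel_size


print(weight_memory_gb(123, "bf16"))      # 246 GB of weights unquantized
print(weight_memory_gb(123, "fp8"))       # 123 GB of weights at FP8
print(per_gpu_weight_gb(123, "fp8", 2))   # 61.5 GB per GPU with TP=2
```

This is why the 123B model needs both quantization and at least two large GPUs: at BF16 the weights alone exceed the combined memory of two 80 GB cards.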

Model Options

Model	Size	License	Min Hardware	SWE-bench	vLLM Image
Devstral 2	123B	Mistral Research (revenue cap)	2x H100/A100/B200	72.2%	`vllm/vllm-openai:v0.19.0`
Devstral Small 2	24B	Apache 2.0	1x RTX 4090 (24GB)	~55%	`vllm/vllm-openai:v0.19.0`
Gemma 4	12B-31B	Apache 2.0	1x RTX 4090 (24GB)	—	Custom image required (see below)
Mistral Large	123B	Mistral Research	2x H100/A100/B200	—	`vllm/vllm-openai:v0.19.0`
Magistral Medium	123B	Mistral Research	2x H100/A100/B200	—	`vllm/vllm-openai:v0.19.0`

Pricing (June 2026, $/M tokens in/out):

devstral-small-latest: $0.10 / $0.30 - Cheap fast fanout
codestral-latest: $0.30 / $0.90 - FIM / inline completion
devstral-medium-latest: $0.40 / $2.00 - Frontier agentic coding
mistral-large-latest: $0.50 / $1.50 - Frontier generalist
magistral-medium-latest: $2.00 / $5.00 - Reasoning-tuned

Devstral 2 Licensing

Devstral 2 (123B) has a revenue restriction — commercial use by organizations with >$20M monthly revenue requires a separate license from Mistral. Devstral Small 2 (24B) is fully Apache 2.0 with no restrictions.

Gemma 4 Custom Image

Gemma 4 models use the gemma4 architecture type, which is newer than the transformers library shipped in stock vLLM Docker images (4.57.x). You must build a custom image with upgraded transformers:

FROM vllm/vllm-openai:latest
RUN pip install --no-cache-dir --upgrade transformers

Build with: docker build -t vllm-openai-gemma4:latest .

Then use vllm-openai-gemma4:latest instead of the stock image.

Docker Compose Example

For a production-ready self-hosted setup:

services:
  vllm:
    container_name: vllm-devstral
    image: vllm/vllm-openai:v0.19.0
    restart: unless-stopped
    volumes:
      - /path/to/models:/root/.cache/huggingface
    ipc: host
    command: >
      mistralai/Devstral-2-123B-Instruct-2512
      --served-model-name devstral
      --quantization fp8
      --tool-call-parser mistral
      --enable-auto-tool-choice
      --tensor-parallel-size 2
      --gpu-memory-utilization 0.95
      --max-num-seqs 4
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
    ports:
      - "8080:8000"

Then point Vibe at it:

[[providers]]
name = "local"
api_base = "http://localhost:8080/v1"
api_key_env_var = "VLLM_API_KEY"  # vLLM doesn't require a key, but Vibe needs the field
api_style = "openai"
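backend = "generic"

[[models]]
name = "devstral"   # must match --served-model-name in the Compose file
provider = "local"
alias = "local"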

Set a dummy key in ~/.vibe/.env:

VLLM_API_KEY=not-needed

Open-WebUI + LiteLLM Gateway

For teams that want both a browser UI and CLI access, add Open-WebUI with LiteLLM as a gateway:

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml:ro
    environment:
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
      # Assumes a Postgres service named litellm-db, which is not shown in this excerpt
      DATABASE_URL: postgresql://litellm:${POSTGRES_PASSWORD}@litellm-db:5432/litellm
    extra_hosts:
      - "host.docker.internal:host-gateway"
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      OPENAI_API_BASE_URL: http://litellm:4000/v1
      OPENAI_API_KEY: ${LITELLM_MASTER_KEY}
      ENABLE_OLLAMA_API: "false"
      WEBUI_AUTH: "true"
      WEBUI_NAME: "Your AI Instance"
      WEBUI_SECRET_KEY: ${WEBUI_SECRET_KEY}
      BYPASS_MODEL_ACCESS_CONTROL: "true"
    depends_on:
      - litellm
    restart: unless-stopped

volumes:
  open-webui-data:

LiteLLM config (config.yaml) routes model names to vLLM backends:

model_list:
  - model_name: devstral-123b
    litellm_params:
      model: hosted_vllm/devstral        # must match --served-model-name
      api_base: http://host.docker.internal:8080/v1
      api_key: none

litellm_settings:
  drop_params: true       # prevents 422 from unsupported params
  request_timeout: 600    # 10 min for long coding tasks

LiteLLM Config Gotchas

The model field must use the vLLM --served-model-name, not the filesystem path. hosted_vllm/devstral works; hosted_vllm//workspace/models/Devstral-2-123B-Instruct-2512 does not.
drop_params: true is essential — it silently drops parameters the backend doesn’t support instead of returning 422 errors.
host.docker.internal resolves to the Docker host — use this to reach vLLM containers from inside the LiteLLM container.
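
The first gotcha is easy to catch before deploying. The following Python sketch builds the LiteLLM `model` string from a served model name and rejects anything that looks like a filesystem path. The helper name `litellm_model_for` is hypothetical, and the leading-slash check is a heuristic for the failure case described above, not a rule taken from LiteLLM.

```python
def litellm_model_for(served_model_name: str) -> str:
    """Return the hosted_vllm/<name> string for a vLLM --served-model-name."""
    name = served_model_name.strip()
    if not name or name.startswith("/"):
        # A leading slash suggests a filesystem path was pasted by mistake.
        raise ValueError("use the --served-model-name value, not a filesystem path")
    return f"hosted_vllm/{name}"


print(litellm_model_for("devstral"))
```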

Tips & Known Issues

Tips

Temperature 0.2 is a good default for coding tasks across Mistral Medium, Devstral, and Codestral: low enough to keep tool calls structured, high enough to surface useful alternatives. Raise it toward 0.5 for brainstorming, or lower it to 0.1 for strict, mechanical refactors.
/compact manually compresses conversation context — use it after long investigations
Subagents can run tasks in parallel via the task tool — think of them like Claude Code’s Agent tool
Per-project config: Drop a .vibe/config.toml in any repo to override global settings (different model, different tools)
System prompts: Create ~/.vibe/prompts/<name>.md and set system_prompt_id = "<name>" in config
Session logs: Stored at ~/.vibe/logs/ when enabled — useful for reviewing what Vibe did

Known Issues

Issue	Workaround	Status
Ctrl+C breaks message alternation	Use `/clear` instead	Open (#255)
Tool calls fail through Open-WebUI proxy	Point Vibe directly at vLLM or LiteLLM	By design
Tool calls fail on LM Studio	Use vLLM with `--tool-call-parser mistral`	Confirmed (#124)
“Generating…” hangs indefinitely	Restart Vibe session	Open (#415)
TUI rendering breaks in some terminals	Use Alacritty, Ghostty, Kitty, or WezTerm	By design
Non-admin users see no models in Open-WebUI	Set `BYPASS_MODEL_ACCESS_CONTROL=true`	By design (since v0.4)
vLLM “model type not recognized” for new models	Build custom image with `pip install --upgrade transformers`	Gemma 4, etc.
LiteLLM 404 “model does not exist”	Use the `--served-model-name` value in the LiteLLM config, not the filesystem path	Config issue
Sessions lost on Open-WebUI restart	Set `WEBUI_SECRET_KEY` in environment	Config issue

Cost Optimization (Self-Hosted)

When running your own model, “cost” is GPU time rather than API tokens:

--max-num-seqs 2-4 limits concurrent requests (prevents OOM on large models)
--gpu-memory-utilization 0.95 maximizes VRAM usage (safe when GPUs are dedicated)
auto_compact_threshold = 200000 prevents context from growing unbounded
max_output_bytes = 8000 on bash tool prevents long command outputs from bloating context
Run a compaction model (smaller/faster) for auto-compaction if available on the same endpoint

Claude Code + Vibe Orchestration (Best of Both Worlds)

The most powerful way to use Vibe isn’t standalone — it’s as a workhorse dispatched by Claude Code. Claude Code has superior reasoning, planning, and code review but is limited by subscription tokens. Vibe on a self-hosted model has unlimited tokens but weaker orchestration. Together: Claude’s brain + Vibe’s unlimited hands.

How It Works

┌─────────────────────────────────────────────────────────────┐
│                      Claude Code (Brain)                     │
│  Plans → Dispatches → Reviews → Synthesizes → Commits       │
│                                                              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐                │
│  │ Vibe -p  │   │ Vibe -p  │   │ Vibe -p  │  (Parallel)    │
│  │ Agent 1  │   │ Agent 2  │   │ Agent 3  │                │
│  │ files    │   │ tests    │   │ docs     │                │
│  │ A-D      │   │ E-H      │   │ I-L      │                │
│  └──────────┘   └──────────┘   └──────────┘                │
│       ↓              ↓              ↓                        │
│  Claude Code reviews all results, fixes integration issues   │
└─────────────────────────────────────────────────────────────┘

Vibe’s -p flag runs it in headless/programmatic mode: it sends the prompt, executes any approved tool calls, outputs the result, and exits. This makes Vibe behave like a function call — prompt in, result out — perfect for dispatch from Claude Code.

The vibe CLI (Headless Mode)

# Single-shot with tool auto-approval and a wall-clock timeout
timeout 300 vibe -p "your prompt here" \
  --workdir /path/to/project --agent auto-approve \
  --max-turns 25 --output streaming > out.ndjson

# With tool restrictions — read-only research, no auto-approve needed
vibe -p "your prompt" --enabled-tools "read_file" --enabled-tools "grep" --output text

# JSON output for structured/parseable end-of-session results
vibe -p "your prompt" --agent auto-approve --output json --max-turns 10

Flag	Description
`-p "prompt"`	Headless/programmatic mode. Does NOT auto-approve tools as of recent Vibe versions — pass `--agent auto-approve` when the prompt needs to write/edit/run bash. Without it, mutation tools are silently skipped with “Tool execution not permitted” and the model loops trying alternates.
`--agent auto-approve`	Bypass all tool-approval prompts. Required for any prompt that calls `write_file`, `search_replace`, or `bash`.
`--workdir DIR`	Set working directory (always set this)
`--max-turns N`	Hard turn cap. NOT a wall-clock cap — each LLM call can retry for up to ~5 minutes on Mistral SDK backoff. Always wrap with `timeout` (180–600s sane).
`--output text|json|streaming`	Output format. `text` (default) buffers stdout until session end, so 0 bytes mid-flight is normal and does not mean the run is stuck. `streaming` emits NDJSON per message with immediate flush (use for parallel batch progress visibility). `json` dumps all messages at the end.
`--enabled-tools TOOL`	Restrict available tools; repeat the flag for each tool. Supports globs (`bash*`) and regex with a `re:` prefix (`re:.*`).

Task Routing: Who Does What

Task Type	Who	Why
Planning & architecture	Claude Code	Superior multi-step reasoning, weighs tradeoffs
Code review & validation	Claude Code	Better judgment on quality, security, patterns
Synthesis & final decisions	Claude Code	Integrates results from multiple agents
Commit messages & git ops	Claude Code	Craft precise conventional commits
File generation (new files)	Vibe	Unlimited tokens, follows templates well
Bulk edits (many files)	Vibe (parallel)	Each agent handles a subset of files
Research (docs, APIs, codebase)	Vibe	Reads entire docs without token pressure
Test writing	Vibe	Repetitive work, follows patterns
Documentation	Vibe	Good at following style guides with unlimited context
Bug investigation	Vibe (gather) → Claude (diagnose)	Vibe reads 20 files, Claude interprets

Dispatch Patterns

Parallel Dispatch (Independent Tasks)

When tasks touch different files, run multiple Vibe agents simultaneously. Each job needs four guards: --agent auto-approve (so writes actually happen), --output streaming (so you can see progress), timeout (so a retry storm can’t hang for hours), and a parallel cap of ~3 (Mistral rate-limits per key). See Vibe Headless Mode Gotchas for why each of these matters and a complete xargs -P 3 template.

# Agent 1: Implement auth module
timeout 300 vibe -p "Create src/lib/auth.ts with login, logout, and session functions.
Follow patterns in src/lib/database.ts." \
  --workdir /path/to/project --agent auto-approve \
  --max-turns 30 --output streaming > out-auth.ndjson 2> err-auth &

# Agent 2: Write tests (different files)
timeout 300 vibe -p "Write tests for src/lib/utils.ts at src/tests/utils.test.ts.
Use vitest. Cover all exported functions." \
  --workdir /path/to/project --agent auto-approve \
  --max-turns 25 --output streaming > out-tests.ndjson 2> err-tests &

# Agent 3: Generate docs (different files)
timeout 300 vibe -p "Document all exported functions in src/lib/ to docs/api.md.
Follow the existing style in docs/README.md." \
  --workdir /path/to/project --agent auto-approve \
  --max-turns 20 --output streaming > out-docs.ndjson 2> err-docs &

wait  # All three complete in parallel — keep this at 3 unless empirically verified higher

After wait, parse each NDJSON: an empty file means timeout fired (real stall, rate limit, or retry storm); non-empty means Vibe produced output and the last {"role":"assistant",...} line is the final response. Claude Code then reviews all results, checks for integration issues, and makes targeted fixes.
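The parsing rule above can be sketched as a small helper. This is a minimal sketch, assuming each NDJSON line is a single JSON object and the role key appears as shown in the text; the function name last_assistant_line and its exit codes are hypothetical, not part of Vibe.

```shell
# Print the final assistant line from a Vibe NDJSON capture.
# Exit status: 0 = found, 1 = file missing or empty (timeout fired),
#              2 = output exists but contains no assistant line.
last_assistant_line() {
  file=$1
  # -s is true only when the file exists and is non-empty.
  [ -s "$file" ] || return 1
  # Tolerate optional whitespace around the colon.
  line=$(grep -E '"role"[[:space:]]*:[[:space:]]*"assistant"' "$file" | tail -n 1)
  [ -n "$line" ] || return 2
  printf '%s\n' "$line"
}
```

Calling it once per capture (out-auth.ndjson, out-tests.ndjson, out-docs.ndjson) and checking the exit status separates "timed out" from "ran but never answered" before Claude Code starts its review.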

Serial Dispatch (Dependent Tasks)

When step 2 needs step 1’s output, chain them:

# Step 1: Research (read-only, safe)
FINDINGS=$(vibe -p "Read src/api/ and list all endpoints, their methods, and parameters." \
  --workdir /path/to/project \
  --enabled-tools "read_file" --enabled-tools "grep" \
  --max-turns 10 --output text)

# Claude Code reads $FINDINGS, plans the implementation, then:

# Step 2: Implement (based on research). --agent auto-approve lets the writes happen in headless mode.
vibe -p "Based on these existing endpoints:
$FINDINGS
Add a new POST /api/users/reset-password endpoint following the same patterns." \
  --workdir /path/to/project --agent auto-approve \
  --max-turns 35 --output text

Research-Only Dispatch (Safe Mode)

Restrict Vibe to read-only tools for pure investigation:

vibe -p "Read all files in src/components/ and src/lib/.
Find everywhere that calls the 'authenticate' function.
Report: which files, which line numbers, what arguments are passed." \
  --workdir /path/to/project \
  --enabled-tools "read_file" --enabled-tools "grep" \
  --max-turns 15 --output text

With only read_file and grep enabled, Vibe has no tool that can modify files, which adds defense-in-depth on top of headless mode. Keep bash off this list: a shell can write files, so enabling it would void the read-only guarantee.

Orchestration Workflow Example

Scenario: Implement a new “Reset Password” feature across API, frontend, and tests.

1. Claude Code plans: breaks the feature into 4 independent tasks
2. Claude Code dispatches parallel Vibe agents:
- Agent 1: Create src/api/reset-password.ts (backend endpoint)
- Agent 2: Create src/components/ResetPasswordForm.tsx (frontend)
- Agent 3: Write src/tests/reset-password.test.ts (tests)
- Agent 4: Update docs/api.md with new endpoint docs
3. The agents run concurrently on Vibe's backend (unlimited tokens when self-hosted). On the official Mistral API, stay within the ~3-job cap from Parallel Dispatch and start the fourth agent once a slot frees up.
4. Claude Code reviews all generated files for:
- Do imports resolve correctly across modules?
- Does the frontend call the right API endpoint?
- Do tests cover the actual implementation (not just stubs)?
- Any security issues (input validation, auth checks)?
5. Claude Code fixes integration issues (or dispatches targeted Vibe fixes)
6. Claude Code commits with a conventional commit message
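The dispatch step of this workflow can be sketched with a simple batch limiter. This is a sketch under stated assumptions: run_vibe_task and dispatch_capped are hypothetical helper names, VIBE_BIN and WORKDIR are illustrative variables, and the out-/err- file naming follows the parallel dispatch example earlier. It caps concurrency per batch rather than with a sliding window, so a slow job holds up the next batch.

```shell
# Run one headless Vibe job using the flags from the dispatch examples above.
# VIBE_BIN and WORKDIR are assumptions of this sketch, not Vibe settings.
run_vibe_task() {
  name=$1
  prompt=$2
  timeout 300 "${VIBE_BIN:-vibe}" -p "$prompt" \
    --workdir "${WORKDIR:-.}" --agent auto-approve \
    --max-turns 30 --output streaming > "out-$name.ndjson" 2> "err-$name"
}

# Launch jobs in batches of at most $1. Remaining arguments are name/prompt pairs.
dispatch_capped() {
  max=$1
  shift
  running=0
  while [ "$#" -ge 2 ]; do
    run_vibe_task "$1" "$2" &
    shift 2
    running=$((running + 1))
    if [ "$running" -ge "$max" ]; then
      wait          # let the current batch drain before starting the next
      running=0
    fi
  done
  wait              # collect the final, possibly partial, batch
}
```

For the four agents above, a call such as dispatch_capped 3 api "..." ui "..." tests "..." docs "..." would run three jobs first and the docs job afterwards.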

Cost Math

Approach	Token Cost	Time
Claude Code does everything	~500K tokens ($2-10 depending on plan)	1 session
Vibe does everything	Free (self-hosted) but weaker planning	May loop/fail
Claude orchestrates + Vibe implements	~50K Claude tokens + unlimited Vibe	Best of both

Claude Code's token spend drops by ~90% (from ~500K to ~50K tokens) because it only handles planning, review, and synthesis, the three things it's best at. All the heavy file reading, generation, and bulk edits happen on Vibe's unlimited self-hosted backend.
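The ~90% figure follows directly from the two token counts in the table. A minimal sketch of the arithmetic, where pct_saved is a hypothetical helper name and the inputs are the approximate figures quoted above:

```shell
# Percentage reduction from `before` to `after`, using integer division.
pct_saved() {
  before=$1
  after=$2
  echo $(( (before - after) * 100 / before ))
}

pct_saved 500000 50000   # prints 90
```

Substituting your own measured token counts per session shows whether the split is paying off for your workload.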

cc-vibe — Using Claude Code with the Mistral API

A complementary setup to the orchestration pattern above: instead of Claude Code (Anthropic cloud) → Vibe (Mistral), run Claude Code itself on Mistral for everyday work, then escalate to cloud Claude only for hard tasks. Many community members alias this as cc-vibe.

The catch: Claude Code speaks the Anthropic Messages API, but api.mistral.ai speaks Mistral’s chat-completions format. They are not wire-compatible. You need a translator in front of Mistral. The standard choice is a local LiteLLM proxy.

┌──────────────┐   Anthropic    ┌───────────────┐     Mistral chat     ┌──────────────────┐
│ Claude Code  │  Messages API  │ LiteLLM proxy │ /v1/chat/completions │  api.mistral.ai  │
│  (cc-vibe)   │ ─────────────► │  localhost    │ ───────────────────► │  (Official API)  │
└──────────────┘     :4000      └───────────────┘                      └──────────────────┘
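To make the mismatch concrete, the sketch below prints the same request in both shapes. It only builds request bodies and sends nothing; the helper names are hypothetical, and the strings are inserted without JSON escaping, so it is for illustration only. The Anthropic Messages API (POST /v1/messages) requires max_tokens and takes the system prompt as a top-level field, while Mistral chat completions (POST /v1/chat/completions) carries the system prompt as the first message.

```shell
# Body for POST /v1/messages (Anthropic Messages API).
# Arguments: model, system prompt, user message.
anthropic_body() {
  printf '{"model":"%s","max_tokens":64,"system":"%s","messages":[{"role":"user","content":"%s"}]}\n' "$1" "$2" "$3"
}

# Body for POST /v1/chat/completions (Mistral chat completions).
# Arguments: model, system prompt, user message.
mistral_body() {
  printf '{"model":"%s","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}\n' "$1" "$2" "$3"
}

anthropic_body claude-sonnet-4-6 "Be terse." "ping"
mistral_body devstral-medium-latest "Be terse." "ping"
```

Responses differ as well (Anthropic returns a list of content blocks, chat completions returns choices with a message), which is why a translating proxy is needed in both directions.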

1. Install LiteLLM:

uv tool install 'litellm[proxy]'
# or: pip install --user 'litellm[proxy]'

2. Configure the proxy at ~/.vibe/litellm-config.yaml:

# Mistral → Anthropic-compatible translator
# Exposes Mistral models at http://localhost:4000/v1/messages (Anthropic format).
#
# Reload after edits: launchctl kickstart -k gui/$UID/io.vibe.litellm
#
# Pricing (June 2026, $/M tokens in/out):
#   devstral-small-latest   $0.10 / $0.30   agentic coding, cheap fanout
#   codestral-latest        $0.30 / $0.90   FIM / inline completion (not agent loop)
#   devstral-medium-latest  $0.40 / $2.00   FRONTIER agentic coding (SWE-bench 72.2%)
#   mistral-large-latest    $0.50 / $1.50   frontier generalist, hard reasoning
#   magistral-medium-latest $2.00 / $5.00   reasoning-tuned (architecture, planning)

model_list:
  # ── Coding-specialized (default for cc-vibe) ────────────────────────────────
  - model_name: devstral-medium
    litellm_params:
      model: mistral/devstral-medium-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: devstral-small
    litellm_params:
      model: mistral/devstral-small-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: codestral
    litellm_params:
      model: mistral/codestral-latest
      api_key: os.environ/MISTRAL_API_KEY

  # ── Reasoning / generalist tiers ────────────────────────────────────────────
  - model_name: mistral-large
    litellm_params:
      model: mistral/mistral-large-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: mistral-medium
    litellm_params:
      model: mistral/mistral-medium-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: mistral-small
    litellm_params:
      model: mistral/mistral-small-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: magistral-medium
    litellm_params:
      model: mistral/magistral-medium-latest
      api_key: os.environ/MISTRAL_API_KEY

  # ── Claude tier aliases ─────────────────────────────────────────────────────
  # Claude Code internally dispatches subagents under specific claude-* model
  # IDs. Map each Anthropic tier to the Mistral model that matches its *role*:
  #   haiku  → cheap fast fanout       → devstral-small  ($0.10/$0.30)
  #   sonnet → default agent worker    → devstral-medium ($0.40/$2.00)
  #   opus   → planner / architect     → mistral-large   ($0.50/$1.50)
  # Previously these all mapped to mistral-{small,medium,large} generalists,
  # leaving Devstral's coding-agent specialization unused.
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: mistral/devstral-small-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: claude-haiku-4-5
    litellm_params:
      model: mistral/devstral-small-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: mistral/devstral-medium-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: claude-sonnet-4-6-20250514
    litellm_params:
      model: mistral/devstral-medium-latest
      api_key: os.environ/MISTRAL_API_KEY
  - model_name: claude-opus-4-7
    litellm_params:
      model: mistral/mistral-large-latest
      api_key: os.environ/MISTRAL_API_KEY

litellm_settings:
  drop_params: true     # silently drop unsupported params (Mistral rejects some)
  set_verbose: false

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

3. Add keys to ~/.vibe/.env (the vibe CLI already reads this file):

MISTRAL_API_KEY=<your-mistral-key>          # used by both Vibe and the LiteLLM proxy upstream
LITELLM_MASTER_KEY=sk-vibe-<random>          # used by Claude Code to authenticate TO the proxy

The distinction matters and is the #1 setup mistake: MISTRAL_API_KEY authenticates the proxy to Mistral. LITELLM_MASTER_KEY authenticates Claude Code to the proxy. They are not interchangeable — the proxy will reject MISTRAL_API_KEY, and Mistral will reject LITELLM_MASTER_KEY.
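A preflight check can catch the two most likely versions of this mistake (a missing key, or the same value pasted into both variables) before the proxy starts. This is a minimal sketch; check_proxy_keys is a hypothetical helper, not part of Vibe or LiteLLM.

```shell
# Return 0 when both keys are set and distinct; otherwise explain and return 1.
check_proxy_keys() {
  if [ -z "${MISTRAL_API_KEY:-}" ]; then
    echo "MISTRAL_API_KEY is not set (the proxy cannot authenticate to Mistral)" >&2
    return 1
  fi
  if [ -z "${LITELLM_MASTER_KEY:-}" ]; then
    echo "LITELLM_MASTER_KEY is not set (Claude Code cannot authenticate to the proxy)" >&2
    return 1
  fi
  if [ "$MISTRAL_API_KEY" = "$LITELLM_MASTER_KEY" ]; then
    echo "both variables hold the same value; they should be two different secrets" >&2
    return 1
  fi
  return 0
}
```

Running it right after loading ~/.vibe/.env gives a clear message instead of an authentication error from one hop or the other.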

4. Start the proxy (leave it running in the background, e.g. via launchd, systemd, or a tmux pane):

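# .env holds plain KEY=value lines; set -a exports them so the litellm process inherits them
set -a; source ~/.vibe/.env; set +a
litellm --config ~/.vibe/litellm-config.yaml --port 4000 &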

5. Point Claude Code at it. Easiest path is the community claude-switch helper, with this entry in ~/.claude-backends.env:

VIBE_API_KEY_FILE="$HOME/.vibe/.env"
VIBE_BASE_URL="http://localhost:4000"
# Devstral 2 medium: purpose-built agentic coding model (SWE-bench 72.2%).
# Best default for Claude Code's multi-file tool-use loop. Swap mid-session
# with `/model <name>`; see litellm-config.yaml for the full menu
# (devstral-small, mistral-large, magistral-medium, codestral, …).
VIBE_MODEL="devstral-medium"
VIBE_MODEL_NAME="Devstral 2 Medium (Official API via LiteLLM)"

…and a switch_vibe() that exports ANTHROPIC_BASE_URL=$VIBE_BASE_URL and ANTHROPIC_AUTH_TOKEN=$LITELLM_MASTER_KEY. Then:

alias cc-vibe='source claude-switch vibe && claude --dangerously-skip-permissions --teammate-mode auto'
cc-vibe   # Claude Code now talking to Devstral Medium via the local proxy
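The switch_vibe() function described above might look like the sketch below. It is not the actual claude-switch source: the BACKENDS_FILE override and the ANTHROPIC_MODEL export are assumptions of this sketch, while the two ANTHROPIC_* exports and the VIBE_* variable names come from the text.

```shell
# Hypothetical switch_vibe(): load backend settings, then export what Claude Code reads.
# BACKENDS_FILE is an assumption of this sketch so the path can be overridden.
switch_vibe() {
  backends=${BACKENDS_FILE:-$HOME/.claude-backends.env}
  [ -r "$backends" ] || { echo "missing $backends" >&2; return 1; }
  . "$backends"                 # defines VIBE_API_KEY_FILE, VIBE_BASE_URL, VIBE_MODEL
  [ -r "$VIBE_API_KEY_FILE" ] || { echo "missing $VIBE_API_KEY_FILE" >&2; return 1; }
  . "$VIBE_API_KEY_FILE"        # defines LITELLM_MASTER_KEY
  export ANTHROPIC_BASE_URL="$VIBE_BASE_URL"
  export ANTHROPIC_AUTH_TOKEN="$LITELLM_MASTER_KEY"
  export ANTHROPIC_MODEL="$VIBE_MODEL"
}
```

The explicit export matters: variables read from an env file are plain shell variables until exported, and the claude process only sees exported ones.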

Add these to your ~/.zshrc or ~/.bashrc for quick model switching:

# Mistral Vibe model presets — swap ANTHROPIC_MODEL after sourcing claude-switch.
# Inside a session, `/model <name>` does the same thing (LiteLLM advertises all).
alias cc-vibe-think='source claude-switch vibe && ANTHROPIC_MODEL=mistral-large ANTHROPIC_CUSTOM_MODEL_OPTION=mistral-large claude --dangerously-skip-permissions --teammate-mode auto'
alias cc-vibe-fast='source claude-switch vibe && ANTHROPIC_MODEL=devstral-small ANTHROPIC_CUSTOM_MODEL_OPTION=devstral-small claude --dangerously-skip-permissions --teammate-mode auto'
alias cc-vibe-reason='source claude-switch vibe && ANTHROPIC_MODEL=magistral-medium ANTHROPIC_CUSTOM_MODEL_OPTION=magistral-medium claude --dangerously-skip-permissions --teammate-mode auto'

Usage:

cc-vibe — default (Devstral Medium for agentic coding)
cc-vibe-fast — cheap parallel research (Devstral Small)
cc-vibe-think — hard reasoning tasks (Mistral Large)
cc-vibe-reason — architecture/planning (Magistral Medium)

6. Smoke test the round-trip:

source claude-switch vibe
claude -p "reply with exactly: cc-vibe-ok" --output-format text
# → cc-vibe-ok

The LiteLLM configuration above uses role-based model routing that maps Claude Code's internal subagent tiers to Mistral's purpose-built models:

Alias Mapping Strategy:

claude-haiku-* → devstral-small-latest ($0.10/$0.30) — cheap fast fanout for parallel research
claude-sonnet-* → devstral-medium-latest ($0.40/$2.00) — default agent worker for implementation
claude-opus-* → mistral-large-latest ($0.50/$1.50) — planner/architect for hard reasoning

Why this matters: Claude Code internally spawns subagents under specific model IDs (e.g., when you use the Agent tool with subagent_type: "explore", it requests a “haiku” tier agent). The alias map ensures these spawns route to the most appropriate Mistral model:

Haiku-tier spawns (quick lookups, parallel research) → Devstral Small — 4× cheaper than the generalist medium it was using before
Sonnet-tier spawns (implementation work) → Devstral Medium — purpose-built for agentic coding
Opus-tier spawns (planning/architecture) → Mistral Large — generalist reasoning where coding specialization isn’t needed
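Note that the config lists exact model IDs (claude-haiku-4-5, claude-sonnet-4-6, and so on); the claude-haiku-* notation above is shorthand. An ID that Claude Code requests but the config does not list would presumably have nothing to route to. The sketch below checks a config file for an exact model_name entry. config_has_model is a hypothetical helper, and it assumes unquoted names written as in the config above.

```shell
# Succeed if the exact model ID appears as a "- model_name: <id>" entry.
# Arguments: path to the LiteLLM config, model ID to look for.
config_has_model() {
  awk -v id="$2" '
    $1 == "-" && $2 == "model_name:" && $3 == id { found = 1 }
    END { exit !found }
  ' "$1"
}
```

Running it for each ID your Claude Code version requests (for example, config_has_model ~/.vibe/litellm-config.yaml claude-haiku-4-5) flags missing aliases after an upgrade.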

Mid-session model switching: Use /model <name> in a Claude Code session to swap models on the fly. The proxy advertises all configured models, so you can switch from devstral-medium to mistral-large for a hard reasoning task, then back.

Cost impact: The fanout pattern means subagents fire often. Mapping haiku spawns to the cheaper Devstral Small compounds savings across a session — expect ~40-60% lower costs for research-heavy workflows compared to the old generalist-only mapping.

The translator only works if it’s running. A robust switch_vibe() adds a 2-second health check:

curl -fsS -o /dev/null -m 2 "${VIBE_BASE_URL}/v1/models" \
  -H "x-api-key: ${LITELLM_MASTER_KEY}" \
  || echo "WARNING: LiteLLM proxy at ${VIBE_BASE_URL} not responding. Start with: litellm --config ~/.vibe/litellm-config.yaml --port 4000 &"

This surfaces the dead-proxy case before you spend ten minutes wondering why Claude returns connection errors.

For a self-hosted backend (your own Open-WebUI / LiteLLM gateway): no local proxy needed — point ANTHROPIC_BASE_URL directly at your gateway (LiteLLM exposes Anthropic-compatible endpoints natively) and use your gateway’s API key as ANTHROPIC_AUTH_TOKEN. See Claude Code with Self-Hosted Models → Setup with LiteLLM Gateway.

This setup provides three layers of model selection:

1. In-session swap (no shell reload):

/model devstral-medium    # default — agentic coding
/model mistral-large      # heavy reasoning / planning
/model devstral-small     # cheap, fast
/model magistral-medium   # reasoning-tuned (architecture)
/model codestral          # FIM completion (rare for Claude Code)

2. Session-start preset (new aliases — run source ~/.zshrc once):

cc-vibe         → devstral-medium  (default coding agent)
cc-vibe-think   → mistral-large    (reasoning / planning)
cc-vibe-fast    → devstral-small   (cheap fanout)
cc-vibe-reason  → magistral-medium (architecture decisions)

3. Automatic subagent routing: When Claude Code dispatches subagents internally, they now hit the right tier (haiku→devstral-small at $0.10/$0.30, opus→mistral-large at $0.50/$1.50) instead of all funneling to generalist medium.

Cost note: Devstral 2 medium output is $2.00/M (vs Large 3 at $1.50/M), while its input is cheaper ($0.40/M vs $0.50/M). If you find yourself doing heavy non-coding, output-heavy work in cc-vibe, switch to cc-vibe-think: same or better reasoning, cheaper output.
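Whether the switch saves money depends on the input/output mix. A worked sketch using the per-million prices from the pricing table above; cost_usd is a hypothetical helper and the token counts are made-up illustrations, not measurements:

```shell
# Cost in USD. Arguments: input tokens, output tokens,
# input price per million, output price per million.
cost_usd() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.2f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# 1M input tokens, 200K output tokens (output is 20% of input):
cost_usd 1000000 200000 0.40 2.00   # devstral-medium -> 0.80
cost_usd 1000000 200000 0.50 1.50   # mistral-large   -> 0.80

# 1M input tokens, 500K output tokens (output-heavy):
cost_usd 1000000 500000 0.40 2.00   # devstral-medium -> 1.40
cost_usd 1000000 500000 0.50 1.50   # mistral-large   -> 1.25
```

At these prices the two models break even when output is about 20% of input. Below that ratio, which is typical for sessions that read many files, Devstral Medium is the cheaper of the two.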

Vibe vs OpenCode for Dispatch

Both tools work as dispatch targets, but they have different strengths:

Capability	Vibe	OpenCode
`--workdir` flag	Yes	No (must `cd` first)
LSP diagnostics	No	Yes (TS, Go, Rust, Python)
Session continuity	No (stateless)	Yes (`--continue`)
Cold start	0.49s	0.85s
JSONL cost/tokens	No (internal only)	Yes (per-step events)
File writing	Yes (writes + shows text)	Yes (writes + shows text)
Temp directory support	Works (`--workdir`)	Fails (needs project root)
Cost budget limit	`--max-price`	No

Rule of thumb: Use Vibe for one-shot dispatch to any directory. Use OpenCode for multi-step TypeScript/Go work where LSP matters. See the OpenCode orchestration guide for the OpenCode-specific pattern.

Requirements

vibe CLI installed and on PATH (~/.local/bin/vibe)
~/.vibe/.env with your API key configured
~/.vibe/config.toml with provider and model configured
Self-hosted backend must handle concurrent requests (--max-num-seqs on vLLM ≥ number of parallel agents)
Read Vibe Headless Mode Gotchas before writing any dispatch script — tool-approval default, output-buffering behavior, retry budget, and per-key rate limits all affect vibe -p reliability and aren’t visible from the help text.
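The file and PATH requirements above can be checked mechanically before a dispatch script runs. A minimal sketch, where check_vibe_requirements is a hypothetical helper; it only tests that the files exist and are non-empty and does not validate their contents or the backend's concurrency settings.

```shell
# Print one line per missing requirement; return 0 only when nothing is missing.
check_vibe_requirements() {
  missing=0
  command -v vibe > /dev/null 2>&1 || { echo "vibe not found on PATH"; missing=1; }
  [ -s "$HOME/.vibe/.env" ] || { echo "missing or empty: ~/.vibe/.env"; missing=1; }
  [ -s "$HOME/.vibe/config.toml" ] || { echo "missing or empty: ~/.vibe/config.toml"; missing=1; }
  return "$missing"
}
```

Calling it at the top of a dispatch script and exiting on failure avoids launching parallel jobs that would all fail for the same reason.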

Related Resources

Claude Code - Anthropic's proprietary agentic coding CLI
Gemini Code - Google’s Gemini CLI for multi-model orchestration
AI Agent Pricing - Cost comparison across all CLI agents
Project Rules & Lessons Learned - CLAUDE.md and AGENTS.md patterns
Full-Stack Development with AI - AI-powered development workflows
OpenHands Guide - Alternative agentic coding tool

External Links

Mistral Vibe Documentation
Mistral Vibe GitHub - Source code (Apache 2.0)
Devstral Model Card - Devstral 2 announcement
MCP Server Registry - Official MCP servers
Vibe Coding Repository - Community rules, skills, and lessons learned

Insight: The alias re-map is the highest-leverage change. Whenever Claude Code internally spawns a "haiku" subagent (e.g., a quick file lookup), the request routes to devstral-small ($0.10/$0.30, 4× cheaper than mistral-medium) instead of a generalist medium model. The fanout pattern means subagents fire often, so the saving compounds.

Mapping claude-opus-* to mistral-large (not Devstral) is deliberate: Opus is invoked for planning/architecture contexts where raw reasoning beats coding-specialization. Devstral is sonnet-tier (the worker tier), Large is opus-tier (the thinker tier). This mirrors how Anthropic positions its own tiers.

mistral-medium-latest stays reachable, just not as the default. If you ever need image input (Medium 3.5 is multimodal, Devstral 2 is text-only), /model mistral-medium switches to it.
