Mistral Vibe
Mistral Vibe
Section titled “Mistral Vibe”Mistral Vibe is Mistral AI’s open-source (Apache 2.0) command-line interface for AI-assisted software development. Like Claude Code, it provides agentic coding capabilities — reading files, executing commands, and making changes to your codebase directly from the terminal. Unlike Claude Code, Vibe can be pointed at any OpenAI-compatible backend, including self-hosted models via vLLM, LiteLLM, or Open-WebUI.
What is Mistral Vibe?
Section titled “What is Mistral Vibe?”Mistral Vibe is a terminal-based AI coding assistant that can:
- Read and understand your entire codebase
- Execute shell commands via a stateful bash session
- Create, edit, and delete files with search-and-replace precision
- Run grep (ripgrep) for content search across your project
- Delegate tasks to subagents for parallel work
- Connect to MCP servers for extended capabilities (git, filesystem, web fetch)
- Use custom skills and agents for specialized workflows
Built-in Tools
Section titled “Built-in Tools”Vibe ships with these tools out of the box — no plugins required:
| Tool | Description | Claude Code Equivalent |
|---|---|---|
read_file | Read file contents | Read |
write_file | Create or overwrite files | Write |
search_replace | Diff-patch style editing | Edit |
grep | ripgrep content search | Grep |
bash | Stateful shell session | Bash |
task | Spawn subagent for parallel work | Agent |
ask_user_question | Prompt for user input | AskUserQuestion |
webfetch | Fetch URL content | WebFetch |
websearch | Search the web | WebSearch |
Why Vibe? (vs. Alternatives)
Section titled “Why Vibe? (vs. Alternatives)”vs. Claude Code
Section titled “vs. Claude Code”| Feature | Claude Code | Mistral Vibe |
|---|---|---|
| License | Proprietary | Apache 2.0 (open source) |
| Model | Claude Opus/Sonnet (cloud only) | Any OpenAI-compatible endpoint |
| Self-hosted | No | Yes (vLLM, LiteLLM, Ollama, etc.) |
| Cost | $20-200/month subscription | Free (self-host) or pay-per-token |
| Context | 1M tokens | Depends on model (128K+ with Devstral) |
| Plugins | Marketplace + community | Skills + MCP servers |
| Subagents | Yes (Agent tool) | Yes (task tool + TOML agents) |
| Project rules | CLAUDE.md (hierarchical) | AGENTS.md (project root only) |
| SWE-bench | ~79.6% (Sonnet 4.6) | 72.2% (Devstral 2 123B) |
Verdict: Claude Code is more polished and has higher benchmark scores. Vibe wins on openness, self-hosting, and cost control. If your organization can’t send code to Anthropic’s cloud, Vibe with a self-hosted model is the answer. If you can, consider using both — Claude Code for complex tasks, Vibe for routine work on your own hardware.
vs. Gemini CLI
Section titled “vs. Gemini CLI”| Feature | Gemini CLI | Mistral Vibe |
|---|---|---|
| Context | 2M+ tokens | Model-dependent (128K-256K typical) |
| Self-hosted | No (Google cloud only) | Yes |
| Subagents | Yes (.gemini/agents/) | Yes (.vibe/agents/) |
| MCP support | Limited | Full (stdio + HTTP transports) |
| License | Proprietary | Apache 2.0 |
For more on Gemini CLI, see the Gemini Code guide.
vs. Le Chat (chat.mistral.ai — $15/month web UI)
Section titled “vs. Le Chat (chat.mistral.ai — $15/month web UI)”chat.mistral.ai (“Le Chat”) is Mistral’s hosted web product — the same role claude.ai plays for Anthropic or chatgpt.com plays for OpenAI. Le Chat Pro is $14.99/month (commonly rounded to “$15/mo”), notably cheaper than Claude Pro or ChatGPT Plus at $20/mo. Pro gives you:
- Unlimited chat with Mistral Large, Codestral, Pixtral, and other flagship Mistral models
- Document upload and analysis, web search, image generation (Flux), Canvas (in-browser code editor)
- Custom agents and a project workspace
- Voice mode in supported regions
| Le Chat Pro (web, $15/mo) | Mistral Vibe (CLI) | |
|---|---|---|
| Cost | $14.99/mo flat | Free (self-host) or pay-per-token (Mistral API) |
| Interface | Browser | Terminal + your editor |
| Reads your repo | Upload files manually | Yes — full filesystem access |
| Runs shell commands | No (Canvas sandbox only) | Yes — real shell on your machine |
| Edits files in place | No (copy/paste out) | Yes |
| MCP / subagents / skills | No | Yes |
| Best for | Q&A, brainstorming, one-off snippets | Multi-file refactors, agentic edits, automation |
They complement each other — they don’t compete. Pay $15/mo for Le Chat if you want a polished web UI for chat and brainstorm tasks, and keep Vibe CLI for the actual coding work where you need the model to act on your repo. Many people run both.
Installation
Section titled “Installation”macOS / Linux
Section titled “macOS / Linux”curl -LsSf https://mistral.ai/vibe/install.sh | bashOr via uv (the fast Python package manager):
uv tool install mistral-vibeOr via pip:
pip install mistral-vibePrerequisite: Python 3.12+. Check with python3 --version.
After installation, verify:
vibe --versionPATH Setup
If you get command not found: vibe, add ~/.local/bin to your PATH:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrcsource ~/.zshrcConfiguration
Section titled “Configuration”Vibe’s configuration lives in TOML files:
- Global:
~/.vibe/config.toml— applies to all projects - Per-project:
.vibe/config.tomlin the project root — overrides global - API keys:
~/.vibe/.env— loaded automatically
Vibe works against any OpenAI-compatible chat-completions endpoint. The two paths most users pick:
| Option A — Official Mistral API | Option B — IrregularChat self-hosted | |
|---|---|---|
| Endpoint | https://api.mistral.ai/v1 | Community Open-WebUI instance |
| Auth | Mistral API key from console.mistral.ai | Open-WebUI API key (group membership required) |
| Cost | Pay-per-token (La Plateforme billing) | Free for IrregularChat members |
| Models | All Mistral models (medium, large, codestral, etc.) | Whatever the gateway exposes (currently Mistral Medium 3.5) |
| Network | Public internet → Mistral | Tailscale / public VPN to community GPU host |
| Best for | Anyone — works immediately with a credit card | Members who want zero marginal cost |
You can keep both configured side-by-side and switch via the active_model setting or --agent.
Option A — Official Mistral API
Section titled “Option A — Official Mistral API”For users without IrregularChat backend access, or anyone who wants pay-as-you-go directly from Mistral. Pricing lives at mistral.ai/pricing — Mistral Medium is the recommended balance of cost vs. capability for Vibe’s agentic loop.
-
Get a Mistral API key:
- Sign up at console.mistral.ai
- Workspace → API Keys → Create new key
- Key starts with no fixed prefix (opaque token)
-
Create
~/.vibe/.env:Terminal window MISTRAL_API_KEY=<your-mistral-key> -
Create
~/.vibe/config.toml:
active_model = "mistral-medium-3.5"enable_telemetry = falseauto_compact_threshold = 200000api_timeout = 720.0
[[providers]]name = "mistral"api_base = "https://api.mistral.ai/v1"api_key_env_var = "MISTRAL_API_KEY"api_style = "openai"backend = "mistral"
[[models]]name = "mistral-medium-latest"provider = "mistral"alias = "mistral-medium-3.5"temperature = 0.2auto_compact_threshold = 200000thinking = "off"# Pricing as of 2026-05 — verify at mistral.ai/pricing before relying on these for budget trackinginput_price = 0.4 # $/MTokoutput_price = 2.0 # $/MTok
[[models]]name = "codestral-latest"provider = "mistral"alias = "codestral"temperature = 0.2input_price = 0.2output_price = 0.6- Smoke test:
Terminal window vibe -p "reply with exactly: pong" --max-turns 1 --output text# → pong
Option B — IrregularChat self-hosted backend
Section titled “Option B — IrregularChat self-hosted backend”Our community runs Mistral Medium 3.5 (128B) on dedicated GPU infrastructure via Open-WebUI. (We previously ran Devstral 2 123B — the alias was switched in May 2026; see the changelog at the bottom of this section.) To connect:
-
Get an API key:
- Log into Open-WebUI (ask your admin for the URL)
- You must be added to the api access group by an admin
- Go to Settings > Account > Show API Keys > Generate new API key (not a JWT)
- Copy the key — it starts with
sk-
-
Create
~/.vibe/.env:DF_API_KEY=<your-api-key> -
Create
~/.vibe/config.toml:
active_model = "mistral"enable_telemetry = false
# Context & UIauto_compact_threshold = 200000context_warnings = trueapi_timeout = 720.0
# Project context injectioninclude_commit_signature = trueinclude_project_context = true
[project_context]default_commit_count = 3timeout_seconds = 2.0
# ── Provider ──────────────────────────────────[[providers]]name = "df"api_base = "https://your-openwebui-instance.example.com/api"api_key_env_var = "DF_API_KEY"api_style = "openai"backend = "generic"
# ── Model ─────────────────────────────────────[[models]]name = "mistral-medium" # must match a model_name in the LiteLLM gatewayprovider = "df"alias = "mistral"temperature = 0.2auto_compact_threshold = 200000thinking = "off"input_price = 0.0output_price = 0.0
# :::caution[Don't use `devstral-123b`]# The old `devstral-123b` alias was removed from the gateway in May 2026.# If your config still says `name = "devstral-123b"` you will get:# `400 Bad Request ... InternalServerError ... Connection error`# Use `name = "mistral-medium"` as shown above.# :::
# ── Tool Permissions ──────────────────────────[tools.bash]permission = "ask"default_timeout = 120max_output_bytes = 8000allowlist = [ "git", "ls", "cat", "echo", "pwd", "which", "python", "python3", "pip", "uv", "docker", "docker compose", "curl", "jq", "rg", "grep", "npm", "npx", "node", "make", "cargo", "go",]denylist = ["rm -rf /", "dd", "mkfs"]sensitive_patterns = ["sudo"]
[tools.read_file]permission = "always"
[tools.grep]permission = "always"
[tools.search_replace]permission = "ask"
[tools.write_file]permission = "ask"max_write_bytes = 64000create_parent_dirs = truesensitive_patterns = ["**/.env", "**/.env.*"]
[tools.webfetch]permission = "ask"default_timeout = 30max_content_bytes = 60000
[tools.task]permission = "always"
# ── Session Logging ───────────────────────────[session_logging]enabled = truesave_dir = "~/.vibe/logs"session_prefix = "session"Configuration Reference
Section titled “Configuration Reference”Every config key can also be set via environment variable with the VIBE_ prefix:
| Config Key | Env Variable | Default | Description |
|---|---|---|---|
active_model | VIBE_ACTIVE_MODEL | — | Model alias to use |
auto_compact_threshold | VIBE_AUTO_COMPACT_THRESHOLD | 200000 | Token count before auto-compaction |
api_timeout | VIBE_API_TIMEOUT | 720.0 | HTTP timeout in seconds |
context_warnings | VIBE_CONTEXT_WARNINGS | false | Warn when approaching context limit |
vim_keybindings | VIBE_VIM_KEYBINDINGS | false | Enable vim bindings in TUI |
autocopy_to_clipboard | — | false | Copy last response to clipboard |
enable_auto_update | — | true | Auto-update Vibe |
Tool Permission Levels
Section titled “Tool Permission Levels”| Permission | Behavior |
|---|---|
"always" | Tool runs without confirmation |
"ask" | Prompts for approval each time |
"never" | Tool is disabled |
Best practice: Set read_file, grep, and task to "always" (safe, read-only operations). Keep bash, write_file, and search_replace at "ask" to prevent unintended changes.
MCP Servers
Section titled “MCP Servers”MCP (Model Context Protocol) servers extend Vibe with additional tools. They run as local subprocesses and communicate over stdio or HTTP.
Prerequisites
Section titled “Prerequisites”# Node.js ≥18 (for filesystem MCP)node --version
# uv (for Python MCP servers)which uvx || curl -LsSf https://astral.sh/uv/install.sh | shRecommended MCP Servers
Section titled “Recommended MCP Servers”Add these to your config.toml:
# ── MCP Servers ───────────────────────────────
# Filesystem — directory tree, glob search, move, metadata[[mcp_servers]]name = "fs"transport = "stdio"command = "npx"args = ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/your/projects"]startup_timeout_sec = 15tool_timeout_sec = 60sampling_enabled = false
# Git — status, diff, log, branch, commit, checkout[[mcp_servers]]name = "git"transport = "stdio"command = "uvx"args = ["mcp-server-git"]startup_timeout_sec = 15tool_timeout_sec = 60sampling_enabled = false
# Web fetch — URL to Markdown[[mcp_servers]]name = "web"transport = "stdio"command = "uvx"args = ["mcp-server-fetch"]startup_timeout_sec = 15tool_timeout_sec = 30sampling_enabled = falseMCP Capability Matrix
Section titled “MCP Capability Matrix”| Vibe Need | Built-in Tool | MCP Server | MCP Tools Added |
|---|---|---|---|
| Read files | read_file | fs (richer) | read_text_file, read_multiple_files, read_media_file |
| Write files | write_file | fs | write_file, edit_file |
| Find files by name | — | fs | search_files, directory_tree |
| Search file content | grep (ripgrep) | — | Already covered |
| Shell commands | bash (stateful) | — | Already covered |
| Git operations | — | git | git_status, git_diff, git_log, git_commit, git_add, git_branch, etc. |
| Fetch URLs | webfetch | web | fetch (HTML → Markdown) |
sampling_enabled
Section titled “sampling_enabled”When true (the default), an MCP server can call back into the LLM during tool execution. Set to false for simple servers (filesystem, git, fetch) to save tokens and reduce latency.
Skills & Agents
Section titled “Skills & Agents”Skills (Slash Commands & Ambient Context)
Section titled “Skills (Slash Commands & Ambient Context)”Skills are Markdown files that provide instructions, rules, or workflows. They live at:
- Project-local:
.vibe/skills/<name>/SKILL.md(committed to repo) - User-global:
~/.vibe/skills/<name>/SKILL.md
Skill Types
Section titled “Skill Types”| Type | Frontmatter | Behavior |
|---|---|---|
| Slash command | user-invocable: true | Triggered by typing /skill-name |
| Ambient context | user-invocable: false | Loaded automatically as background context |
Example: Code Review Skill
Section titled “Example: Code Review Skill”Create ~/.vibe/skills/code-review/SKILL.md:
---name: code-reviewdescription: Structured code review on current diffallowed-tools: read_file bash grepuser-invocable: true---
# Code Review
Review the current git diff for:1. Security vulnerabilities (injection, secrets, auth bypass)2. Correctness (logic errors, edge cases, error handling)3. Operations (config changes shipped? rollback plan?)4. Style (conventional commits, no drive-by changes)
Process:1. Run `git diff --staged` or `git diff HEAD~1`2. Read each changed file in full context3. Report by severity: CRITICAL / WARNING / NOTEInvoke it with /code-review in a Vibe session.
Example: Ambient Safety Rules
Section titled “Example: Ambient Safety Rules”Create .vibe/skills/safety-rules/SKILL.md in your project root:
---name: safety-rulesdescription: Core safety rules for infrastructure operationsallowed-tools: read_file bash grepuser-invocable: false---
# Safety Rules- rsync --delete: ALWAYS --dry-run first- .env files: backup before replacing- Never force push to main without approval- Verify database schema before writing queries- Fail fast on missing env varsThis loads automatically whenever Vibe runs in that project directory.
Custom Agents
Section titled “Custom Agents”Agents are TOML files that override Vibe’s config for specific use cases. They live at ~/.vibe/agents/<name>.toml.
Example: Infrastructure Agent
Section titled “Example: Infrastructure Agent”display_name = "Infrastructure"description = "Docker, SSH, and server management with careful permissions"safety = "destructive"agent_type = "agent"
auto_approve = falseenabled_tools = ["read_file", "grep", "bash", "write_file", "search_replace", "task"]Example: Read-Only Agent
Section titled “Example: Read-Only Agent”display_name = "Read Only"description = "Safe exploration — read files, search, run safe commands"safety = "safe"agent_type = "agent"
auto_approve = trueenabled_tools = ["read_file", "grep", "bash"]Use with: vibe --agent infra or vibe --agent readonly.
Built-in Agents
Section titled “Built-in Agents”Vibe ships with several agents:
| Agent | Mode | Description |
|---|---|---|
default | Standard | Normal interactive mode |
plan | Plan-first | Requires plan approval before execution |
accept-edits | Edit-focused | Auto-approves file edits |
auto-approve | Autonomous | Approves all tool calls |
explore | Subagent | Read-only exploration subagent |
lean | Installable | Uses Leanstral model with thinking = "high" |
AGENTS.md — Your Project’s AI Rules File
Section titled “AGENTS.md — Your Project’s AI Rules File”AGENTS.md is Vibe’s equivalent of Claude Code’s CLAUDE.md. Place it at the root of your project and Vibe reads it automatically as context.
Template
Section titled “Template”# Project Name
description: Brief description of the project
## Safety Rules- rsync --delete: ALWAYS --dry-run first- .env files: backup before replacing/removing- Never force push to main without approval- Verify database schema before writing queries
## Stack- Runtime: [your runtime]- Database: [your database]- Frontend: [your frontend]
## Conventions- Conventional commits: type(scope): description- Timezone: America/New_York (never assume UTC)- Fail fast on missing env vars
## Common Commandsnpm run dev # Local development./deploy.sh # Production deploy (always --dry-run first)Comparison: CLAUDE.md vs AGENTS.md
Section titled “Comparison: CLAUDE.md vs AGENTS.md”| Feature | CLAUDE.md (Claude Code) | AGENTS.md (Vibe) |
|---|---|---|
| Location | ~/.claude/CLAUDE.md (global) + project root | Project root only |
| Hierarchical | Yes (parent dirs cascade) | No (root only) |
| Format | Freeform Markdown | YAML-flavored Markdown |
| Domain rules | ~/.claude/rules/*.md with glob matching | .vibe/skills/ (ambient, no globs) |
| Per-file scoping | Glob patterns in frontmatter | Not supported |
| Supplement | — | .vibe/skills/ with user-invocable: false |
Self-Hosted Backend Setup
Section titled “Self-Hosted Backend Setup”Architecture
Section titled “Architecture”A production multi-model setup uses a gateway pattern:
Vibe CLI / Open-WebUI (browser) ↓LiteLLM Gateway (routes by model name) ├── mistral-medium → vLLM (GPU 6-7, FP8, TP=2) ← IrregularChat default coding model ├── irregularchat → vLLM (GPU 0, Gemma 4 31B) └── other-model → vLLM (GPU N)Or for direct access (simpler, recommended for tool calling with Vibe CLI):
Vibe (your machine) → vLLM (direct) → GPU(s)vLLM Launch Command
Section titled “vLLM Launch Command”For Devstral models, three flags are required for tool calling to work:
vllm serve mistralai/Devstral-2-123B-Instruct-2512 \ --tool-call-parser mistral \ --enable-auto-tool-choice \ --tensor-parallel-size 2 \ --quantization fp8 \ --port 8080| Flag | Why Required |
|---|---|
--tool-call-parser mistral | Without this, vLLM rejects tool call schemas with Pydantic validation errors |
--enable-auto-tool-choice | Lets the model decide when to use tools |
--tensor-parallel-size N | Split model across N GPUs (123B needs 2+ GPUs) |
--quantization fp8 | Fits 123B on 2x GPUs with ~178GB VRAM |
Model Options
Section titled “Model Options”| Model | Size | License | Min Hardware | SWE-bench | vLLM Image |
|---|---|---|---|---|---|
| Devstral 2 | 123B | Mistral Research (revenue cap) | 2x H100/A100/B200 | 72.2% | vllm/vllm-openai:v0.19.0 |
| Devstral Small 2 | 24B | Apache 2.0 | 1x RTX 4090 (24GB) | ~55% | vllm/vllm-openai:v0.19.0 |
| Gemma 4 | 12B-31B | Apache 2.0 | 1x RTX 4090 (24GB-31B) | — | Custom image required (see below) |
Devstral 2 Licensing
Devstral 2 (123B) has a revenue restriction — commercial use by organizations with >$20M monthly revenue requires a separate license from Mistral. Devstral Small 2 (24B) is fully Apache 2.0 with no restrictions.
Docker Compose Example
Section titled “Docker Compose Example”For a production-ready self-hosted setup:
services: vllm: container_name: vllm-devstral image: vllm/vllm-openai:v0.19.0 restart: unless-stopped volumes: - /path/to/models:/root/.cache/huggingface ipc: host command: > mistralai/Devstral-2-123B-Instruct-2512 --served-model-name devstral --quantization fp8 --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --max-num-seqs 4 runtime: nvidia deploy: resources: reservations: devices: - driver: nvidia device_ids: ['0', '1'] capabilities: [gpu] ports: - "8080:8000"Then point Vibe at it:
[[providers]]name = "local"api_base = "http://localhost:8080/v1"api_key_env_var = "VLLM_API_KEY" # vLLM doesn't require a key, but Vibe needs the fieldapi_style = "openai"backend = "generic"Set a dummy key in ~/.vibe/.env:
VLLM_API_KEY=not-neededOpen-WebUI + LiteLLM Gateway
Section titled “Open-WebUI + LiteLLM Gateway”For teams that want both a browser UI and CLI access, add Open-WebUI with LiteLLM as a gateway:
services: litellm: image: ghcr.io/berriai/litellm:main-latest ports: - "4000:4000" volumes: - ./config.yaml:/app/config.yaml:ro environment: LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY} DATABASE_URL: postgresql://litellm:${POSTGRES_PASSWORD}@litellm-db:5432/litellm extra_hosts: - "host.docker.internal:host-gateway" command: ["--config", "/app/config.yaml", "--port", "4000"] restart: unless-stopped
open-webui: image: ghcr.io/open-webui/open-webui:main ports: - "3000:8080" volumes: - open-webui-data:/app/backend/data environment: OPENAI_API_BASE_URL: http://litellm:4000/v1 OPENAI_API_KEY: ${LITELLM_MASTER_KEY} ENABLE_OLLAMA_API: "false" WEBUI_AUTH: "true" WEBUI_NAME: "Your AI Instance" WEBUI_SECRET_KEY: ${WEBUI_SECRET_KEY} BYPASS_MODEL_ACCESS_CONTROL: "true" depends_on: - litellm restart: unless-stoppedLiteLLM config (config.yaml) routes model names to vLLM backends:
model_list: - model_name: devstral-123b litellm_params: model: hosted_vllm/devstral # must match --served-model-name api_base: http://host.docker.internal:8080/v1 api_key: none
litellm_settings: drop_params: true # prevents 422 from unsupported params request_timeout: 600 # 10 min for long coding tasksLiteLLM Config Gotchas
Section titled “LiteLLM Config Gotchas”- The
modelfield must use the vLLM--served-model-name, not the filesystem path.hosted_vllm/devstralworks;hosted_vllm//workspace/models/Devstral-2-123B-Instruct-2512does not. drop_params: trueis essential — it silently drops parameters the backend doesn’t support instead of returning 422 errors.host.docker.internalresolves to the Docker host — use this to reach vLLM containers from inside the LiteLLM container.
Tips & Known Issues
Section titled “Tips & Known Issues”- Temperature 0.2 is the community-recommended setting for coding tasks with Devstral
/compactmanually compresses conversation context — use it after long investigations- Subagents can run tasks in parallel via the
tasktool — think of them like Claude Code’sAgenttool - Per-project config: Drop a
.vibe/config.tomlin any repo to override global settings (different model, different tools) - System prompts: Create
~/.vibe/prompts/<name>.mdand setsystem_prompt_id = "<name>"in config - Session logs: Stored at
~/.vibe/logs/when enabled — useful for reviewing what Vibe did
Known Issues
Section titled “Known Issues”| Issue | Workaround | Status |
|---|---|---|
| Ctrl+C breaks message alternation | Use /clear instead | Open (#255) |
| Tool calls fail through Open-WebUI proxy | Point Vibe directly at vLLM or LiteLLM | By design |
| Tool calls fail on LM Studio | Use vLLM with --tool-call-parser mistral | Confirmed (#124) |
| “Generating…” hangs indefinitely | Restart Vibe session | Open (#415) |
| TUI rendering breaks in some terminals | Use Alacritty, Ghostty, Kitty, or WezTerm | By design |
| Non-admin users see no models in Open-WebUI | Set BYPASS_MODEL_ACCESS_CONTROL=true | By design (since v0.4) |
| vLLM “model type not recognized” for new models | Build custom image with pip install --upgrade transformers | Gemma 4, etc. |
| LiteLLM 404 “model does not exist” | Use --served-model-name in config, not filesystem path | Config issue |
| Sessions lost on Open-WebUI restart | Set WEBUI_SECRET_KEY in environment | Config issue |
Cost Optimization (Self-Hosted)
Section titled “Cost Optimization (Self-Hosted)”When running your own model, “cost” is GPU time rather than API tokens:
--max-num-seqs 2-4limits concurrent requests (prevents OOM on large models)--gpu-memory-utilization 0.95maximizes VRAM usage (safe when GPUs are dedicated)auto_compact_threshold = 200000prevents context from growing unboundedmax_output_bytes = 8000on bash tool prevents long command outputs from bloating context- Run a compaction model (smaller/faster) for auto-compaction if available on the same endpoint
Claude Code + Vibe Orchestration (Best of Both Worlds)
Section titled “Claude Code + Vibe Orchestration (Best of Both Worlds)”The most powerful way to use Vibe isn’t standalone — it’s as a workhorse dispatched by Claude Code. Claude Code has superior reasoning, planning, and code review but is limited by subscription tokens. Vibe on a self-hosted model has unlimited tokens but weaker orchestration. Together: Claude’s brain + Vibe’s unlimited hands.
How It Works
Section titled “How It Works”┌─────────────────────────────────────────────────────────────┐│ Claude Code (Brain) ││ Plans → Dispatches → Reviews → Synthesizes → Commits ││ ││ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││ │ Vibe -p │ │ Vibe -p │ │ Vibe -p │ (Parallel) ││ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │ ││ │ files │ │ tests │ │ docs │ ││ │ A-D │ │ E-H │ │ I-L │ ││ └──────────┘ └──────────┘ └──────────┘ ││ ↓ ↓ ↓ ││ Claude Code reviews all results, fixes integration issues │└─────────────────────────────────────────────────────────────┘Vibe’s -p flag runs it in headless/programmatic mode: auto-approves all tools, executes the prompt, outputs the result, and exits. This makes Vibe behave like a function call — prompt in, result out — perfect for dispatch from Claude Code.
The vibe CLI (Headless Mode)
Section titled “The vibe CLI (Headless Mode)”# Single-shot execution — auto-approves all tools, runs to completion, exitsvibe -p "your prompt here" --workdir /path/to/project --output text --max-turns 25
# With tool restrictions (safer for research — can't modify files)vibe -p "your prompt" --enabled-tools "read_file" --enabled-tools "grep" --output text
# JSON output for structured/parseable resultsvibe -p "your prompt" --output json --max-turns 10| Flag | Description |
|---|---|
-p "prompt" | Headless mode. Auto-approves ALL tools. Runs and exits. |
--workdir DIR | Set working directory (always set this) |
--max-turns N | Limit assistant turns. 10-25 for research, 25-50 for implementation. |
--output text|json|streaming | Output format |
--enabled-tools TOOL | Restrict available tools. Supports globs (bash*) and regex (re:.*). |
--agent NAME | Use a custom agent profile |
Task Routing: Who Does What
Section titled “Task Routing: Who Does What”| Task Type | Who | Why |
|---|---|---|
| Planning & architecture | Claude Code | Superior multi-step reasoning, weighs tradeoffs |
| Code review & validation | Claude Code | Better judgment on quality, security, patterns |
| Synthesis & final decisions | Claude Code | Integrates results from multiple agents |
| Commit messages & git ops | Claude Code | Craft precise conventional commits |
| File generation (new files) | Vibe | Unlimited tokens, follows templates well |
| Bulk edits (many files) | Vibe (parallel) | Each agent handles a subset of files |
| Research (docs, APIs, codebase) | Vibe | Reads entire docs without token pressure |
| Test writing | Vibe | Repetitive work, follows patterns |
| Documentation | Vibe | Good at following style guides with unlimited context |
| Bug investigation | Vibe (gather) → Claude (diagnose) | Vibe reads 20 files, Claude interprets |
Dispatch Patterns
Section titled “Dispatch Patterns”Parallel Dispatch (Independent Tasks)
Section titled “Parallel Dispatch (Independent Tasks)”When tasks touch different files, run multiple Vibe agents simultaneously. Claude Code spawns them as background bash commands or parallel subagents:
# Agent 1: Implement auth modulevibe -p "Create src/lib/auth.ts with login, logout, and session functions.Follow patterns in src/lib/database.ts." \ --workdir /path/to/project --max-turns 30 --output text &
# Agent 2: Write tests (different files)vibe -p "Write tests for src/lib/utils.ts at src/tests/utils.test.ts.Use vitest. Cover all exported functions." \ --workdir /path/to/project --max-turns 25 --output text &
# Agent 3: Generate docs (different files)vibe -p "Document all exported functions in src/lib/ to docs/api.md.Follow the existing style in docs/README.md." \ --workdir /path/to/project --max-turns 20 --output text &
wait # All three complete in parallelClaude Code then reviews all results, checks for integration issues, and makes targeted fixes.
Serial Dispatch (Dependent Tasks)
Section titled “Serial Dispatch (Dependent Tasks)”When step 2 needs step 1’s output, chain them:
# Step 1: Research (read-only, safe)FINDINGS=$(vibe -p "Read src/api/ and list all endpoints, their methods, and parameters." \ --workdir /path/to/project \ --enabled-tools "read_file" --enabled-tools "grep" \ --max-turns 10 --output text)
# Claude Code reads $FINDINGS, plans the implementation, then:
# Step 2: Implement (based on research)vibe -p "Based on these existing endpoints: [paste findings]Add a new POST /api/users/reset-password endpoint following the same patterns." \ --workdir /path/to/project --max-turns 35 --output textResearch-Only Dispatch (Safe Mode)
Section titled “Research-Only Dispatch (Safe Mode)”Restrict Vibe to read-only tools for pure investigation:
vibe -p "Read all files in src/components/ and src/lib/.Find everywhere that calls the 'authenticate' function.Report: which files, which line numbers, what arguments are passed." \ --workdir /path/to/project \ --enabled-tools "read_file" --enabled-tools "grep" --enabled-tools "bash" \ --max-turns 15 --output textThe --enabled-tools restriction means Vibe literally cannot modify files — defense-in-depth on top of headless mode.
Orchestration Workflow Example
Section titled “Orchestration Workflow Example”Scenario: Implement a new “Reset Password” feature across API, frontend, and tests.
- Claude Code plans — breaks the feature into 4 independent tasks
- Claude Code dispatches parallel Vibe agents:
- Agent 1: Create
src/api/reset-password.ts(backend endpoint) - Agent 2: Create
src/components/ResetPasswordForm.tsx(frontend) - Agent 3: Write
src/tests/reset-password.test.ts(tests) - Agent 4: Update
docs/api.mdwith new endpoint docs
- Agent 1: Create
- All 4 agents run simultaneously on unlimited Vibe tokens
- Claude Code reviews all generated files for:
- Do imports resolve correctly across modules?
- Does the frontend call the right API endpoint?
- Do tests cover the actual implementation (not just stubs)?
- Any security issues (input validation, auth checks)?
- Claude Code fixes integration issues (or dispatches targeted Vibe fixes)
- Claude Code commits with a conventional commit message
Cost Math
Section titled “Cost Math”| Approach | Token Cost | Time |
|---|---|---|
| Claude Code does everything | ~500K tokens ($2-10 depending on plan) | 1 session |
| Vibe does everything | Free (self-hosted) but weaker planning | May loop/fail |
| Claude orchestrates + Vibe implements | ~50K Claude tokens + unlimited Vibe | Best of both |
Claude Code’s token spend drops by ~90% because it only handles planning, review, and synthesis — the three things it’s best at. All the heavy file reading, generation, and bulk edits happen on Vibe’s unlimited self-hosted backend.
cc-vibe — Using Claude Code with the Mistral API
Section titled “cc-vibe — Using Claude Code with the Mistral API”A complementary setup to the orchestration pattern above: instead of Claude Code (Anthropic cloud) → Vibe (Mistral), run Claude Code itself on Mistral for everyday work, then escalate to cloud Claude only for hard tasks. Many community members alias this as cc-vibe.
The catch: Claude Code speaks the Anthropic Messages API, but api.mistral.ai speaks Mistral’s chat-completions format. They are not wire-compatible. You need a translator in front of Mistral. The standard choice is a local LiteLLM proxy.
┌──────────────┐ Anthropic ┌───────────────┐ Mistral chat ┌──────────────────┐│ Claude Code │ Messages API │ LiteLLM proxy │ /v1/completions │ api.mistral.ai ││ (cc-vibe) │ ─────────────► │ localhost │ ───────────────► │ (Official API) │└──────────────┘ :4000 └───────────────┘ └──────────────────┘1. Install LiteLLM:
uv tool install 'litellm[proxy]'# or: pip install --user 'litellm[proxy]'2. Configure the proxy at ~/.vibe/litellm-config.yaml:
# Mistral → Anthropic-compatible translator# Exposes Mistral models at http://localhost:4000/v1/messages (Anthropic format).
model_list: - model_name: mistral-small litellm_params: model: mistral/mistral-small-latest api_key: os.environ/MISTRAL_API_KEY - model_name: mistral-medium litellm_params: model: mistral/mistral-medium-latest api_key: os.environ/MISTRAL_API_KEY - model_name: mistral-large litellm_params: model: mistral/mistral-large-latest api_key: os.environ/MISTRAL_API_KEY - model_name: codestral litellm_params: model: mistral/codestral-latest api_key: os.environ/MISTRAL_API_KEY
litellm_settings: drop_params: true # silently drop unsupported params (Mistral rejects some) set_verbose: false
general_settings: master_key: os.environ/LITELLM_MASTER_KEY3. Add keys to ~/.vibe/.env (the vibe CLI already reads this file):
MISTRAL_API_KEY=<your-mistral-key> # used by both Vibe and the LiteLLM proxy upstreamLITELLM_MASTER_KEY=sk-vibe-<random> # used by Claude Code to authenticate TO the proxyThe distinction matters and is the #1 setup mistake: MISTRAL_API_KEY authenticates the proxy to Mistral. LITELLM_MASTER_KEY authenticates Claude Code to the proxy. They are not interchangeable — the proxy will reject MISTRAL_API_KEY, and Mistral will reject LITELLM_MASTER_KEY.
4. Start the proxy (leave it running in the background, e.g. via launchd, systemd, or a tmux pane):
source ~/.vibe/.envlitellm --config ~/.vibe/litellm-config.yaml --port 4000 &5. Point Claude Code at it. Easiest path is the community claude-switch helper, with this entry in ~/.claude-backends.env:
VIBE_API_KEY_FILE="$HOME/.vibe/.env"VIBE_BASE_URL="http://localhost:4000"VIBE_MODEL="mistral-medium"VIBE_MODEL_NAME="Mistral Medium (Official API via LiteLLM)"…and a switch_vibe() that exports ANTHROPIC_BASE_URL=$VIBE_BASE_URL and ANTHROPIC_AUTH_TOKEN=$LITELLM_MASTER_KEY. Then:
alias cc-vibe='source claude-switch vibe && claude --dangerously-skip-permissions --teammate-mode auto'cc-vibe # Claude Code now talking to Mistral Medium via the local proxy6. Smoke test the round-trip:
source claude-switch vibeclaude -p "reply with exactly: cc-vibe-ok" --output-format text# → cc-vibe-okFor the IrregularChat self-hosted backend (Open-WebUI / LiteLLM gateway): no local proxy needed — point ANTHROPIC_BASE_URL directly at the community gateway (which already exposes Anthropic-compatible endpoints) and use your Open-WebUI API key as ANTHROPIC_AUTH_TOKEN. See Claude Code with Self-Hosted Models → Setup with LiteLLM Gateway.
Vibe vs OpenCode for Dispatch
Section titled “Vibe vs OpenCode for Dispatch”Both tools work as dispatch targets, but they have different strengths:
| Capability | Vibe | OpenCode |
|---|---|---|
--workdir flag | Yes | No (must cd first) |
| LSP diagnostics | No | Yes (TS, Go, Rust, Python) |
| Session continuity | No (stateless) | Yes (--continue) |
| Cold start | 0.49s | 0.85s |
| JSONL cost/tokens | No (internal only) | Yes (per-step events) |
| File writing | Yes (writes + shows text) | Yes (writes + shows text) |
| Temp directory support | Works (--workdir) | Fails (needs project root) |
| Cost budget limit | --max-price | No |
Rule of thumb: Use Vibe for one-shot dispatch to any directory. Use OpenCode for multi-step TypeScript/Go work where LSP matters. See the OpenCode orchestration guide for the OpenCode-specific pattern.
Requirements
Section titled “Requirements”vibeCLI installed and on PATH (~/.local/bin/vibe)~/.vibe/.envwith your API key configured~/.vibe/config.tomlwith provider and model configured- Self-hosted backend must handle concurrent requests (
--max-num-seqson vLLM ≥ number of parallel agents)
Related Resources
Section titled “Related Resources”- Claude Code - Anthropic’s proprietary agentic coding CLI
- Gemini Code - Google’s Gemini CLI for multi-model orchestration
- AI Agent Pricing - Cost comparison across all CLI agents
- Project Rules & Lessons Learned - CLAUDE.md and AGENTS.md patterns
- Full-Stack Development with AI - AI-powered development workflows
- OpenHands Guide - Alternative agentic coding tool
External Links
Section titled “External Links”- Mistral Vibe Documentation
- Mistral Vibe GitHub - Source code (Apache 2.0)
- Devstral Model Card - Devstral 2 announcement
- MCP Server Registry - Official MCP servers
- Vibe Coding Repository - Community rules, skills, and lessons learned