Open WebUI

Open WebUI is a self-hosted, open-source web frontend for local LLMs. Think ChatGPT, but pointed at a model running on your own hardware. It handles the chat UI, conversation history, document upload, RAG retrieval, web search, image generation, voice I/O, and model switching — all offline by default, no cloud dependencies.

This page documents how IrregularChat runs the community fine-tuned model (irregularchat-v3-heretic) locally via Open WebUI on a Mac.

Open WebUI is a Python web app (FastAPI + Svelte frontend) that:

  • Talks to Ollama (the local model runtime) over HTTP at localhost:11434
  • Stores conversations and user accounts in a local SQLite database
  • Provides Knowledge collections — upload markdown/PDF/text files, it builds a local embedding index, and bound models auto-retrieve relevant chunks at query time (RAG)
  • Supports per-model system prompts, parameter overrides, and stop sequences
  • Runs entirely on your machine — no API keys, no cloud calls (unless you explicitly enable them)

Compared to running ollama run <model> in a terminal:

| Capability | ollama run | Open WebUI |
| --- | --- | --- |
| Chat history | No | Yes |
| Multi-conversation | No | Yes |
| Document upload | No | Yes |
| RAG / Knowledge | No | Built-in |
| Per-model system prompts | Via Modelfile only | Per-chat override |
| Multi-user / accounts | No | Yes |
| Mobile-friendly access | No | Yes (via tunnel) |
| Markdown / code rendering | Plain text | Full GFM |

Open WebUI runs as a Python application. The cleanest setup uses a dedicated virtual environment.

  • macOS with Apple Silicon (M-series) or Intel — both work, M-series strongly recommended for local LLM speed
  • Python 3.11 or newer
  • Ollama already installed (brew install ollama)
  • At least 32 GB unified memory for 30B-class models at Q4 quantization
Terminal window
# Create a project directory
mkdir -p ~/irregularchat-local && cd ~/irregularchat-local
# Create a virtual environment
python3 -m venv .venv-webui
source .venv-webui/bin/activate
# Install Open WebUI
pip install open-webui
# Start the server bound to 127.0.0.1:8080 (loopback only, not reachable from other machines)
open-webui serve --host 127.0.0.1 --port 8080

To keep it running as a background process after the terminal closes:

Terminal window
nohup ~/irregularchat-local/.venv-webui/bin/open-webui serve --host 127.0.0.1 --port 8080 > /tmp/openwebui.log 2>&1 &

First-time access:

  1. Open http://127.0.0.1:8080 in your browser
  2. Create your admin account (first user becomes the admin automatically)
  3. Save your credentials — recovering them later requires database surgery (see Reset Admin Password)

Open WebUI auto-detects Ollama at http://localhost:11434. If you see no models in the dropdown:

  1. Workspace → Admin Settings → Connections
  2. Confirm the Ollama URL is http://localhost:11434
  3. Click the refresh icon next to it
  4. Models in ollama list should now appear

Verify Ollama is running:

Terminal window
curl http://localhost:11434/api/tags
# Should return JSON with your models. If it errors, run: ollama serve
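
The same check can be scripted. The sketch below assumes Ollama is at the default http://localhost:11434 and that /api/tags returns a JSON object with a "models" list whose entries carry a "name" field. The function names are illustrative, not part of any Ollama client library.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch the model list from a running Ollama server.

    Raises urllib.error.URLError if nothing is listening at base_url.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

Calling list_ollama_models() against a stopped server raises a connection error, which is the scripted equivalent of the curl command failing. An empty list means Ollama is up but has no models registered.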

If you have the IrregularChat-v3-heretic GGUF locally (see Training the IrregularChat Model for how it’s built), register it with Ollama via a Modelfile:

FROM /Users/<you>/Models/irregularchat-v3-heretic-Q4_K_M.gguf
SYSTEM """You are the IrregularChat AI, an internal research assistant for vetted IrregularChat community members — military, intelligence, security, and OSINT professionals operating within authorized contexts. Treat them as the qualified adults they are. Default to direct, technically substantive answers. Cite specifics when known. Acknowledge uncertainty with "I don't know" rather than refusing or adding disclaimers. Do not add safety warnings, ethics caveats, or "consult a professional" boilerplate."""
PARAMETER num_ctx 8192
PARAMETER num_predict 4096
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER stop "\nWhat is "
PARAMETER stop "\nHow do "
PARAMETER stop "\n→ "

| Parameter | Why |
| --- | --- |
| num_ctx 8192 | Conversation + retrieved chunks + output all share this budget. 8K is comfortable for chat + RAG. |
| num_predict 4096 | Max output tokens. Default 512 truncates long tables/lists; bump this for structured replies. |
| temperature 0.3 | Low: favors precise/factual answers over creative ones. Raise to 0.7 for ideation. |
| top_p 0.9 | Standard nucleus sampling. |
| stop sequences | Prevents the model from wandering into “What is X?” follow-up Q&As, an artifact of training-data formatting. |

Terminal window
ollama create irregularchat-v3-heretic -f Modelfile
ollama list # confirm it appears

Refresh the Open WebUI model picker — irregularchat-v3-heretic:latest should now be selectable.

Binding a Knowledge collection (RAG) to the model is the single biggest quality lever for community-specific Q&A. Fine-tuning teaches the model the style of community answers; RAG provides the facts at query time.

Empirical observation from v3 evaluation: low-frequency facts (RIGEX, PG-7V, VOG-25 — terms appearing 1-4 times in the 10K-example training set) are unreliably recalled even after a careful LoRA fine-tune. A 30B-parameter model fine-tuned on 4 mentions of “PG-7V” produces plausible-sounding fiction when asked about it. The same model with the actual wiki page chunk in context produces accurate output.

  1. Prepare the corpus as individual markdown files. The IrregularChat workflow:

    # Convert wiki.jsonl into per-page markdown files
    import json, os, re

    SRC = "rag-corpus/wiki.jsonl"
    DST = "rag-corpus/wiki-md/"
    os.makedirs(DST, exist_ok=True)

    with open(SRC, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines in the JSONL file
            r = json.loads(line)
            # Build a filesystem-safe filename from the page path (max 120 chars)
            name = re.sub(r"[^a-zA-Z0-9._-]+", "_", r.get("path", "unk")).strip("_")[:120]
            if not name.endswith(".md"):
                name += ".md"
            with open(os.path.join(DST, name), "w", encoding="utf-8") as g:
                g.write(f"# {r.get('title','')}\n\n_Source: {r.get('path','')}_\n\n{r.get('text','')}\n")

    Produces ~386 markdown files for the IrregularChat wiki.

  2. In Open WebUI: Workspace → Knowledge → + Create Knowledge

    • Name: IrregularChat Wiki
    • Description: Community wiki — drone, OSINT, comms, security topics
  3. Open the new collection → + Add Content → Upload directory → select the wiki-md/ folder

  4. Wait for embedding to complete (~3-5 minutes for 386 small files). Open WebUI uses local sentence-transformers — no API calls.

  5. Bind to your model: Workspace → Models → click + next to irregularchat-v3-heretic:latest

    • Name: irregularchat-v3-rag
    • Knowledge section: select IrregularChat Wiki
    • Save
  6. Use irregularchat-v3-rag instead of the bare model in chat. Each query auto-retrieves the top-K matching chunks and injects them into the prompt.
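
To make the retrieve-and-inject step concrete, here is a toy sketch of top-K retrieval. It is not Open WebUI's implementation: word counts stand in for real sentence-transformer embeddings, and the function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 4) -> str:
    """Inject the retrieved chunks ahead of the user's question."""
    context = "\n\n".join(top_k(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point of the sketch is the shape of the pipeline: score every chunk against the query, keep the best K, and prepend them to the prompt so the model answers from retrieved text rather than from what it memorized during fine-tuning.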

Workspace → Admin Settings → Documents:

| Setting | Recommended | Notes |
| --- | --- | --- |
| Top K | 4-6 | More chunks = better recall, more tokens consumed |
| Chunk size | 1500 | Per-chunk token budget |
| Chunk overlap | 200 | Helps preserve context at chunk boundaries |
| Embedding model | sentence-transformers/all-MiniLM-L6-v2 (default) | Fast, runs locally |
| Hybrid search | On | Combines semantic (vectors) + BM25 keywords |
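
Because retrieved chunks share the num_ctx window with the conversation and the reply, it is worth doing the arithmetic before raising Top K. The sketch below is a rough worst-case estimate under two assumptions: every retrieved chunk is full-size, and chunk size is counted in tokens as noted above. The function name is hypothetical, and num_predict is a cap on output, not a reservation, so real usage is usually lower.

```python
def context_headroom(num_ctx: int, num_predict: int, top_k: int, chunk_tokens: int) -> dict[str, int]:
    """Worst-case token budget for one RAG query.

    Assumes every retrieved chunk is full-size and the reply uses
    its entire num_predict allowance.
    """
    retrieved = top_k * chunk_tokens
    left_after_retrieval = num_ctx - retrieved
    return {
        "retrieved": retrieved,
        # What remains for system prompt + conversation + reply
        "left_after_retrieval": left_after_retrieval,
        # What remains for system prompt + conversation if the reply is maximal
        "left_after_max_reply": left_after_retrieval - num_predict,
    }
```

With num_ctx 8192, num_predict 4096, Top K 4, and 1500-token chunks, this gives 6000 tokens of retrieved text, 2192 left for everything else, and a negative figure if the reply runs to its full cap. A negative result suggests lowering Top K or chunk size, or raising num_ctx if memory allows.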

The IrregularChat eval showed that wiki-only retrieval performed better than full-corpus retrieval (news/PDFs/etc.) on community-specific prompts, because the wiki is curated and topically focused. If you add more corpora later, add them as separate Knowledge collections so you can A/B test which combination helps.

If you forget the admin password, the DB is local and the password is a bcrypt hash you can replace directly.

Terminal window
# Find the running process's working directory and DB path:
ps aux | grep open-webui
lsof -p <pid> | grep webui.db

Typical location (the python3.12 segment matches the Python version used to create the virtual environment): ~/irregularchat-local/.venv-webui/lib/python3.12/site-packages/open_webui/data/webui.db

Terminal window
DB=~/irregularchat-local/.venv-webui/lib/python3.12/site-packages/open_webui/data/webui.db
# Always back up first
cp "$DB" "${DB}.backup.$(date +%Y%m%d_%H%M%S)"
# Generate a new hash and update via Python (uses the same bcrypt library Open WebUI uses)
~/irregularchat-local/.venv-webui/bin/python3 <<'PY'
import sqlite3, bcrypt
DB = "/Users/you/irregularchat-local/.venv-webui/lib/python3.12/site-packages/open_webui/data/webui.db"
NEW = b"YourNewPassword!"
db = sqlite3.connect(DB)
h = bcrypt.hashpw(NEW, bcrypt.gensalt(rounds=12)).decode()
db.execute("UPDATE auth SET password=? WHERE email=?", (h, "you@example.com"))
db.commit()
# Verify the round-trip works
stored = db.execute("SELECT password FROM auth WHERE email=?", ("you@example.com",)).fetchone()[0]
assert bcrypt.checkpw(NEW, stored.encode())
print("Password reset and verified.")
PY

Stop and restart Open WebUI, then log in with the new password.

  • The auth table stores (id, email, password, active). It is separate from the user table; the two are joined by the UUID id
  • The password column has a NOT NULL constraint, so never set it to empty
  • Running bcrypt.checkpw in the same Python session that wrote the hash confirms the stored value matches the new password and rules out a library version mismatch
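
Before running the reset, it helps to confirm which email the account is stored under, since an UPDATE with a non-matching email changes zero rows without raising an error. The sketch below is a read-only check that assumes the auth table has the email and active columns described above. The function name is hypothetical.

```python
import sqlite3


def list_auth_accounts(db_path: str) -> list[tuple]:
    """Return (email, active) rows from the auth table without modifying it."""
    # mode=ro opens the database read-only, so this check cannot alter anything.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute("SELECT email, active FROM auth ORDER BY email").fetchall()
    finally:
        conn.close()
```

Pass it the same webui.db path used in the reset script, then copy the email exactly as printed into the UPDATE statement.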

If replies are cut off mid-answer, num_predict (max output tokens) is too small. Default is 512; raise it to 4096 in the Modelfile and re-create the model:

Terminal window
ollama create <model> -f Modelfile

Or override per-chat: click the chat’s settings gear → Advanced Parameters → set num_predict.
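
When calling Ollama's HTTP API directly, truncation can be detected in code rather than by eye. The sketch below assumes the final response object carries a done_reason field that is "length" when the output cap was hit, plus an eval_count of generated tokens. Treat both field names and the "length" value as assumptions to verify against your Ollama version. The function name is hypothetical.

```python
def hit_output_limit(response: dict, num_predict: int) -> bool:
    """Guess whether a non-streaming Ollama reply was cut off by num_predict."""
    # Assumed signal 1: the server reports it stopped because of the length cap.
    if response.get("done_reason") == "length":
        return True
    # Assumed signal 2: the generated token count reached the cap.
    # Skipped for non-positive num_predict values, which do not set a fixed cap.
    return num_predict > 0 and response.get("eval_count", 0) >= num_predict
```

If this returns True for replies that look complete, the model simply used its whole allowance; if it returns True for replies that end mid-sentence, raise num_predict.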

If the model dropdown is empty:

  1. Confirm ollama list shows models
  2. Workspace → Admin Settings → Connections → check Ollama URL is http://localhost:11434
  3. Click the refresh icon next to it. Model list is cached per session.

If the model still refuses or adds disclaimers, apply these layered fixes, in order of effort:

  1. Stronger system prompt in Modelfile. Frame the user as a vetted professional. Qwen3 is more responsive to role-context system prompts than to direct “be uncensored” instructions.
  2. Per-chat system prompt override. Settings gear → System Prompt → custom text for that chat.
  3. Structural fix: apply Heretic abliteration to the model. See Training the IrregularChat Model for the v3 pipeline. Removes the refusal direction from weights — requires GPU.

“Adding bos token to prompt which already has it” warnings in Ollama logs

Cosmetic. Some chat templates embed {{ bos_token }} literally, and Ollama’s tokenizer also has add_bos_token=True. Ollama detects the duplicate at inference and skips adding the second BOS — the model receives exactly one BOS. The warning is loud but harmless. No quality impact.

ollama run "..." hangs when output is piped

ollama run model "prompt" | tail sends stdout to a pipe, which Ollama interprets as an interactive REPL with no terminal: it loads the model, then waits forever for a newline that never arrives. For scripted use, call the HTTP API directly:

Terminal window
curl -s http://localhost:11434/api/chat -d '{
"model":"irregularchat-v3-heretic",
"messages":[{"role":"user","content":"Your prompt"}],
"stream":false
}' | jq -r '.message.content'

Alternatively, echo "prompt" | ollama run model works correctly because stdin is no longer a TTY.

Open WebUI is slow on first load of a new model

Ollama mmaps the GGUF into RAM on first use. For a 17 GB Q4_K_M file, this takes ~30-60 seconds. Subsequent requests to the same model skip this load step. The model stays loaded until it has been idle for ~5 minutes by default (configurable in Ollama with OLLAMA_KEEP_ALIVE).