How to Deploy Open WebUI with Ollama on Ubuntu 24.04

Intermediate
Updated Mar 24, 2026 · 16 min read · ~30 minutes total
AI
Docker
Ubuntu
Self-Hosted
Automation
Containers

Prerequisites

  • A Raff VM running Ubuntu 24.04 with at least 4 vCPU and 8 GB RAM (CPU-Optimized Tier 4 or higher)
  • SSH access configured
  • A non-root user with sudo privileges (initial server setup)
  • A registered domain name with a DNS A record pointing to your server

Don't have a server yet? Deploy a Raff VM in 60 seconds.


Introduction

Open WebUI is an open-source, self-hosted web interface that gives you a ChatGPT-style experience with any large language model. Combined with Ollama — a local LLM runtime — you can run AI models entirely on your own server with no data leaving your infrastructure, no per-token API costs, and no vendor dependency.

Open WebUI has over 90,000 GitHub stars and supports conversation history, multi-model switching, document uploads for retrieval-augmented generation (RAG), multi-user accounts with role-based access, and a clean interface that feels familiar to anyone who has used ChatGPT. Ollama handles the model management layer — downloading, running, and serving models through a local REST API.

In this tutorial, you will install Ollama on your Raff Ubuntu 24.04 VM, deploy Open WebUI using Docker Compose, configure Nginx as a reverse proxy, secure the deployment with HTTPS, and pull your first LLM to start chatting. By the end, you will have a private AI assistant running on infrastructure you control.

Step 1 — Installing Ollama

Ollama runs as a native service on your server (not in Docker) so it can directly access CPU and RAM without container overhead. This is important for LLM inference performance.

Download and install Ollama using the official installer:

curl -fsSL https://ollama.com/install.sh | sh

The installer downloads the Ollama binary, creates a dedicated ollama system user, and configures a systemd service that starts automatically on boot.

Verify Ollama is running:

sudo systemctl status ollama

You should see active (running). Test the Ollama API:

curl http://localhost:11434

Expected output: Ollama is running

By default, Ollama listens only on 127.0.0.1:11434, which means it is not accessible from outside the server. This is the correct security posture — Open WebUI will connect to it locally.
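You can confirm the loopback-only binding with ss; this is a quick sanity check, not a required step:

```shell
# List TCP listeners on Ollama's port. The local-address column should
# show 127.0.0.1:11434 (or [::1]:11434), never 0.0.0.0:11434.
ss -ltn 'sport = :11434'
```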

Step 2 — Pulling Your First Language Model

Ollama supports dozens of open-source LLMs. The model you choose depends on your server's RAM. Here is a quick guide:

| Model | Parameters | Size | RAM Required | Best For |
| --- | --- | --- | --- | --- |
| llama3.2 | 3B | 2.0 GB | 4 GB | Fast responses, simple tasks |
| llama3.1 | 8B | 4.7 GB | 8 GB | General chat, coding help |
| gemma3 | 4B | 3.3 GB | 6 GB | Compact, multilingual |
| mistral | 7B | 4.1 GB | 8 GB | Fast, strong coding ability |
| qwen3 | 8B | 4.9 GB | 8 GB | Multilingual, reasoning |
| llama3.1:70b | 70B | 40 GB | 48+ GB | Advanced reasoning, long context |

Browse the full model list at ollama.com/library.

For a Raff VM with 8 GB RAM (CPU-Optimized Tier 4), start with the 8B-parameter Llama 3.1 model:

ollama pull llama3.1

The download size is approximately 4.7 GB. Download time depends on your connection speed — on a Raff VM with unmetered bandwidth, this typically takes 2-5 minutes.
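To sanity-check that 2-5 minute figure, here is a back-of-the-envelope calculation; the 25 MB/s effective throughput is an assumption for illustration, not a measured Raff number:

```shell
# Rough download-time estimate: model size divided by throughput
size_mb=4700     # llama3.1 8B is ~4.7 GB
speed_mb_s=25    # assumed effective throughput (~200 Mbit/s)
seconds=$((size_mb / speed_mb_s))
echo "approx $((seconds / 60)) min $((seconds % 60)) s"   # approx 3 min 8 s
```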

Verify the model was downloaded:

ollama list

You should see llama3.1:latest with its size and modification date. Test the model from the command line:

ollama run llama3.1 "What is Ubuntu?"

You should see a text response generated by the model. (Running ollama run llama3.1 without a prompt starts an interactive session; type /bye or press Ctrl+D to leave it.) The model loads into RAM on first use and stays cached for subsequent requests.
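The same model is reachable over Ollama's local REST API, which is what Open WebUI will use behind the scenes. A minimal sketch (the guard means the request only fires if Ollama is actually listening):

```shell
# JSON request for Ollama's /api/generate endpoint. stream=false makes
# the API return one complete JSON object instead of a token stream.
payload='{"model": "llama3.1", "prompt": "What is Ubuntu?", "stream": false}'

if curl -sf http://localhost:11434 >/dev/null; then
  curl -s http://localhost:11434/api/generate -d "$payload"
else
  echo "Ollama is not reachable on localhost:11434"
fi
```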

Note

Larger models produce higher-quality responses but require more RAM. If your server has 16 GB RAM or more, you can run 14B or larger models for noticeably better output. Start with an 8B model and upgrade later as needed.

Step 3 — Installing Docker and Deploying Open WebUI

Open WebUI runs as a Docker container. If you have already installed Docker by following our Docker installation tutorial, skip the Docker installation commands and go directly to the Docker Compose configuration.

Install Docker Engine:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker

Create a project directory for Open WebUI:

mkdir -p ~/open-webui && cd ~/open-webui

Create the Docker Compose file:

nano ~/open-webui/compose.yaml

Paste the following configuration:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=false
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui-data:

Important configuration details:

  • 127.0.0.1:3000:8080 — Binds Open WebUI to localhost only. All external access goes through Nginx.
  • OLLAMA_BASE_URL=http://host.docker.internal:11434 — Tells Open WebUI how to reach Ollama running on the host. The extra_hosts directive maps host.docker.internal to the host's gateway IP, which is required on Linux.
  • ENABLE_SIGNUP=false — Disables public registration. Open WebUI still allows the very first account to be created, and that account becomes the admin; after that, new users can only be added from the admin panel, which prevents unauthorized access.
  • WEBUI_SECRET_KEY — Signs session tokens so logins survive container restarts. Set it via an environment variable rather than hardcoding it in the Compose file.

Generate a secret key and create a .env file:

echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" > ~/open-webui/.env
chmod 600 ~/open-webui/.env
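openssl rand -hex 32 emits 32 random bytes encoded as 64 hexadecimal characters; if you want to double-check what landed in .env, you can verify the shape of a key generated the same way:

```shell
# Generate a key exactly as the .env step does and confirm its length
key=$(openssl rand -hex 32)
echo "key length: ${#key} characters"   # prints: key length: 64 characters
```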

Start Open WebUI:

cd ~/open-webui
docker compose up -d

Docker pulls the Open WebUI image (approximately 2 GB) and starts the container. Check that it is running:

docker compose ps

You should see open-webui with status Up. Test local connectivity:

curl -s http://localhost:3000 | head -5

You should see HTML output from the Open WebUI application.
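On slower VMs the container can take a short while to answer after docker compose up -d. Instead of retrying curl by hand, you could use a small polling helper; the function name and timeout values here are our own convention, not part of Open WebUI:

```shell
# Poll a URL until it responds with success or the timeout expires.
# Returns 0 when the service answers, 1 on timeout.
wait_for_http() {
  url=$1; timeout=${2:-30}; elapsed=0
  until curl -sf --max-time 2 -o /dev/null "$url"; do
    sleep 2
    elapsed=$((elapsed + 2))
    [ "$elapsed" -ge "$timeout" ] && return 1
  done
}

# Wait up to 6 s here; use something like 60 on a cold first boot
wait_for_http http://localhost:3000 6 && echo "Open WebUI is up" || echo "timed out waiting"
```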

Step 4 — Configuring Nginx as a Reverse Proxy

Nginx serves as the public-facing entry point, handling TLS termination and proxying requests to Open WebUI.

Install Nginx if it is not already installed:

sudo apt install -y nginx

Create an Nginx server block:

sudo nano /etc/nginx/sites-available/open-webui

Paste the following configuration, replacing ai.example.com with your actual domain:

server {
    listen 80;
    listen [::]:80;
    server_name ai.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 600s;
        proxy_send_timeout 600s;
        client_max_body_size 100M;
    }
}

Critical settings for Open WebUI:

  • WebSocket headers — Open WebUI uses WebSockets for streaming model responses in real time. Without the Upgrade and Connection headers, you will see responses appear all at once instead of streaming token by token.
  • proxy_read_timeout 600s — LLM inference can take 30-120 seconds for longer prompts. The default 60-second timeout would cut off responses mid-generation.
  • client_max_body_size 100M — Allows uploading documents for RAG. The default 1 MB limit blocks most file uploads.

Enable the site and test:

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

If you have UFW configured, allow web traffic:

sudo ufw allow 'Nginx Full'

Step 5 — Securing with HTTPS

LLM conversations often contain sensitive information — code, business data, personal questions. Serving Open WebUI over plain HTTP means this data travels unencrypted. Use Certbot to add HTTPS.

Install Certbot:

sudo apt install -y certbot python3-certbot-nginx

Obtain and install the certificate:

sudo certbot --nginx -d ai.example.com

Certbot will verify your domain ownership, obtain a certificate from Let's Encrypt, and update your Nginx configuration to redirect HTTP to HTTPS.

Verify HTTPS is working:

curl -I https://ai.example.com

You should see an HTTP/2 200 response. Certbot sets up automatic renewal, so your certificate renews before it expires without manual intervention.

Tip

Verify the renewal timer is active with sudo systemctl status certbot.timer. You can also test renewal with sudo certbot renew --dry-run.

Step 6 — Creating Your Admin Account and First Chat

Open https://ai.example.com in your browser. You should see the Open WebUI login page with a padlock icon confirming HTTPS.

Create the first account using the sign-up form shown on first launch. This account automatically becomes the administrator with full control over the instance — model management, user creation, and system settings.

After logging in, you land on the chat interface. The model selector dropdown at the top should show llama3.1:latest (or whichever model you pulled in Step 2).

Type a message to test:

Explain what a reverse proxy is in simple terms.

You should see the response streamed token by token, similar to the ChatGPT experience. If the response appears all at once instead of streaming, the WebSocket configuration in your Nginx proxy is not working — revisit Step 4.

Explore the admin settings at Settings > Admin Panel:

  • Models — Pull additional models directly from the UI
  • Users — Create accounts for team members
  • Documents — Upload files for RAG (retrieval-augmented generation)

Step 7 — Managing Models and Updating

You can manage models both from the command line and from the Open WebUI interface.

Pull additional models from the command line:

ollama pull mistral:7b
ollama pull gemma3:4b

List all downloaded models:

ollama list

Remove a model you no longer need:

ollama rm gemma3:4b

To update Open WebUI to the latest version:

cd ~/open-webui
docker compose pull
docker compose up -d

To update Ollama:

curl -fsSL https://ollama.com/install.sh | sh

The installer detects the existing installation and upgrades in place. Your downloaded models are preserved.

Monitor resource usage to ensure your server is not running out of RAM:

free -h
htop

When Ollama loads a model, it occupies RAM proportional to the model size. An 8B model uses approximately 5-6 GB of RAM. If you see swap usage increasing, consider upgrading to a larger VM or using a smaller model.
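Alongside free -h, ollama ps shows exactly which models are loaded and how much memory each one holds. A quick combined check (guarded in case Ollama is not installed on the machine you run it from):

```shell
# System memory overview, then per-model memory held by Ollama
free -h
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama not installed on this machine"
fi
```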

Tip

Ollama automatically unloads models from memory after 5 minutes of inactivity. You can change this with the OLLAMA_KEEP_ALIVE environment variable. Set it to -1 to keep models loaded indefinitely, or to 30m for 30 minutes.
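Because the installer runs Ollama under systemd, one way to set this variable persistently is a drop-in override; the file name keep-alive.conf is arbitrary, and you could use -1 instead of 30m to pin models in memory:

```shell
# Create a systemd drop-in that keeps models loaded for 30 minutes
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/keep-alive.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_KEEP_ALIVE=30m"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```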

Conclusion

You have deployed a private AI assistant on your Raff Ubuntu 24.04 VM with Ollama serving open-source language models locally, Open WebUI providing a ChatGPT-style browser interface, Nginx proxying requests with WebSocket support for real-time streaming, and HTTPS encryption via Let's Encrypt for secure access.

All conversations and data stay on your server — nothing is sent to external APIs. You have full control over which models to run, who can access the interface, and how the system is configured.

From here, you can:

  • Add more models for different tasks — coding assistants, creative writing, multilingual support
  • Upload documents and use RAG to chat with your own data
  • Create accounts for team members who need private AI access
  • Set up automated backups to protect your conversation history and configurations
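For the backup idea, a minimal sketch: Docker Compose prefixes volume names with the project directory, so with the layout above the data volume should be named open-webui_open-webui-data (confirm with docker volume ls before relying on it). The backup directory path is our own choice:

```shell
# Archive the Open WebUI data volume to a dated tarball
backup_dir="$HOME/backups"
mkdir -p "$backup_dir"

if command -v docker >/dev/null 2>&1; then
  docker run --rm \
    -v open-webui_open-webui-data:/data:ro \
    -v "$backup_dir":/backup \
    alpine tar czf "/backup/open-webui-$(date +%F).tar.gz" -C /data .
else
  echo "docker not available; nothing archived"
fi
```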

For LLM workloads, Raff's CPU-Optimized VMs with dedicated AMD EPYC processors provide consistent inference performance without noisy-neighbor interference. The NVMe SSD storage handles model loading quickly, and unmetered bandwidth means model downloads and API traffic never incur overage charges.

