Install Ollama in 5 Minutes — Step by Step (Windows, Mac, Linux)

Step	Action	Time
1	Install Ollama (winget / brew / curl)	1 min
2	Ollama starts as background service	automatic
3	`ollama run qwen3.5:4b` — download model	2-3 min
4	Interactive chat session in terminal	ready
5	API available at `localhost:11434`	ready
6	Verify with `curl localhost:11434/api/tags`	10 sec

5 minutes. Then qwen3.5:4b runs on your own hardware.

No cloud account, no API costs, no data being uploaded anywhere. Tested on Windows 11, macOS Sonoma, and Ubuntu 24.04.

What Is Ollama?

Ollama is a local LLM runner — a program that executes language models on your own hardware and exposes an OpenAI-compatible API. That means: tools built for ChatGPT work with Ollama without modification. We run 3 Ollama instances on different hardware in 24/7 operation. This is the setup that works.

Installation

Windows 11

Option 1: winget (recommended)

winget install Ollama.Ollama

Option 2: Direct download Download the installer from ollama.com/download and run it. Ollama then runs as a Windows service in the background.

GPU support (NVIDIA) is detected automatically if CUDA drivers are installed — no additional configuration needed. AMD GPUs are supported via ROCm — details in the Ollama GitHub.

macOS (Sonoma, Ventura, Monterey)

brew install ollama

Without Homebrew: direct download at ollama.com/download/mac. Apple Silicon (M1/M2/M3/M4) is fully supported — the integrated GPU is used automatically.

Linux (Ubuntu 24.04, Debian, Fedora)

curl -fsSL https://ollama.com/install.sh | sh

The installer sets up Ollama as a systemd service. After installation, Ollama starts automatically at boot. GPU support for NVIDIA and AMD is detected if drivers are present.

First Test: Load and Run a Model

ollama run qwen3.5:4b

On the first run, the model downloads (~2.5 GB). Then an interactive chat session starts directly in the terminal:

>>> Explain Docker in one sentence.
Docker is a platform that packages applications in isolated containers so they
run the same everywhere — regardless of the host system.

>>> /bye

/bye ends the session. The model stays stored locally and is immediately available again.

The API runs in parallel at http://localhost:11434. Test it:

curl http://localhost:11434/api/tags

This returns all locally available models as JSON.

Which Model for Which VRAM?

VRAM is the limiting factor — not RAM. If your GPU doesn't have enough VRAM, the model continues running on the CPU (noticeably slower, but functional).

VRAM	Recommended Model	Download Size	Context
4 GB	`qwen3.5:4b`	~2.5 GB	256K tokens
8 GB	`qwen3.5:8b`	~5 GB	256K tokens
16 GB	`qwen3.5:14b`	~9 GB	256K tokens
24 GB	`qwen3.5:27b`	~17 GB	256K tokens

We use qwen3.5:27b on an RTX 3090 (24 GB) as our primary model — Ollama Model Library lists all available models with size information.

No dedicated GPU VRAM? No problem. qwen3.5:4b runs on CPU too — slower, but perfectly fine for first tests. On a modern laptop processor that's roughly 3-8 tokens per second.

Model Management

# Show all locally available models
ollama list

# Download a model without immediately starting it
ollama pull llama3.2:3b

# Remove a model
ollama rm llama3.2:3b

Models are stored at ~/.ollama/models (Linux/macOS) or C:\Users\<name>\.ollama\models (Windows). On an SSD with at least 20 GB free space we recommend qwen3.5:4b plus a second model for comparison.

What's Next?

Ollama is running. The API responds. That's the foundation. What's still missing is a browser interface so you can chat without a terminal — and a clean configuration so Ollama reliably starts after a reboot.

Continue to Step 4: Set Up a Browser Interface with Open WebUI →

Or go straight to the complete setup — the Local AI Playbook P1 (EUR 49) includes pre-configured Docker Compose files for Ollama + Open WebUI + monitoring, detailed instructions for all operating systems, and the complete stack we run in production ourselves.

Sources: ollama.com — official documentation. github.com/ollama/ollama — source code and GPU support details. ollama.com/library — complete model library with size information and benchmarks.

Install Ollama in 5 Minutes — Step by Step (Windows, Mac, Linux)

Install Ollama in 5 Minutes — Step by Step (Windows, Mac, Linux)

What Is Ollama?

Installation

Windows 11

macOS (Sonoma, Ventura, Monterey)

Linux (Ubuntu 24.04, Debian, Fedora)

First Test: Load and Run a Model

Which Model for Which VRAM?

Model Management

What's Next?

Verwandte Artikel

Ollama installieren in 5 Minuten — Schritt für Schritt (Windows, Mac, Linux)

Your First Local AI Chatbot: Set Up Open WebUI in 10 Minutes

What Is a Large Language Model? Explained Without Buzzwords

Nächster Schritt: vom Wissen in die Umsetzung