Ollama: Local LLMs Made Easy
Tools · 8 min
Ollama makes local Large Language Models accessible. No cloud, no API costs, no data leaving your machine. In five minutes you can have your own AI chat running on your own hardware.
What is Ollama?
Ollama is a CLI tool for running LLMs locally. It supports 132+ models (Llama, Mistral, CodeLlama, and more) and runs on macOS (with Metal GPU acceleration), Linux, and Windows (via WSL2).
Supported Models (Selection)
| Model | Sizes | Notes |
|---|---|---|
| Llama 3.2 | 1B, 3B, 8B, 70B | Text and vision |
| Mistral | 7B | Fast, efficient |
| Phi | 3.5B | Small but smart |
| CodeLlama | 7B, 13B, 34B | Specialized for coding |
| Qwen | 0.5B to 72B | Multilingual |
| Gemma | 2B, 7B | From Google |
Installation
macOS
```shell
brew install ollama
```

Linux/WSL2

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Docker (our recommendation)

```shell
docker run -d \
  --name ollama \
  -v ollama_data:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest
```

Download Models
The first model is downloaded automatically on first start. You can also preload models explicitly:
```shell
# Download a model
ollama pull llama3.2

# List locally installed models
ollama list

# Show model info
ollama show llama3.2
```

Recommended Starter Models
| Model | Size | VRAM | Use Case |
|---|---|---|---|
| phi3:3.8b | 2.3 GB | ~4 GB | Fast, beginner |
| llama3.2:1b | 1.3 GB | ~2 GB | Lightweight, fast |
| llama3.2:3b | 3.8 GB | ~6 GB | Balance |
| llama3.1:8b | 4.7 GB | ~8 GB | Advanced |
| mistral:7b | 4.1 GB | ~8 GB | Coding, reasoning |
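To cross-check this table against what is actually on disk, the local model list can also be queried over Ollama's REST API via the `/api/tags` endpoint. A minimal Python sketch; the helper names are ours, while the response shape (a `models` array with `name` and `size` in bytes) follows the API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def format_models(payload: dict) -> list[str]:
    """Render each entry of an /api/tags response as 'name: size in GB'."""
    lines = []
    for model in payload.get("models", []):
        size_gb = model["size"] / 1e9  # the API reports sizes in bytes
        lines.append(f"{model['name']}: {size_gb:.1f} GB")
    return lines

def installed_models() -> list[str]:
    """Fetch the locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return format_models(json.load(resp))

# Example (needs a running server): print("\n".join(installed_models()))
```

This is the programmatic equivalent of `ollama list` and is handy for dashboards or pre-flight checks in scripts.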
Using Ollama
Interactive Chat
```shell
ollama run llama3.2
```

REST API
Ollama provides a REST API on port 11434:
```shell
# Chat
curl -X POST http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

# Generate (single response)
curl -X POST http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.2",
    "prompt": "What is Docker?"
  }'
```
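The same endpoints are easy to call from any language. A minimal, non-streaming Python sketch (`"stream": false` makes the API return a single JSON object instead of a token stream; the function names are ours):

```python
import json
import urllib.request

def build_chat_body(model: str, prompt: str) -> bytes:
    """JSON body for /api/chat; stream=False requests one complete reply."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()

def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one user message to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=build_chat_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (needs a running server): print(chat("llama3.2", "Hello!"))
```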
GPU Configuration

Ollama automatically uses available GPUs. In Docker, the GPU has to be passed through to the container:
```shell
# NVIDIA GPU
docker run -d --gpus all \
  --name ollama \
  -v ollama_data:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest
```

Or with docker-compose.yml:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
```

Our Docker Swarm Setup
In our 3-node Swarm, Ollama runs on the GPU node (docker-swarm3):
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
      placement:
        constraints:
          - node.hostname == docker-swarm3
    networks:
      - ai-network
```

Web Interface: Open WebUI
For a ChatGPT-like interface, we use Open WebUI:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui_data:/app/backend/data
    depends_on:
      - ollama
    networks:
      - ai-network
```

Next Steps
- Set up RAG: RAG Complete Guide →
- Compare models: test multiple models in parallel
- Monitoring: enable Prometheus metrics
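The "compare models" item above can be sketched with a plain thread pool that fans the same prompt out to several models via `/api/generate`; the helper names and model tags below are illustrative:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def ask_ollama(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """One-shot completion via /api/generate (stream=False -> single JSON reply)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def compare(models: list[str], prompt: str, ask=ask_ollama) -> dict[str, str]:
    """Send the same prompt to several models concurrently and collect the replies.

    `ask(model, prompt)` is injected so the fan-out logic stays
    independent of the HTTP layer.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(ask, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}

# Example (needs a running server with both models pulled):
# compare(["llama3.2:3b", "mistral:7b"], "What is Docker?")
```

Note that running several models at once also keeps them loaded in VRAM at the same time, so budget memory accordingly.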
Next step: ship workflows that stay operable
Use proven n8n patterns, templates and integrations for workflows that stay local, documented, and auditable.
- Local and self-hosted by default
- Documented and auditable
- Built from our own runtime
- Made in Austria