# AI Backend (Ollama) Nomad Job
This Nomad job defines the deployment for an Ollama server, which provides a local large language model (LLM) serving environment. It is configured to run on a specific host with GPU acceleration using Vulkan.
## What is this file?
The [`ai-backend.nomad`](stacks/ai/ai-backend.nomad) file is a HashiCorp Nomad job specification written in HCL (HashiCorp Configuration Language). It describes how to deploy and manage the Ollama service.
Key configurations (a condensed sketch of how these pieces fit together follows this list):
- **`job "ai-backend"`**: The main job definition.
- **`datacenters = ["Homelab-PTECH-DC"]`**: Specifies the datacenter where this job should run.
- **`group "ollama-group"`**: Defines a group of tasks.
- **`constraint { attribute = "${meta.device}"; value = "p52-laptop" }`**: Pins the job to the node whose client metadata sets `device = "p52-laptop"`.
- **`network { port "api" { static = 11434 } }`**: Reserves static host port 11434 for the Ollama API.
- **`task "ollama"`**: The actual task running the Ollama container.
- **`driver = "podman"`**: Uses Podman to run the container.
- **`env`**: Environment variables for the Ollama container:
  - `OLLAMA_HOST = "0.0.0.0:11434"`: Binds Ollama to all network interfaces on port 11434.
  - `OLLAMA_ORIGINS = "*"`: Allows requests from any origin (CORS).
  - `OLLAMA_VULKAN = "1"`: Enables Vulkan for GPU acceleration.
  - `HSA_OVERRIDE_GFX_VERSION = "10.3.0"`: Fallback for ROCm, though Vulkan takes priority.
- **`config`**: Podman-specific configuration:
  - `image = "docker.io/ollama/ollama:latest"`: Uses the latest Ollama Docker image.
  - `privileged = true`: Grants extended privileges to the container, which are needed for direct access to the GPU devices.
  - `volumes`: Mounts for persistent data and GPU devices:
    - `"/mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama:/root/.ollama"`: Persistent storage for Ollama models and data.
    - `"/dev/kfd:/dev/kfd"` and `"/dev/dri:/dev/dri"`: Direct access to the AMD GPU kernel driver and DRM (Direct Rendering Manager) devices for Vulkan.
- **`service "ollama"`**: Registers the Ollama service in Consul so Traefik can discover it.
  - `tags = ["traefik.enable=true"]`: Enables Traefik ingress for this service.
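For reference, here is a condensed sketch of how these settings fit together in [`ai-backend.nomad`](stacks/ai/ai-backend.nomad). It is assembled from the attributes listed above rather than copied from the file, so block ordering, the `ports`/`port` wiring of the service, and any attributes not mentioned in this README may differ from the actual job specification.
```hcl
# Sketch only: values are taken from the descriptions above; block layout and
# the ports/port wiring are assumptions, so defer to the real ai-backend.nomad.
job "ai-backend" {
  datacenters = ["Homelab-PTECH-DC"]

  group "ollama-group" {
    constraint {
      attribute = "${meta.device}"
      value     = "p52-laptop"
    }

    network {
      port "api" {
        static = 11434
      }
    }

    task "ollama" {
      driver = "podman"

      env {
        OLLAMA_HOST              = "0.0.0.0:11434"
        OLLAMA_ORIGINS           = "*"
        OLLAMA_VULKAN            = "1"
        HSA_OVERRIDE_GFX_VERSION = "10.3.0"
      }

      config {
        image      = "docker.io/ollama/ollama:latest"
        privileged = true
        ports      = ["api"] # assumed: maps the static "api" port into the container
        volumes = [
          "/mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama:/root/.ollama",
          "/dev/kfd:/dev/kfd",
          "/dev/dri:/dev/dri",
        ]
      }

      service {
        name = "ollama"
        port = "api"
        tags = ["traefik.enable=true"]
      }
    }
  }
}
```
The `service` block is shown at the task level here; in the actual file it could equally sit at the group level, since Traefik only needs the Consul registration and the `traefik.enable=true` tag.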
## How to use it
To deploy this AI backend:
1. Ensure you have a Nomad cluster running with a client node whose metadata identifies it as `p52-laptop`, with Podman (plus the Nomad Podman task driver) and the appropriate GPU drivers installed.
2. Make sure the directory `/mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama` exists on the host for persistent data.
3. Run the following command from your Nomad server (or any machine where the Nomad CLI is configured to reach your cluster):
```bash
nomad job run stacks/ai/ai-backend.nomad
```
After deployment, Ollama is reachable on port 11434 of the host machine, and through Traefik once a matching router is configured for the service.
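To confirm the deployment came up, a quick check along these lines should work from any machine that can reach the Nomad API and the p52 host; the host address placeholder below is illustrative, not a value from the job file.
```bash
# Check the job's allocations, then hit the Ollama HTTP API directly.
nomad job status ai-backend

# Replace <p52-laptop-address> with the host's IP or DNS name.
curl http://<p52-laptop-address>:11434/api/version
```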
## Projects Involved
- **[HashiCorp Nomad](https://www.nomadproject.io/)**: A workload orchestrator for deploying and managing containerized and non-containerized applications.
- **[Ollama](https://ollama.com/)**: A tool to run large language models locally.
- **[Podman](https://podman.io/)**: A daemonless container engine for developing, managing, and running OCI containers on your Linux system.
- **[Traefik](https://traefik.io/traefik/)**: An open-source edge router that receives incoming requests and routes them to the services responsible for handling them.