AI Backend (Ollama) Nomad Job

This Nomad job defines the deployment of an Ollama server, which provides a local large language model (LLM) serving environment. It is pinned to a specific host (the node tagged p52-laptop) and uses Vulkan for GPU acceleration.

What is this file?

The ai-backend.nomad file is a HashiCorp Nomad job specification written in HCL (HashiCorp Configuration Language). It describes how to deploy and manage the Ollama service.

Key configurations (a consolidated sketch of the job file follows this list):

  • job "ai-backend": The main job definition.
  • datacenters = ["Homelab-PTECH-DC"]: Specifies the datacenter where this job should run.
  • group "ollama-group": Defines a group of tasks.
  • constraint { attribute = "${meta.device}"; value = "p52-laptop" }: Ensures the job runs on the node tagged with p52-laptop.
  • network { port "api" { static = 11434 } }: Exposes port 11434 for the Ollama API.
  • task "ollama": The actual task running the Ollama container.
    • driver = "podman": Uses Podman to run the container.
    • env: Environment variables for the Ollama container:
      • OLLAMA_HOST = "0.0.0.0:11434": Binds Ollama to all network interfaces on port 11434.
      • OLLAMA_ORIGINS = "*": Allows requests from any origin (CORS).
      • OLLAMA_VULKAN = "1": Enables Vulkan for GPU acceleration.
      • HSA_OVERRIDE_GFX_VERSION = "10.3.0": Fallback for ROCm, though Vulkan takes priority.
    • config: Podman-specific configuration:
      • image = "docker.io/ollama/ollama:latest": Uses the latest Ollama Docker image.
      • privileged = true: Grants the container extended privileges, which are needed for direct access to the GPU devices.
      • volumes: Mounts for persistent data and GPU devices:
        • "/mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama:/root/.ollama": Persistent storage for Ollama models and data.
        • "/dev/kfd:/dev/kfd" and "/dev/dri:/dev/dri": Direct access to AMD GPU kernel driver and DRM (Direct Rendering Manager) devices for Vulkan.
  • service "ollama": Registers the Ollama service with Consul and Traefik.
    • tags = ["traefik.enable=true"]: Enables Traefik ingress for this service.
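
The descriptions above map onto a job file roughly like the following. This is a minimal sketch reconstructed from this list, not the actual file: stanza placement (for example the service block), the ports wiring in the Podman config, and any resources or restart settings are assumptions and may differ from what ai-backend.nomad really contains.

    # Hedged reconstruction of ai-backend.nomad based on the descriptions above
    job "ai-backend" {
      datacenters = ["Homelab-PTECH-DC"]

      group "ollama-group" {
        # Pin the group to the node whose client meta sets device = "p52-laptop"
        constraint {
          attribute = "${meta.device}"
          value     = "p52-laptop"
        }

        # Expose the Ollama API on a fixed host port
        network {
          port "api" {
            static = 11434
          }
        }

        task "ollama" {
          driver = "podman"

          env {
            OLLAMA_HOST              = "0.0.0.0:11434"
            OLLAMA_ORIGINS           = "*"
            OLLAMA_VULKAN            = "1"
            HSA_OVERRIDE_GFX_VERSION = "10.3.0"
          }

          config {
            image      = "docker.io/ollama/ollama:latest"
            privileged = true
            ports      = ["api"]   # assumed; the port wiring is not described above
            volumes = [
              "/mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama:/root/.ollama",
              "/dev/kfd:/dev/kfd",
              "/dev/dri:/dev/dri",
            ]
          }

          service {
            name = "ollama"
            port = "api"
            tags = ["traefik.enable=true"]
          }
        }
      }
    }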

How to use it

To deploy this AI backend:

  1. Ensure you have a running Nomad cluster with a client node whose meta.device attribute is set to p52-laptop and that has Podman, the Podman task driver, and the appropriate GPU drivers installed (see the client configuration sketch after these steps).

  2. Make sure the directory /mnt/local-ssd/nomad/stacks/ai/ai-backend/ollama exists on the host for persistent data.

  3. Execute the following command on your Nomad server (or on any machine where the Nomad CLI is configured to reach your cluster):

    nomad job run stacks/ai/ai-backend.nomad
    

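For step 1, the meta.device value that the job's constraint matches on is normally set in the Nomad client (agent) configuration, and the Podman task driver comes from the nomad-driver-podman plugin. The following is a hedged sketch of the relevant agent settings; the file name client.hcl and the plugin options are illustrative and will differ on your setup.

    # Illustrative Nomad agent/client configuration (assumed file: client.hcl)
    datacenter = "Homelab-PTECH-DC"

    client {
      enabled = true

      # Node metadata that the job's constraint matches against
      meta {
        device = "p52-laptop"
      }
    }

    # Assumes the nomad-driver-podman plugin binary is installed on the node
    plugin "nomad-driver-podman" {
      config {
        # Socket path, allowed volumes, etc. depend on your Podman setup
      }
    }
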
After deployment, Ollama will be accessible on port 11434 on the host machine, and through Traefik once a matching router is configured (see the sketch below).
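
If Traefik should route a hostname to Ollama, the service tags can be extended with a router rule that Traefik picks up via its Consul catalog provider. This is a hedged example; the router name and hostname below are purely illustrative:

    service {
      name = "ollama"
      port = "api"
      tags = [
        "traefik.enable=true",
        # Hypothetical router rule; replace the hostname with your own
        "traefik.http.routers.ollama.rule=Host(`ollama.home.lan`)",
      ]
    }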

Projects Involved

  • HashiCorp Nomad: A workload orchestrator for deploying and managing containerized and non-containerized applications.
  • Ollama: A tool to run large language models locally.
  • Podman: A daemonless container engine for developing, managing, and running OCI containers on your Linux system.
  • Traefik: An open-source edge router and reverse proxy that receives incoming requests and routes them to the services responsible for handling them.