Getting started with llama-cpp-python Docker image
Get started with llama-cpp-python in minutes using the official Docker image. Learn how to download models, start the container, and more.
Official Docker image
llama-cpp-python provides an official Docker image at ghcr.io/abetlen/llama-cpp-python, which supports two architectures: amd64 and arm64.
Here are the image details:
- Base image: Python 3 (Debian)
- Working directory: /app
- Environment variables: HOST=0.0.0.0, PORT=8000
- Exposed port: 8000
- Default command: /bin/sh /app/docker/simple/run.sh
The script /app/docker/simple/run.sh builds the application and runs a Uvicorn server listening on $HOST and $PORT.
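Because run.sh picks up these variables at startup, the defaults can be overridden when launching the container. A minimal sketch (the MODEL variable and volume mount are covered in the quick start below):

```sh
# Serve on port 9000 instead of the default 8000
docker run --rm -it -p 9000:9000 \
  -e PORT=9000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest
```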
Quick start
Download models
To download a GGUF file from the Hugging Face Hub, follow these steps:
- Go to the Hugging Face Hub (https://huggingface.co/models).
- Search for the model you want to download (e.g. lmstudio-community/Llama-3.2-1B-Instruct-GGUF).
- Look for the GGUF file you want to download (e.g. Llama-3.2-1B-Instruct-Q4_K_M.gguf).
- Click the "Download" icon.
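If you prefer the command line, the same file can be fetched with the Hugging Face CLI (assuming the huggingface_hub package is installed; the target directory is an example, use your own models path):

```sh
# Download a single GGUF file into a local models directory
huggingface-cli download lmstudio-community/Llama-3.2-1B-Instruct-GGUF \
  Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  --local-dir /path/to/models
```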
Start the container
To start the container, run the following command:
```sh
docker run --rm -it -p 8000:8000 -v /path/to/models:/models -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf ghcr.io/abetlen/llama-cpp-python:latest
```

This command starts the server, which is then accessible at http://localhost:8000. Replace /path/to/models with the actual path to your models directory.
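Once the container is up, you can sanity-check it with a request. The server exposes an OpenAI-compatible API, so a minimal chat completion call (the prompt is illustrative) looks like:

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

Interactive API documentation is also served at http://localhost:8000/docs.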
Docker Compose
You can also use Docker Compose to define and run the container with a configuration file (compose.yaml):
```yaml
services:
  llama-cpp-python:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    ports:
      - 8000:8000
    volumes:
      - /path/to/models:/models
    environment:
      MODEL: /models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
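With the file saved as compose.yaml, start the service from the same directory:

```sh
docker compose up
```

Add the -d flag to run the container in the background.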