Deploy OmniVoice with Docker, Gradio, Traefik, and Authentik
Run OmniVoice as a protected Gradio voice-cloning service on an ARM64 homelab host with CPU PyTorch and persistent model cache.
OmniVoice is a voice-cloning and speech-generation project that can be exposed as a private Gradio web UI. This deployment is designed for an ARM64 homelab host where GPU acceleration is not available, so it uses CPU-only PyTorch wheels, a persistent Hugging Face cache, and Authentik in front of the UI.
This service has since been retired from the live homelab, but the documentation is kept because the deployment pattern is useful for other private AI/Gradio services.
All hostnames and secrets below are placeholders. Replace omnivoice.example.com with your own hostname.
Generated voices and uploaded reference audio are sensitive. Keep the UI behind SSO or a private network, and do not expose it as an anonymous public web app.
What this service does
Route or access pattern:
https://omnivoice.example.com -> Authentik -> OmniVoice Gradio UI
127.0.0.1:8001 -> local debug bind
<tailnet-ip>:8001 -> optional private tailnet debug bind
Main components:
OmniVoice source, Python, Gradio, CPU-only PyTorch, torchaudio, Hugging Face cache, Docker Compose, Traefik, Authentik.
Runtime model:
Browser
-> Traefik HTTPS
-> Authentik forward-auth
-> OmniVoice Gradio app on port 8001
-> persistent /data model cache and generated outputs
Related documentation
- Deploy ComfyUI AI Workflows with Docker, Traefik, and Authentik
- Deploy OpenClaw Gateway with Docker, Traefik and Authentik
- Deploy Traefik Reverse Proxy with Docker and Cloudflare
- Deploy Authentik SSO with Docker and Traefik
Folder layout
Use one service folder.
/home/ubuntu/omnivoice/
├── docker-compose.yml
├── Dockerfile
├── .env
├── run_demo.py
├── src/                 # OmniVoice source checkout or copied package source
└── data/
    ├── hf-cache/
    ├── gradio-cache/
    └── outputs/
Create it:
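mkdir -p /home/ubuntu/omnivoice/data/{hf-cache,gradio-cache,outputs}
sudo chown -R 1001:1001 /home/ubuntu/omnivoice/data
cd /home/ubuntu/omnivoice
The chown matches the UID 1001 user that the Dockerfile creates. A bind mount keeps host ownership, so without it the container user cannot write to /data.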
Get the source
Edit files under /home/ubuntu/omnivoice.
Clone or copy the OmniVoice source into src:
cd /home/ubuntu/omnivoice
git clone https://github.com/k2-fsa/OmniVoice.git src
If the upstream repository structure changes, keep the Dockerfile aligned with the actual Python package location.
Environment file
Edit /home/ubuntu/omnivoice/.env.
OMNIVOICE_HOST=omnivoice.example.com
OMNIVOICE_SERVER_NAME=0.0.0.0
OMNIVOICE_SERVER_PORT=8001
OMNIVOICE_DEVICE=cpu
OMNIVOICE_LOAD_ASR=false
HF_HUB_DISABLE_XET=1
GRADIO_ANALYTICS_ENABLED=false
Why these values matter:
| Setting | Reason |
|---|---|
| OMNIVOICE_DEVICE=cpu | avoids assuming GPU/CUDA on ARM64 hosts |
| OMNIVOICE_LOAD_ASR=false | lowers startup memory/latency by not loading ASR automatically |
| HF_HUB_DISABLE_XET=1 | avoids stalled Hugging Face Xet downloads on some hosts |
| GRADIO_ANALYTICS_ENABLED=false | keeps the private service quieter |
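Values in .env reach the container as plain strings, and in Python the string "false" is truthy, so a wrapper that reads these settings needs explicit parsing. The following is a minimal sketch of that parsing. The helper names `env_flag` and `load_settings` are hypothetical, and whether OmniVoice itself reads OMNIVOICE_DEVICE or OMNIVOICE_LOAD_ASR is an assumption; if it does not, the wrapper has to pass them on.

```python
import os

# Strings treated as "on"; anything else (including "false" or "0") is off.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, env=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in TRUE_VALUES


def load_settings(env=None):
    """Collect the OMNIVOICE_* settings from the .env file into typed values."""
    env = os.environ if env is None else env
    return {
        "device": env.get("OMNIVOICE_DEVICE", "cpu"),
        "load_asr": env_flag("OMNIVOICE_LOAD_ASR", default=False, env=env),
        "server_name": env.get("OMNIVOICE_SERVER_NAME", "0.0.0.0"),
        "server_port": int(env.get("OMNIVOICE_SERVER_PORT", "8001")),
    }
```

Passing a plain dict as `env` makes the parsing easy to check without touching the real process environment.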
Gradio runner
Create /home/ubuntu/omnivoice/run_demo.py.
This wrapper imports the OmniVoice demo and forces the bind host/port from environment variables. Adjust the import path if upstream changes the demo entrypoint.
import os

server_name = os.getenv("OMNIVOICE_SERVER_NAME", "0.0.0.0")
server_port = int(os.getenv("OMNIVOICE_SERVER_PORT", "8001"))

# Example shape. Replace this import with the real OmniVoice demo launcher
# if the upstream repository changes.
from src.demo import demo

if __name__ == "__main__":
    demo.launch(
        server_name=server_name,
        server_port=server_port,
        share=False,
        show_api=False,
    )
If OmniVoice ships a different launcher, keep this file as the place where you normalize Gradio networking for Docker.
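Because the demo entrypoint may move between upstream releases, one option is to let the wrapper try several import targets in order. This is a sketch only: `resolve_launcher` is a hypothetical helper, and the candidate module paths you pass in are assumptions to check against the actual checkout.

```python
import importlib


def resolve_launcher(candidates):
    """Return the first importable 'module:attribute' target from candidates."""
    errors = []
    for target in candidates:
        module_name, _, attr = target.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{target}: {exc}")
    raise ImportError("no launcher found; tried:\n" + "\n".join(errors))
```

In run_demo.py this could replace the fixed import, for example `demo = resolve_launcher(["src.demo:demo"])`, with further candidates appended if the package layout changes.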
Dockerfile
Edit /home/ubuntu/omnivoice/Dockerfile.
This Dockerfile is optimized for CPU/ARM64 hosts. The important detail is the explicit CPU PyTorch index.
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1 HF_HOME=/data/hf-cache TRANSFORMERS_CACHE=/data/hf-cache/transformers GRADIO_TEMP_DIR=/data/gradio-cache
RUN apt-get update && apt-get install -y --no-install-recommends build-essential ffmpeg git libsndfile1 curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY src /app/src
COPY run_demo.py /app/run_demo.py
RUN python -m pip install --upgrade pip setuptools wheel && python -m pip install --index-url https://download.pytorch.org/whl/cpu 'torch==2.11.0+cpu' 'torchaudio==2.11.0+cpu' && python -m pip install 'transformers>=5.3.0' accelerate pydub gradio tensorboardX webdataset numpy soundfile librosa && python -m pip install --no-deps /app/src
RUN useradd --create-home --uid 1001 appuser && mkdir -p /data/hf-cache /data/gradio-cache /data/outputs && chown -R appuser:appuser /data /app
USER appuser
EXPOSE 8001
HEALTHCHECK --interval=60s --timeout=10s --start-period=1800s --retries=5 CMD curl -fsS http://127.0.0.1:8001/ >/dev/null || exit 1
CMD ["python", "/app/run_demo.py"]
If you have a CUDA-capable x86 host, you can use a different base image and PyTorch install path. Do not use CUDA wheels on a CPU-only ARM64 host.
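The HEALTHCHECK above relies on curl being installed in the image. If you would rather not depend on it, roughly the same probe can be written with the Python standard library. This is a sketch under assumptions: the file name healthcheck.py is hypothetical, and unlike `curl -f` without `-L`, `urllib` follows redirects before judging the status.

```python
import sys
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return False


def main(argv=None):
    """Exit-code style entry point: 0 when healthy, 1 otherwise."""
    args = sys.argv[1:] if argv is None else argv
    url = args[0] if args else "http://127.0.0.1:8001/"
    return 0 if probe(url) else 1
```

To use it as a health command, call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard and point the HEALTHCHECK CMD at the script.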
Docker Compose stack
Edit /home/ubuntu/omnivoice/docker-compose.yml.
services:
  omnivoice:
    build:
      context: .
      dockerfile: Dockerfile
    image: local/omnivoice:latest
    container_name: omnivoice
    restart: unless-stopped
    init: true
    env_file: .env
    shm_size: "2gb"
    volumes:
      - /home/ubuntu/omnivoice/data:/data
    ports:
      - "127.0.0.1:8001:8001"
      - "100.64.0.2:8001:8001"
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"
      - "traefik.http.middlewares.omnivoice-https.redirectscheme.scheme=https"
      - "traefik.http.routers.omnivoice.entrypoints=http"
      - "traefik.http.routers.omnivoice.rule=Host(`${OMNIVOICE_HOST}`)"
      - "traefik.http.routers.omnivoice.middlewares=omnivoice-https"
      - "traefik.http.routers.omnivoice-secure.entrypoints=https"
      - "traefik.http.routers.omnivoice-secure.rule=Host(`${OMNIVOICE_HOST}`)"
      - "traefik.http.routers.omnivoice-secure.tls=true"
      - "traefik.http.routers.omnivoice-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.omnivoice-secure.middlewares=authentik@docker"
      - "traefik.http.routers.omnivoice-secure.service=omnivoice"
      - "traefik.http.services.omnivoice.loadbalancer.server.port=8001"

networks:
  proxy:
    external: true
The 100.64.0.2 bind is an example tailnet bind. Replace it with your server’s private tailnet IP or remove that line if you only need local/Traefik access.
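The label set in this stack follows a fixed shape: one plain-HTTP router that only redirects, and one HTTPS router with TLS and forward-auth. Since the same pattern applies to other private Gradio services, a small generator can make the structure explicit. This is an illustrative sketch; `traefik_labels` is a hypothetical helper, and its defaults simply mirror the names used in this document (authentik@docker, cloudflare, proxy).

```python
def traefik_labels(
    name,
    host,
    port,
    auth_middleware="authentik@docker",
    certresolver="cloudflare",
    network="proxy",
):
    """Build the HTTP-redirect and HTTPS+SSO router labels for one service."""
    secure = f"{name}-secure"
    redirect = f"{name}-https"
    rule = f"Host(`{host}`)"
    return [
        "traefik.enable=true",
        f"traefik.docker.network={network}",
        # Plain-HTTP router: only redirects to HTTPS.
        f"traefik.http.middlewares.{redirect}.redirectscheme.scheme=https",
        f"traefik.http.routers.{name}.entrypoints=http",
        f"traefik.http.routers.{name}.rule={rule}",
        f"traefik.http.routers.{name}.middlewares={redirect}",
        # HTTPS router: TLS plus forward-auth in front of the app.
        f"traefik.http.routers.{secure}.entrypoints=https",
        f"traefik.http.routers.{secure}.rule={rule}",
        f"traefik.http.routers.{secure}.tls=true",
        f"traefik.http.routers.{secure}.tls.certresolver={certresolver}",
        f"traefik.http.routers.{secure}.middlewares={auth_middleware}",
        f"traefik.http.routers.{secure}.service={name}",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]
```

Calling `traefik_labels("omnivoice", "${OMNIVOICE_HOST}", 8001)` yields the same thirteen labels as the Compose file, which makes it easy to diff a new service against this one.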
Authentik outpost route
If your Authentik forward-auth setup needs explicit outpost routing, edit /home/ubuntu/authentik/docker-compose.yml and add labels under the Authentik server service:
- "traefik.http.routers.authentik-omnivoice-outpost.entrypoints=https"
- "traefik.http.routers.authentik-omnivoice-outpost.rule=Host(`omnivoice.example.com`) && PathPrefix(`/outpost.goauthentik.io/`)"
- "traefik.http.routers.authentik-omnivoice-outpost.priority=100"
- "traefik.http.routers.authentik-omnivoice-outpost.tls=true"
- "traefik.http.routers.authentik-omnivoice-outpost.tls.certresolver=cloudflare"
- "traefik.http.routers.authentik-omnivoice-outpost.service=authentik"
Restart Authentik after editing:
cd /home/ubuntu/authentik
docker compose up -d
Start the service
Run from /home/ubuntu/omnivoice:
cd /home/ubuntu/omnivoice
docker compose build
docker compose up -d
Follow logs:
docker compose logs -f omnivoice
The first run can take a long time because model weights may download into /home/ubuntu/omnivoice/data/hf-cache.
Verify it
Check the local health endpoint:
curl -I http://127.0.0.1:8001/
Check the container:
docker compose ps
docker inspect omnivoice --format '{{json .State.Health}}' | jq
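The health JSON printed by docker inspect can be long because it includes a log of recent probe runs. A short sketch for reducing it to the fields that matter during startup follows. `summarize_health` is a hypothetical helper; the field names Status, FailingStreak, Log, ExitCode, and Output are the ones Docker emits in .State.Health, and the output is the literal `null` when the container has no health check.

```python
import json


def summarize_health(raw):
    """Reduce Docker's .State.Health JSON to status plus the latest probe."""
    health = json.loads(raw)
    if health is None:
        # Container has no HEALTHCHECK configured.
        return {"status": "none", "failing_streak": 0,
                "last_exit_code": None, "last_output": ""}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status", "unknown"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }
```

During the long start period a status of "starting" with a non-zero last exit code is expected; "unhealthy" after the retries are used up is the case worth investigating.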
Open the browser route:
https://omnivoice.example.com
Expected result:
Authentik login -> Gradio OmniVoice UI
Operating notes
Model cache
The cache path is intentionally persistent:
/home/ubuntu/omnivoice/data/hf-cache
Do not delete it unless you want to redownload model weights.
Outputs
Generated audio should go under:
/home/ubuntu/omnivoice/data/outputs
Back it up only if you want to keep generated outputs.
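If you decide not to keep generated audio, an age-based cleanup is one way to handle it. The sketch below is an assumption about how such a job could look, not part of OmniVoice: `stale_paths` and `prune_outputs` are hypothetical helpers, and the default is a dry run so nothing is deleted until you opt in.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_paths(entries, now, max_age_days):
    """Select paths whose modification time is older than the cutoff."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [path for path, mtime in entries if mtime < cutoff]


def prune_outputs(root, max_age_days, dry_run=True):
    """List (and optionally delete) files under root older than max_age_days."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    entries = [(p, p.stat().st_mtime) for p in files]
    stale = stale_paths(entries, time.time(), max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

For example, `prune_outputs("/home/ubuntu/omnivoice/data/outputs", 30)` lists files older than 30 days, and adding `dry_run=False` removes them.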
Memory pressure
If the container gets killed during startup:
docker logs --tail=200 omnivoice
dmesg -T | grep -i 'killed process\|oom'
Reduce loaded models or keep ASR disabled.
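To confirm that the kernel OOM killer stopped the process rather than an application error, the dmesg output can be scanned for its kill messages. This is a minimal sketch: `find_oom_kills` is a hypothetical helper, and it assumes the kernel's usual "Killed process PID (name)" wording, which can vary between kernel versions.

```python
import re

# Matches e.g. "Out of memory: Killed process 4242 (python) total-vm:..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process name) for every OOM kill found in kernel log lines."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feeding it the lines of `dmesg -T` (for instance via `sys.stdin`) shows whether a python process from the container appears among the victims.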
Retiring the service while keeping docs
If you no longer want OmniVoice running, keep this documentation and remove the live deployment.
Run:
cd /home/ubuntu/omnivoice
docker compose down --remove-orphans
Then remove the service folder only if you do not need cached models or generated outputs:
rm -rf /home/ubuntu/omnivoice
Also remove any dashboard entry from your Homepage configuration and remove the Authentik outpost labels for omnivoice.example.com if you added them manually.
Troubleshooting
The build installs huge CUDA packages
Make sure the Dockerfile uses:
python -m pip install --index-url https://download.pytorch.org/whl/cpu 'torch==2.11.0+cpu' 'torchaudio==2.11.0+cpu'
Hugging Face download stalls
Set this in /home/ubuntu/omnivoice/.env:
HF_HUB_DISABLE_XET=1
Then recreate the container:
docker compose up -d --build
The UI loads but generation fails
Check:
docker logs --tail=300 omnivoice
docker exec omnivoice df -h /data
docker exec omnivoice python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
On a CPU-only ARM64 host, torch.cuda.is_available() should be False.
Security checklist
- UI protected by Authentik.
- No anonymous public Gradio sharing.
- Persistent cache mounted under /data.
- Reference audio handled as sensitive data.
- Generated outputs cleaned or backed up intentionally.
- Service retired cleanly if no longer needed.