Deploy OmniVoice with Docker, Gradio, Traefik, and Authentik

Run OmniVoice as a protected Gradio voice-cloning service on an ARM64 homelab host with CPU PyTorch and persistent model cache.

OmniVoice is a voice-cloning and speech-generation project that can be exposed as a private Gradio web UI. This deployment is designed for an ARM64 homelab host where GPU acceleration is not available, so it uses CPU-only PyTorch wheels, a persistent Hugging Face cache, and Authentik in front of the UI.

This service has since been retired from the live homelab, but the documentation is kept because the deployment pattern is useful for other private AI/Gradio services.

All hostnames and secrets below are placeholders. Replace omnivoice.example.com with your own hostname.

Generated voices and uploaded reference audio are sensitive. Keep the UI behind SSO or a private network, and do not expose it as an anonymous public web app.


What this service does

Route or access pattern:

https://omnivoice.example.com  -> Authentik -> OmniVoice Gradio UI
127.0.0.1:8001                 -> local debug bind
<tailnet-ip>:8001              -> optional private tailnet debug bind

Main components:

OmniVoice source, Python, Gradio, CPU-only PyTorch, torchaudio, Hugging Face cache, Docker Compose, Traefik, Authentik.

Runtime model:

Browser
  -> Traefik HTTPS
  -> Authentik forward-auth
  -> OmniVoice Gradio app on port 8001
  -> persistent /data model cache and generated outputs


Folder layout

Use one service folder.

/home/ubuntu/omnivoice/
├── docker-compose.yml
├── Dockerfile
├── .env
├── run_demo.py
├── src/                 # OmniVoice source checkout or copied package source
└── data/
    ├── hf-cache/
    ├── gradio-cache/
    └── outputs/

Create it:

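# The container runs as UID 1001 (see the Dockerfile), so give it ownership of the bind-mounted data folder.
mkdir -p /home/ubuntu/omnivoice/data/{hf-cache,gradio-cache,outputs}
sudo chown -R 1001:1001 /home/ubuntu/omnivoice/data
cd /home/ubuntu/omnivoice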

Get the source

Edit files under /home/ubuntu/omnivoice.

Clone or copy the OmniVoice source into src:

cd /home/ubuntu/omnivoice
git clone https://github.com/k2-fsa/OmniVoice.git src

If the upstream repository structure changes, keep the Dockerfile aligned with the actual Python package location.


Environment file

Edit /home/ubuntu/omnivoice/.env.

OMNIVOICE_HOST=omnivoice.example.com
OMNIVOICE_SERVER_NAME=0.0.0.0
OMNIVOICE_SERVER_PORT=8001
OMNIVOICE_DEVICE=cpu
OMNIVOICE_LOAD_ASR=false
HF_HUB_DISABLE_XET=1
GRADIO_ANALYTICS_ENABLED=false

Why these values matter:

| Setting | Reason |
| --- | --- |
| OMNIVOICE_DEVICE=cpu | Avoids assuming GPU/CUDA on ARM64 hosts |
| OMNIVOICE_LOAD_ASR=false | Lowers startup memory and latency by not loading ASR automatically |
| HF_HUB_DISABLE_XET=1 | Avoids stalled Hugging Face Xet downloads on some hosts |
| GRADIO_ANALYTICS_ENABLED=false | Turns off Gradio usage analytics so the private service stays quieter |
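
To make the file format concrete, here is a minimal Python sketch that parses `.env`-style text and reports missing keys. The helper names `parse_env_text` and `missing_keys` and the choice of which keys count as required are assumptions made for illustration. The parser handles only plain `KEY=VALUE` lines, with no quoting or variable expansion, so it is stricter than Docker Compose's own parser.

```python
# Keys treated as required here are an assumption for this sketch.
REQUIRED_KEYS = (
    "OMNIVOICE_HOST",
    "OMNIVOICE_SERVER_NAME",
    "OMNIVOICE_SERVER_PORT",
    "OMNIVOICE_DEVICE",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


SAMPLE = """\
OMNIVOICE_HOST=omnivoice.example.com
OMNIVOICE_SERVER_NAME=0.0.0.0
OMNIVOICE_SERVER_PORT=8001
OMNIVOICE_DEVICE=cpu
OMNIVOICE_LOAD_ASR=false
HF_HUB_DISABLE_XET=1
GRADIO_ANALYTICS_ENABLED=false
"""

settings = parse_env_text(SAMPLE)
print(missing_keys(settings))  # prints []
```

Running a check like this before `docker compose up` catches a misspelled key earlier than a failed container start would.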

Gradio runner

Create /home/ubuntu/omnivoice/run_demo.py.

This wrapper imports the OmniVoice demo and forces the bind host/port from environment variables. Adjust the import path if upstream changes the demo entrypoint.

import os

server_name = os.getenv("OMNIVOICE_SERVER_NAME", "0.0.0.0")
server_port = int(os.getenv("OMNIVOICE_SERVER_PORT", "8001"))

# Example shape. Replace this import with the real OmniVoice demo launcher
# if the upstream repository changes.
from src.demo import demo

if __name__ == "__main__":
    demo.launch(
        server_name=server_name,
        server_port=server_port,
        share=False,
        show_api=False,
    )

If OmniVoice ships a different launcher, keep this file as the place where you normalize Gradio networking for Docker.
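
One pitfall when normalizing settings in a wrapper like this: environment variables are always strings, so a value such as `OMNIVOICE_LOAD_ASR=false` is truthy if it is tested directly. The sketch below shows one way to parse flags and ports strictly. `env_flag` and `env_port` are hypothetical helpers, and whether the upstream demo reads `OMNIVOICE_LOAD_ASR` on its own is not confirmed here.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_flag(name: str, default: bool = False, env=os.environ) -> bool:
    """Interpret an environment variable as a boolean."""
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


def env_port(name: str, default: int, env=os.environ) -> int:
    """Read a TCP port and reject values outside 1-65535."""
    port = int(env.get(name, default))
    if not 1 <= port <= 65535:
        raise ValueError(f"{name} is out of range: {port}")
    return port
```

In the wrapper, `env_port("OMNIVOICE_SERVER_PORT", 8001)` could replace the bare `int(os.getenv(...))` call so that a typo fails loudly at startup.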


Dockerfile

Edit /home/ubuntu/omnivoice/Dockerfile.

This Dockerfile is optimized for CPU/ARM64 hosts. The important detail is the explicit CPU PyTorch index.

FROM python:3.11-slim

ENV PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    HF_HOME=/data/hf-cache \
    TRANSFORMERS_CACHE=/data/hf-cache/transformers \
    GRADIO_TEMP_DIR=/data/gradio-cache

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       build-essential ffmpeg git libsndfile1 curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY src /app/src
COPY run_demo.py /app/run_demo.py

RUN python -m pip install --upgrade pip setuptools wheel \
    && python -m pip install --index-url https://download.pytorch.org/whl/cpu \
       'torch==2.11.0+cpu' 'torchaudio==2.11.0+cpu' \
    && python -m pip install \
       'transformers>=5.3.0' accelerate pydub gradio tensorboardX \
       webdataset numpy soundfile librosa \
    && python -m pip install --no-deps /app/src

RUN useradd --create-home --uid 1001 appuser \
    && mkdir -p /data/hf-cache /data/gradio-cache /data/outputs \
    && chown -R appuser:appuser /data /app

USER appuser
EXPOSE 8001

HEALTHCHECK --interval=60s --timeout=10s --start-period=1800s --retries=5 \
  CMD curl -fsS http://127.0.0.1:8001/ >/dev/null || exit 1

CMD ["python", "/app/run_demo.py"]

If you have a CUDA-capable x86 host, you can use a different base image and PyTorch install path. Do not use CUDA wheels on a CPU-only ARM64 host.


Docker Compose stack

Edit /home/ubuntu/omnivoice/docker-compose.yml.

services:
  omnivoice:
    build:
      context: .
      dockerfile: Dockerfile
    image: local/omnivoice:latest
    container_name: omnivoice
    restart: unless-stopped
    init: true
    env_file: .env
    shm_size: "2gb"
    volumes:
      - /home/ubuntu/omnivoice/data:/data
    ports:
      - "127.0.0.1:8001:8001"
      - "100.64.0.2:8001:8001"
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"

      - "traefik.http.middlewares.omnivoice-https.redirectscheme.scheme=https"
      - "traefik.http.routers.omnivoice.entrypoints=http"
      - "traefik.http.routers.omnivoice.rule=Host(`${OMNIVOICE_HOST}`)"
      - "traefik.http.routers.omnivoice.middlewares=omnivoice-https"

      - "traefik.http.routers.omnivoice-secure.entrypoints=https"
      - "traefik.http.routers.omnivoice-secure.rule=Host(`${OMNIVOICE_HOST}`)"
      - "traefik.http.routers.omnivoice-secure.tls=true"
      - "traefik.http.routers.omnivoice-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.omnivoice-secure.middlewares=authentik@docker"
      - "traefik.http.routers.omnivoice-secure.service=omnivoice"
      - "traefik.http.services.omnivoice.loadbalancer.server.port=8001"

networks:
  proxy:
    external: true

The 100.64.0.2 bind is an example tailnet bind. Replace it with your server’s private tailnet IP or remove that line if you only need local/Traefik access.
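
The labels follow a fixed pattern: a plain-HTTP router that only redirects, and an HTTPS router that adds TLS and the forward-auth middleware before reaching the service port. As a sketch of how the same pattern could be reused for another private Gradio service, the hypothetical helper below generates the label list. The entrypoint names `http` and `https`, the `cloudflare` resolver, the `proxy` network, and the `authentik@docker` middleware are taken from the stack above and depend on your own Traefik configuration.

```python
def traefik_labels(
    name: str,
    host: str,
    port: int,
    auth_middleware: str = "authentik@docker",
    cert_resolver: str = "cloudflare",
) -> list[str]:
    """Build Docker labels for an HTTP->HTTPS redirect plus a protected HTTPS route."""
    rule = f"Host(`{host}`)"
    http = f"traefik.http.routers.{name}"
    https = f"traefik.http.routers.{name}-secure"
    return [
        "traefik.enable=true",
        "traefik.docker.network=proxy",
        # Plain-HTTP router: its only job is to redirect to HTTPS.
        f"traefik.http.middlewares.{name}-https.redirectscheme.scheme=https",
        f"{http}.entrypoints=http",
        f"{http}.rule={rule}",
        f"{http}.middlewares={name}-https",
        # HTTPS router: TLS, forward-auth middleware, then the backend service.
        f"{https}.entrypoints=https",
        f"{https}.rule={rule}",
        f"{https}.tls=true",
        f"{https}.tls.certresolver={cert_resolver}",
        f"{https}.middlewares={auth_middleware}",
        f"{https}.service={name}",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]


labels = traefik_labels("omnivoice", "omnivoice.example.com", 8001)
for label in labels:
    print(f'      - "{label}"')
```

The printed lines can be pasted under `labels:` in a Compose file; only the name, host, and port change between services.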


Authentik outpost route

If your Authentik forward-auth setup needs explicit outpost routing, edit /home/ubuntu/authentik/docker-compose.yml and add labels under the Authentik server service:

      - "traefik.http.routers.authentik-omnivoice-outpost.entrypoints=https"
      - "traefik.http.routers.authentik-omnivoice-outpost.rule=Host(`omnivoice.example.com`) && PathPrefix(`/outpost.goauthentik.io/`)"
      - "traefik.http.routers.authentik-omnivoice-outpost.priority=100"
      - "traefik.http.routers.authentik-omnivoice-outpost.tls=true"
      - "traefik.http.routers.authentik-omnivoice-outpost.tls.certresolver=cloudflare"
      - "traefik.http.routers.authentik-omnivoice-outpost.service=authentik"

Restart Authentik after editing:

cd /home/ubuntu/authentik
docker compose up -d

Start the service

Run from /home/ubuntu/omnivoice:

cd /home/ubuntu/omnivoice

docker compose build
docker compose up -d

Follow logs:

docker compose logs -f omnivoice

The first run can take a long time because model weights may download into /home/ubuntu/omnivoice/data/hf-cache.


Verify it

Check that the UI answers on the local bind:

curl -I http://127.0.0.1:8001/
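
On a CPU-only host the app can take a long time to start listening, so a single request may fail simply because the models are still loading. The following is a minimal polling sketch using only the Python standard library. `wait_for_http` is a hypothetical helper, and the timeout and interval values are arbitrary choices rather than measured numbers.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers with a 2xx/3xx status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if 200 <= response.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            # Not listening yet, connection refused, or an HTTP error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://127.0.0.1:8001/", timeout=1800)` waits up to 30 minutes, which matches the `start-period` used in the Dockerfile healthcheck.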

Check the container:

docker compose ps
docker inspect omnivoice --format '{{json .State.Health}}' | jq
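
The health JSON printed by `docker inspect` contains a `Status`, a `FailingStreak` counter, and a `Log` list of recent probe results. If you want a one-line summary in a script instead of reading the raw JSON, a sketch like this may help. `summarize_health` is a hypothetical helper, and the sample document is made up for illustration, not captured from a real container.

```python
import json


def summarize_health(health_json: str) -> str:
    """Return a one-line summary of Docker's .State.Health JSON."""
    health = json.loads(health_json)
    if health is None:
        # docker inspect prints "null" when the image defines no HEALTHCHECK.
        return "no healthcheck configured"
    status = health.get("Status", "unknown")
    streak = health.get("FailingStreak", 0)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    exit_code = last.get("ExitCode")
    return f"status={status} failing_streak={streak} last_exit_code={exit_code}"


# Made-up sample in the shape Docker emits.
SAMPLE = '{"Status": "starting", "FailingStreak": 0, "Log": []}'
print(summarize_health(SAMPLE))
```

With the long `start-period` in the Dockerfile, a status of `starting` during the first model download is expected and does not by itself indicate a failure.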

Open the browser route:

https://omnivoice.example.com

Expected result:

Authentik login -> Gradio OmniVoice UI

Operating notes

Model cache

The cache path is intentionally persistent:

/home/ubuntu/omnivoice/data/hf-cache

Do not delete it unless you want to redownload model weights.
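
Before deciding whether to keep or prune the cache, it can help to know how much disk it uses. The sketch below sums file sizes under a directory and skips symlinks, because the Hugging Face hub cache stores snapshots as symlinks to shared blobs and counting both would inflate the total. `dir_size_bytes` and `format_gib` are hypothetical helper names.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, ignoring symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_gib(size_bytes: int) -> str:
    """Format a byte count as gibibytes with two decimals."""
    return f"{size_bytes / 1024**3:.2f} GiB"
```

On the host, `format_gib(dir_size_bytes("/home/ubuntu/omnivoice/data/hf-cache"))` gives a figure comparable to `du -sh` on the same folder.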

Outputs

Generated audio should go under:

/home/ubuntu/omnivoice/data/outputs

Back it up only if you want to keep generated outputs.

Memory pressure

If the container gets killed during startup:

docker logs --tail=200 omnivoice
dmesg -T | grep -i 'killed process\|oom'

Reduce loaded models or keep ASR disabled.
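
To see how much headroom the host has before starting the container, you can read `MemAvailable` from `/proc/meminfo` on Linux. This is a minimal parsing sketch. `parse_meminfo` and `available_gib` are hypothetical helpers, and how much memory OmniVoice actually needs is not stated here, so any threshold you compare against is your own estimate.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(meminfo: dict[str, int]) -> float:
    """Convert MemAvailable from kB to GiB."""
    return meminfo["MemAvailable"] / (1024 * 1024)
```

Usage on the host would look like `available_gib(parse_meminfo(open("/proc/meminfo").read()))`.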


Retiring the service while keeping docs

If you no longer want OmniVoice running, keep this documentation and remove the live deployment.

Run:

cd /home/ubuntu/omnivoice
docker compose down --remove-orphans

Then remove the service folder only if you do not need cached models or generated outputs:

rm -rf /home/ubuntu/omnivoice

Also remove any dashboard entry from your Homepage configuration and remove the Authentik outpost labels for omnivoice.example.com if you added them manually.


Troubleshooting

The build installs huge CUDA packages

Make sure the Dockerfile uses:

python -m pip install --index-url https://download.pytorch.org/whl/cpu   'torch==2.11.0+cpu' 'torchaudio==2.11.0+cpu'

Hugging Face download stalls

Set this in /home/ubuntu/omnivoice/.env:

HF_HUB_DISABLE_XET=1

Then recreate the container:

docker compose up -d --build

The UI loads but generation fails

Check:

docker logs --tail=300 omnivoice
docker exec omnivoice df -h /data
docker exec omnivoice python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

On a CPU-only ARM64 host, torch.cuda.is_available() should be False.
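
`torch.cuda.is_available()` is also False when a CUDA wheel is installed on a machine without a GPU, so it does not tell you which wheel was pulled in. A slightly stronger check looks at the build itself: CPU-only builds report `torch.version.cuda` as `None` and usually carry a `+cpu` local version tag. The helper below is a heuristic sketch with a hypothetical name, not an official PyTorch API.

```python
from typing import Optional


def is_cpu_only_build(version: str, cuda_version: Optional[str]) -> bool:
    """Heuristic: no CUDA toolkit version and no accelerator tag in the version."""
    if cuda_version is not None:
        return False
    # Text after "+" is the local version tag, e.g. "cpu", "cu126", "rocm6.0".
    local_tag = version.partition("+")[2]
    return local_tag in ("", "cpu")
```

Inside the container, call it as `is_cpu_only_build(torch.__version__, torch.version.cuda)` after `import torch`.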


Security checklist

  • UI protected by Authentik.
  • No anonymous public Gradio sharing.
  • Persistent cache mounted under /data.
  • Reference audio handled as sensitive data.
  • Generated outputs cleaned or backed up intentionally.
  • Service retired cleanly if no longer needed.

This post is licensed under CC BY 4.0 by the author.