Deploy Kioku on RunPod with a two-pod architecture: always-on CPU stateful pod + ephemeral GPU bot pods.
Architecture
┌──────────────────────────────────────────────────────┐
│ kioku-stateful pod (CPU, always-on)                  │
│ supervisord                                          │
│ ├── postgres ├── qdrant ├── redis                    │
│ ├── ollama ├── minio ├── hivemind                    │
│ ├── vexa api-gateway ├── vexa-meeting-api            │
│ ├── vexa-admin-api ├── vexa-agent-api                │
│ ├── vexa-mcp ├── vexa-tts-service                    │
│ └── runtime-api (ORCHESTRATOR_BACKEND=runpod)        │
│ Exposed: 22, 6379, 8080, 9100, 8056                  │
└────────────┬─────────────────────────────────────────┘
             │ RunPod REST API (create/stop pod)
             ▼
┌──────────────────────────────────┐
│ kioku-stateless pod (GPU)        │
│ ├── Whisper transcription (GPU)  │
│ └── Vexa bot (Playwright)        │
│ Lives ~1 meeting, then exits     │
└──────────────────────────────────┘
Images
| Image | Registry | Pod Type |
|---|---|---|
| kyomoto/kioku-stateful:latest | Docker Hub | CPU, always-on |
| kyomoto/kioku-stateless:latest | Docker Hub | GPU, ephemeral |
Images are built automatically by GitHub Actions on push to master.
Deploy
cd deployment/runpod
cp .env.example .env
# Fill in RUNPOD_API_KEY, secrets, domain
$EDITOR .env
./deploy.sh
The script creates a CPU pod with all stateful services via runpodctl pod create --compute-type cpu.
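Before running deploy.sh, it can help to confirm that .env has no empty required values. The following is a minimal sketch, assuming a plain KEY=VALUE file. RUNPOD_API_KEY, REDIS_PASSWORD, and INTERNAL_API_SECRET are the variable names this document mentions; the required-key list and the helper names `parse_env` and `missing_keys` are illustrative and are not part of deploy.sh.

```python
# Variable names mentioned in this README; the real .env.example may
# require more (this list is an assumption, not taken from deploy.sh).
REQUIRED_KEYS = ("RUNPOD_API_KEY", "REDIS_PASSWORD", "INTERNAL_API_SECRET")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in order."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]
```

Calling `missing_keys(open(".env").read())` returns an empty list when every listed key has a value; anything else names the keys still to fill in.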
Security
- Redis: AUTH password required (REDIS_PASSWORD); port 6379 is exposed publicly
- Postgres: NOT exposed publicly (internal only)
- MinIO/Qdrant: NOT exposed publicly (internal only)
- meeting-api: uses INTERNAL_API_SECRET for internal callbacks
- Cloudflared: optional; tunnels public traffic to hivemind/vexa-gateway
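The exposure rules above can be turned into a small audit of a pod's configured container ports. This is a sketch under stated assumptions: the public set comes from the "Exposed" line in the architecture diagram, while the internal-only ports are the upstream defaults for Postgres (5432), Qdrant (6333), and MinIO (9000), which this document does not confirm. The function name `audit_exposed_ports` is hypothetical.

```python
# Ports listed as exposed in the architecture diagram.
PUBLIC_PORTS = {22, 6379, 8080, 9100, 8056}

# Assumed upstream default ports; adjust if the images use other values.
INTERNAL_ONLY = {"postgres": 5432, "qdrant": 6333, "minio": 9000}


def audit_exposed_ports(exposed):
    """Compare a pod's exposed container ports with the intended policy.

    Returns (leaked, unexpected): internal-only services whose port is
    exposed, and a sorted list of exposed ports outside the public set.
    """
    exposed = set(exposed)
    leaked = {name: port for name, port in INTERNAL_ONLY.items() if port in exposed}
    unexpected = sorted(exposed - PUBLIC_PORTS)
    return leaked, unexpected
```

An empty `leaked` dict and an empty `unexpected` list mean the port configuration matches the list above. The check covers configuration only; it does not verify that Redis actually enforces AUTH.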
Cost
| Resource | Rate | Monthly (24/7) |
|---|---|---|
| Stateful CPU pod | ~$0.10-0.20/hr | ~$72-144 |
| Bot GPU pod (per meeting) | ~$0.27-0.46/hr | per-meeting |
| Container disk (20GB) | $0.10/GB/mo | ~$2 |
Bot pods only cost money while a meeting is in progress. A 1-hour meeting costs ~$0.27-0.46 in GPU compute.
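As a worked example of the figures above, the sketch below combines the always-on pod, the container disk, and per-meeting GPU time into one monthly estimate. The 720-hour month matches the table's 24/7 column; the 40 one-hour meetings per month are an assumed workload, and the function names are illustrative. The estimate counts only meeting time on the GPU pods.

```python
HOURS_PER_MONTH = 24 * 30  # 720 h, the basis of the table's 24/7 column


def stateful_monthly_cost(rate_per_hour, disk_gb=20, disk_rate_per_gb=0.10):
    """Always-on CPU pod plus container disk, in USD per month."""
    return rate_per_hour * HOURS_PER_MONTH + disk_gb * disk_rate_per_gb


def bot_monthly_cost(rate_per_hour, meetings, avg_hours=1.0):
    """GPU bot pods, counted only for time spent in meetings, in USD."""
    return rate_per_hour * meetings * avg_hours


def monthly_estimate(cpu_rate, gpu_rate, meetings, avg_hours=1.0):
    """Total monthly estimate in USD, rounded to cents."""
    total = stateful_monthly_cost(cpu_rate) + bot_monthly_cost(gpu_rate, meetings, avg_hours)
    return round(total, 2)


# Assumed workload: 40 one-hour meetings per month.
low = monthly_estimate(0.10, 0.27, meetings=40)
high = monthly_estimate(0.20, 0.46, meetings=40)
print(f"${low:.2f} to ${high:.2f} per month")
```

With these assumed inputs the estimate comes to roughly $85 to $164 per month, most of it from the always-on CPU pod.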
Bot Pod Lifecycle
- Spawn: POST /vexa/bots → runtime-api calls RunPod REST API → GPU pod created
- Boot: Pod pulls image (~30-60s), starts Whisper + bot
- Meeting: Bot joins Google Meet/Zoom/Teams, transcribes
- Exit: Bot exits → reaper detects (15s poll) → pod deleted
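The exit step above can be sketched as a polling loop. This is not the actual reaper in runtime-api: `list_bot_pods` and `delete_pod` are hypothetical callables standing in for whatever wraps the RunPod REST API, and the `"exited"` status value is an assumption. Only the 15-second interval comes from this document.

```python
import time

POLL_INTERVAL_S = 15  # poll interval stated in the lifecycle above


def reap_once(list_bot_pods, delete_pod):
    """Delete every pod whose bot has exited; return the deleted pod ids."""
    reaped = []
    for pod in list_bot_pods():
        # "exited" is an assumed status value, not a documented one.
        if pod["status"] == "exited":
            delete_pod(pod["id"])
            reaped.append(pod["id"])
    return reaped


def run_reaper(list_bot_pods, delete_pod, interval_s=POLL_INTERVAL_S,
               max_polls=None, sleep=time.sleep):
    """Poll until max_polls is reached (forever if max_polls is None)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        reap_once(list_bot_pods, delete_pod)
        polls += 1
        sleep(interval_s)
```

Passing the API calls in as parameters keeps the loop testable without network access. With a 15-second poll, a finished pod can linger, and presumably be billed, for up to about 15 seconds before deletion.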
Bot pod startup latency is ~30-60s, versus ~2s under Docker Compose, so request the bot at least a minute before a time-sensitive meeting starts.
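One way to account for the startup latency is to compute the latest moment at which a bot should be requested. A minimal sketch: the 60-second boot figure is the upper end of the range above, while the 30-second safety margin and the function name `latest_spawn_time` are assumptions for illustration.

```python
from datetime import datetime, timedelta

WORST_CASE_BOOT_S = 60  # upper end of the ~30-60s startup latency
SAFETY_MARGIN_S = 30    # assumed buffer for joining the call; tune as needed


def latest_spawn_time(meeting_start, boot_s=WORST_CASE_BOOT_S, margin_s=SAFETY_MARGIN_S):
    """Return the latest time to request a bot so it is ready at meeting start."""
    return meeting_start - timedelta(seconds=boot_s + margin_s)


# Example: a meeting at 10:00 should have its bot requested by 09:58:30.
deadline = latest_spawn_time(datetime(2025, 1, 1, 10, 0, 0))
```

A scheduler could call POST /vexa/bots at or before this time instead of at the meeting start.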