Deploy Kioku on RunPod with a two-pod architecture: one always-on CPU pod for the stateful services, plus ephemeral GPU bot pods that are created per meeting.

Architecture

┌──────────────────────────────────────────────────────┐
│         kioku-stateful pod (CPU, always-on)            │
│  supervisord                                           │
│  ├── postgres    ├── qdrant    ├── redis              │
│  ├── ollama      ├── minio     ├── hivemind           │
│  ├── vexa api-gateway  ├── vexa-meeting-api            │
│  ├── vexa-admin-api   ├── vexa-agent-api              │
│  ├── vexa-mcp          ├── vexa-tts-service            │
│  └── runtime-api (ORCHESTRATOR_BACKEND=runpod)        │
│  Exposed: 22, 6379, 8080, 9100, 8056                  │
└────────────┬──────────────────────────────────────────┘
             │ RunPod REST API (create/stop pod)

┌──────────────────────────────────┐
│   kioku-stateless pod (GPU)       │
│  ├── Whisper transcription (GPU)  │
│  └── Vexa bot (Playwright)        │
│  Lives ~1 meeting, then exits     │
└──────────────────────────────────┘

Images

Image                            Registry     Pod Type
kyomoto/kioku-stateful:latest    Docker Hub   CPU, always-on
kyomoto/kioku-stateless:latest   Docker Hub   GPU, ephemeral
Images are built automatically by GitHub Actions on push to master.

Deploy

cd deployment/runpod
cp .env.example .env
# Fill in RUNPOD_API_KEY, secrets, domain
$EDITOR .env

./deploy.sh
The script creates a CPU pod with all stateful services via runpodctl pod create --compute-type cpu.
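
Before running deploy.sh it can help to confirm that .env is actually filled in. The following is a minimal sketch, not part of the repository: it assumes .env uses plain KEY=VALUE lines, and the key list contains only the names mentioned on this page (the real .env.example may require more).

```python
# Hypothetical preflight check; deploy.sh itself may validate differently.
# Only keys named on this page are listed here.
REQUIRED_KEYS = ("RUNPOD_API_KEY", "REDIS_PASSWORD", "INTERNAL_API_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty, in order."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]
```

To use it, pass the file contents, for example `missing_keys(open(".env").read())`, and stop the deploy if the returned list is non-empty.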

Security

  • Redis — AUTH password required (REDIS_PASSWORD), port 6379 exposed publicly
  • Postgres — NOT exposed publicly (internal only)
  • MinIO/Qdrant — NOT exposed publicly (internal only)
  • meeting-api — Uses INTERNAL_API_SECRET for internal callbacks
  • Cloudflared — Optional, tunnels public traffic to hivemind/vexa-gateway
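
Because port 6379 is reachable from the public internet, every client outside the stateful pod has to authenticate. As an illustration only, the sketch below builds a standard `redis://` connection URL from REDIS_PASSWORD. The helper name and the host are hypothetical, and how Kioku's own services build their connection strings is not documented here.

```python
from urllib.parse import quote


def redis_url(host: str, password: str, port: int = 6379, db: int = 0) -> str:
    """Build a redis:// URL with the password percent-encoded.

    Hypothetical helper: refuses an empty password because the port is public.
    """
    if not password:
        raise ValueError("REDIS_PASSWORD must be set; port 6379 is public")
    # safe='' encodes characters such as '@' and '/' that would break the URL.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"
```

Percent-encoding matters when the generated password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URL structure.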

Cost

Resource                    Rate              Monthly (24/7)
Stateful CPU pod            ~$0.10-0.20/hr    ~$72-144
Bot GPU pod (per meeting)   ~$0.27-0.46/hr    per-meeting
Container disk (20GB)       $0.10/GB/mo       ~$2
Bot pods only cost money while a meeting is in progress. A 1-hour meeting costs ~$0.27-0.46 in GPU compute.
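
The monthly figures above follow from a 720-hour month (24 × 30). A small sketch of the arithmetic, using the rates from the table; the function name and the 40 meeting-hours figure are illustrative assumptions, not measured usage:

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours, the basis for the 24/7 column


def monthly_cost(cpu_rate_hr: float, gpu_rate_hr: float, meeting_hours: float,
                 disk_gb: float = 20, disk_rate_gb_mo: float = 0.10) -> float:
    """Estimate the monthly bill in USD from the rates in the cost table."""
    cpu = cpu_rate_hr * HOURS_PER_MONTH   # stateful pod runs all month
    gpu = gpu_rate_hr * meeting_hours     # bot pods bill only during meetings
    disk = disk_gb * disk_rate_gb_mo      # container disk
    return round(cpu + gpu + disk, 2)


# Illustrative workload: 40 hours of meetings per month.
low = monthly_cost(0.10, 0.27, meeting_hours=40)
high = monthly_cost(0.20, 0.46, meeting_hours=40)
print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

At this workload the always-on CPU pod dominates the bill, with GPU time adding roughly $11 to $18.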

Bot Pod Lifecycle

  1. Spawn: POST /vexa/bots → runtime-api calls RunPod REST API → GPU pod created
  2. Boot: Pod pulls image (~30-60s), starts Whisper + bot
  3. Meeting: Bot joins Google Meet/Zoom/Teams, transcribes
  4. Exit: Bot exits → reaper detects (15s poll) → pod deleted
Bot pod startup takes ~30-60s, compared with ~2s under Docker Compose. For time-sensitive meetings, request the bot about a minute before it needs to join.
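
The exit step of the lifecycle above can be pictured as a polling loop. The sketch below is an assumption about the general shape of such a reaper, not the actual runtime-api code: `list_pods`, `has_exited`, and `delete_pod` are hypothetical callables standing in for whatever status checks and RunPod REST API calls the real service makes.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_S = 15  # poll period stated in the lifecycle steps


def reap_exited(pod_ids: Iterable[str],
                has_exited: Callable[[str], bool],
                delete_pod: Callable[[str], object]) -> list[str]:
    """One reaper pass: delete every pod whose bot has exited."""
    deleted = []
    for pod_id in pod_ids:
        if has_exited(pod_id):
            delete_pod(pod_id)  # hypothetical wrapper around the RunPod API
            deleted.append(pod_id)
    return deleted


def run_reaper(list_pods: Callable[[], list[str]],
               has_exited: Callable[[str], bool],
               delete_pod: Callable[[str], object],
               passes: int,
               sleep: Callable[[float], object] = time.sleep) -> None:
    """Run a fixed number of passes, sleeping between them."""
    for i in range(passes):
        reap_exited(list_pods(), has_exited, delete_pod)
        if i < passes - 1:
            sleep(POLL_INTERVAL_S)
```

With a 15-second poll, an exited pod can keep billing for up to roughly one extra interval before it is deleted, which is small next to the per-hour GPU rate.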