Mimic: current state and progress
Spike 4 COMPLETE: AI pipeline LIVE end-to-end (2026-05-14)
Production system is running. A user logs in, records their voice, triggers video generation, and receives a dubbed mp4 in under 1 minute.
S4a Voice transcribe (LIVE)
- Whisper base on the scraper VPS (CPU, int8)
- POST /api/voices -> auto-transcribe -> voice.status=ready with metadata.refText
S4b LoRA train (PROVEN)
- RunPod endpoint iwau73ni3uu456 (A40 48GB, Medium Supply)
- Worker ghcr.io/kodama1/mimic-worker-lora:latest (diffusers + PEFT + SDXL base)
- 12 photos -> 500 steps -> 44MB safetensors. ~10min, ~$0.30/avatar.
S4c Video gen V1 (PROVEN)
- RunPod endpoint rb6t98cug8ysil (A40 48GB)
- Worker ghcr.io/kodama1/mimic-worker-video:latest (torch 2.5.1 + f5-tts 1.1.5 + faster-whisper + yt-dlp + ffmpeg)
- Pipeline: yt-dlp -> ffmpeg audio -> Whisper transcribe -> F5-TTS clone -> ffmpeg mux -> upload mp4
- 30s execution on a warm worker. ~$0.02/video.
- Real test: PT-BR TikTok ("dois mil seguidores em sete dias no TikTok Shop") dubbed with a cloned EN voice -> 8.7MB mp4 delivered
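The two ffmpeg steps in the pipeline above (audio extraction and final mux) could look roughly like this. A minimal sketch, not the worker's actual code: the function names, file paths, and the 24 kHz mono sample rate are assumptions; the ffmpeg flags are standard ones.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video: Path, wav: Path, sample_rate: int = 24000) -> list:
    """Build the ffmpeg command that drops the video stream and writes mono WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output without prompting
        "-i", str(video),
        "-vn",                     # no video in the output
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample (rate is an assumption)
        str(wav),
    ]


def mux_cmd(video: Path, dubbed_wav: Path, out: Path) -> list:
    """Build the ffmpeg command that pairs the original video with the dubbed audio."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-i", str(dubbed_wav),
        "-map", "0:v:0",           # video from the first input
        "-map", "1:a:0",           # audio from the second input
        "-c:v", "copy",            # keep the video stream as-is, no re-encode
        "-c:a", "aac",
        "-shortest",               # stop at the shorter of the two streams
        str(out),
    ]


def run(cmd: list) -> None:
    """Run a command and raise if ffmpeg exits non-zero."""
    subprocess.run(cmd, check=True)
```

Copying the video stream instead of re-encoding keeps the mux step cheap, which helps with a ~30s warm execution budget.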
Live URLs
- App: https://mimic.kodama.solutions
- API: https://mimic-api.kodama.solutions
- CDN: https://mimic-cdn.kodama.solutions
- Repo: https://github.com/kodama1/mimic
Bugs fixed along the way in S4
- S3 SDK v3 checksum -> MinIO 403. Fix: requestChecksumCalculation: WHEN_REQUIRED
- SigV4 host validation. Fix: dedicated subdomain mimic-cdn with no path rewrite
- transformers 4.47 + torch 2.4 schema_infer. Fix: torch 2.5.1
- F5-TTS API changed: F5TTS() no-arg ctor
- RunPod 10min default -> bumped to 1800s on the LoRA endpoint
- Whisper VAD removed audio with no speech. Fix: no-VAD fallback + filler
- RunPod caches the image on active workers. Fix: manually delete the worker after a release
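The VAD fix above can be sketched as follows. This assumes `model` behaves like `faster_whisper.WhisperModel` (its `transcribe` returns `(segments, info)` and accepts `vad_filter`); the function name, the return shape, and the filler string are hypothetical, since the notes do not say what the filler is.

```python
def transcribe_with_fallback(model, audio_path, filler="(no speech detected)"):
    """Try VAD-filtered transcription first, then retry without VAD.

    If both passes yield no text, return a filler string (assumed behavior)
    so downstream steps always receive some text.
    """
    info = None
    for use_vad in (True, False):
        segments, info = model.transcribe(audio_path, vad_filter=use_vad)
        text = " ".join(seg.text.strip() for seg in segments).strip()
        if text:
            return {"text": text, "language": info.language, "used_vad": use_vad}
    return {"text": filler, "language": info.language, "used_vad": False}
```

With the real library, `model` would be built as `WhisperModel("base", device="cpu", compute_type="int8")`.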
Next steps (S5)
- Real face swap (Roop/ReActor + LoRA inference): output with the user's face swapped in
- Stripe billing + credits
- Consent onboarding (selfie + code)
- Invisible anti-misuse watermark
Spike 4a: Voice transcribe (2026-05-13)
Delivered
- mimic-scraper gained faster-whisper (CPU, int8) and a POST /transcribe endpoint
- WHISPER_MODEL=base configured on the VPS (~145MB, good quality in PT-BR and EN)
- POST /api/voices now transcribes the audio during upload and:
  - Marks voice.status='ready' if the transcription succeeds
  - Saves metadata.refText, metadata.language, metadata.transcribedAt
- Voice page shows a transcription preview + ready/pending badge
- Volume mimic_whisper_cache avoids re-downloading the model when containers are recreated
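The upload-time status update described above can be sketched as a pure function. The function name and return shape are hypothetical, and leaving the voice as 'pending' on an empty transcription is an assumption inferred from the ready/pending badge.

```python
from datetime import datetime, timezone


def voice_update_from_transcription(text, language, now=None):
    """Map a transcription result to the fields stored on the voice record."""
    now = now or datetime.now(timezone.utc)
    text = (text or "").strip()
    if not text:
        # Assumption: an empty transcription leaves the voice pending.
        return {"status": "pending", "metadata": {}}
    return {
        "status": "ready",
        "metadata": {
            "refText": text,
            "language": language,
            "transcribedAt": now.isoformat(),
        },
    }
```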
Tested live
Audio: espeak-ng "hello world this is a test of mimic voice transcription"
POST /api/voices -> {
voice: { status: "ready", durationSeconds: 3, refAudioPath: "voices/.../wN805ZX9lmp7.wav" },
transcription: { text: "Hello world, this is a test of mimic voice transcription.",
language: "en", duration: 3.49 }
}
Spike 4b: LoRA training (in progress, 2026-05-13)
Delivered (code)
- apps/worker-lora/: RunPod serverless handler in Python
  - SDXL base + PEFT LoRA (rank 16, ~1000 steps, ~10-15min on A100 80GB)
  - Downloads photos from MinIO (public URLs), trains, uploads .safetensors via signed PUT
  - Callback POST /api/webhooks/runpod with a validated Bearer token
- /api/avatars/[id]/train:
  - Generates a per-job callback_token (nanoid 32)
  - Generates a signed PUT URL via presignPut (rewrites the internal MinIO host -> public proxy via STORAGE_PUBLIC_ENDPOINT)
  - Dispatches via dispatch(), which calls the real RunPod /run if RUNPOD_API_KEY is set, otherwise a stub
  - Stores metadata.callbackToken/triggerWord/loraKey/runpodJobId
- /api/webhooks/runpod:
  - Validates Bearer == metadata.callbackToken
  - Updates avatar.status (ready/failed), loraPath, readyAt
- Nginx /storage/ accepts PUT up to 500MB (LoRA upload)
- GH Actions: the workflow has a new job worker-lora -> ghcr.io/kodama1/mimic-worker-lora:latest
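The per-job callback token flow above (generate on /train, validate on the webhook) can be sketched as follows. This is illustrative only: `secrets.token_urlsafe` stands in for nanoid, and the payload field names (`status`, `loraKey`) and function names are assumptions.

```python
import hmac
import secrets
from datetime import datetime, timezone
from typing import Optional


def new_callback_token() -> str:
    """24 random bytes encode to 32 URL-safe characters (stand-in for nanoid 32)."""
    return secrets.token_urlsafe(24)


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check 'Authorization: Bearer <token>' against the token stored for this job."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


def apply_callback(avatar: dict, authorization_header, payload: dict, now=None) -> dict:
    """Return the updated avatar record, or raise if the token does not match."""
    expected = avatar["metadata"]["callbackToken"]
    if not is_authorized(authorization_header, expected):
        raise PermissionError("callback token mismatch")
    updated = dict(avatar)
    if payload.get("status") == "ready":      # hypothetical payload shape
        updated["status"] = "ready"
        updated["loraPath"] = payload["loraKey"]
        updated["readyAt"] = (now or datetime.now(timezone.utc)).isoformat()
    else:
        updated["status"] = "failed"
    return updated
```

Because each job carries its own token, a leaked token only lets an attacker spoof the callback for that one avatar.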
Pending
- CI is building the GPU image now (gh run 25769742978). The image should end up around 5-8GB
- USER: create a RunPod Serverless endpoint pointing to ghcr.io/kodama1/mimic-worker-lora:latest
  - GPU: A100 80GB or H100 80GB
  - Min workers: 0, max: 1-2
  - Get the endpoint ID and save it in the VPS .env as RUNPOD_LORA_ENDPOINT
- Afterwards: test with a real avatar
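Once the endpoint ID is in .env, the real-or-stub dispatch could work along these lines. A sketch under assumptions: the app's own dispatch() is not shown in these notes, the stub return value is invented, and falling back to the stub when RUNPOD_LORA_ENDPOINT is missing is an assumption (the notes only mention RUNPOD_API_KEY). The URL shape follows RunPod's serverless /run route.

```python
import json
import os
import urllib.request

RUNPOD_BASE = "https://api.runpod.ai/v2"


def build_run_request(endpoint_id: str, api_key: str, job_input: dict) -> urllib.request.Request:
    """Build the POST /run request; RunPod expects the payload under 'input'."""
    return urllib.request.Request(
        f"{RUNPOD_BASE}/{endpoint_id}/run",
        data=json.dumps({"input": job_input}).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def dispatch(job_input: dict, env=os.environ) -> dict:
    """Call RunPod when credentials are configured, otherwise return a stub job."""
    api_key = env.get("RUNPOD_API_KEY")
    endpoint_id = env.get("RUNPOD_LORA_ENDPOINT")
    if not api_key or not endpoint_id:
        return {"id": "stub-job", "status": "STUBBED"}  # hypothetical stub shape
    request = build_run_request(endpoint_id, api_key, job_input)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)  # contains the job id to store as runpodJobId
```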
VPS container status
mimic-postgres healthy (9 tables)
mimic-redis up
mimic-minio up (bucket "mimic", anonymous download)
mimic-scraper healthy (whisper base loaded)
mimic-api up
mimic-web up (Better-Auth + bearer + RunPod dispatch real)
Pending on the user
- DNS GoDaddy:
  mimic.kodama.solutions -> 187.127.24.217
  mimic-api.kodama.solutions -> 187.127.24.217
- After DNS propagates:
  ssh root@187.127.24.217 'bash /home/mimic/src/scripts/issue-cert.sh mimic-api.kodama.solutions'
  ssh root@187.127.24.217 'bash /home/mimic/src/scripts/issue-cert.sh mimic.kodama.solutions'
- Create the RunPod Serverless endpoint:
  - https://www.runpod.io/console/serverless
  - Template: Custom -> Container image ghcr.io/kodama1/mimic-worker-lora:latest
  - GPU: A100 80GB or H100 80GB
  - Container disk: 20GB+, Network volume optional (20GB for HF cache)
  - Min: 0, Max: 1-2, Idle timeout: 5s
  - Get the endpoint ID + add it to .env: RUNPOD_LORA_ENDPOINT=<id>
  - Restart: cd /home/mimic/src && docker compose -f infra/compose/docker-compose.vps.yml --env-file .env up -d mimic-web
Next steps (S4c)
Video gen worker:
- Consider 2 paths:
  - Premium: MimicMotion / AnimateAnyone (pose-driven, high quality, GPU-intensive)
  - Shortcut: simple face swap + lip-sync (Roop + MuseTalk, low cost)
- Pipeline: TikTok URL -> Demucs (separates voice/music) -> Whisper (transcription) -> F5-TTS (TTS with cloned voice) -> motion extract -> avatar swap -> lip sync -> final compose
- POST /api/jobs creates the record and dispatches the worker
- UI: /jobs/[id] page with step-by-step progress
New operational decisions (S4)
- Whisper runs on CPU on the scraper VPS (faster-whisper, int8, ~5x realtime). No GPU is needed to transcribe the reference audio.
- LoRA training uses SDXL base (open license, commercial use OK). Flux would be better but has restrictive clauses.
- Hyperparams: rank=16, steps=1000, resolution=1024, lr=1e-4, batch=1. Conservative, avoids overfitting on 10-50 photos.
- Trigger word mimic<id4>: avoids conflicts with existing CLIP tokens.
- Callback validated by a per-job token (nanoid 32 in metadata.callbackToken), not a global secret. Reduces the blast radius if it leaks.
- Signed PUT URL with a 4h TTL (training can take a while). MinIO V4 signature works with the public endpoint via host rewrite.
- The worker-lora image goes to GHCR via Actions (kodama1 org). The workflow triggers on path apps/worker-lora/** or manually.
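The hyperparameters and trigger word above can be captured in a small config sketch. The class and function names are hypothetical, and taking the first four alphanumeric characters of the avatar ID as <id4> is an assumption; the notes do not say how the suffix is derived.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Defaults mirror the values listed in the decisions above."""
    rank: int = 16
    steps: int = 1000
    resolution: int = 1024
    learning_rate: float = 1e-4
    batch_size: int = 1


def trigger_word(avatar_id: str) -> str:
    """Build 'mimic<id4>' from the avatar ID (derivation is an assumption)."""
    suffix = "".join(ch for ch in avatar_id.lower() if ch.isalnum())[:4]
    if len(suffix) < 4:
        raise ValueError("avatar_id needs at least 4 alphanumeric characters")
    return f"mimic{suffix}"


def training_input(avatar_id: str, photo_urls: list, params: LoraHyperparams = LoraHyperparams()) -> dict:
    """Assemble a job input payload (field names are hypothetical)."""
    return {
        "triggerWord": trigger_word(avatar_id),
        "photoUrls": list(photo_urls),
        "hyperparams": asdict(params),
    }
```

A made-up token such as mimicab3x is unlikely to match a word the text encoder already knows, which is the point of the trigger-word decision.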
Expected costs (once RunPod is active)
| Operation | GPU | Time | Cost |
|---|---|---|---|
| Voice transcribe | CPU VPS | ~3-10s | $0 (self-hosted) |
| LoRA train (avatar) | A100 80GB | 10-15min | $0.40-0.60 |
| F5-TTS inference (audio gen) | L4/T4 | 5-10s/100chars | $0.005 |
| MimicMotion video (S4c) | A100 80GB | 3-5min | $0.20-0.30 |
Avatar onboarding total: ~$0.50
Video gen end-to-end: ~$0.30-0.50
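The totals can be cross-checked against the itemized rows with a few lines of arithmetic. The dictionary keys are illustrative names; the dollar ranges are copied from the table. The itemized video rows sum to roughly $0.21-0.31, below the stated ~$0.30-0.50, and the table does not itemize the difference.

```python
# (low, high) cost in USD per operation, taken from the table above.
COSTS = {
    "voice_transcribe": (0.0, 0.0),
    "lora_train": (0.40, 0.60),
    "f5_tts": (0.005, 0.005),
    "mimicmotion_video": (0.20, 0.30),
}


def total_range(operations):
    """Sum the low and high ends of the listed operations, rounded to 3 decimals."""
    low = sum(COSTS[op][0] for op in operations)
    high = sum(COSTS[op][1] for op in operations)
    return round(low, 3), round(high, 3)


onboarding = total_range(["voice_transcribe", "lora_train"])
video = total_range(["f5_tts", "mimicmotion_video"])
onboarding_midpoint = (onboarding[0] + onboarding[1]) / 2  # matches the ~$0.50 figure
```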