One-click cross-platform installer for VoxCPM2 Portable — multilingual TTS with Voice Design, Cloning & end-to-end LoRA from video/audio. ElevenLabs at home.
This repository is the Pinokio launcher for VoxCPM2 Portable; the actual app lives in the VoxCPM2_portable repository. This repo contains only the scripts that Pinokio runs to install, start, update, and reset the app in an isolated cross-platform environment.
- Download and install Pinokio
- Open in Pinokio:
  - 🚀 Install (1-click): direct install URL
  - 📂 Browse app page: catalog page on beta.pinokio.co
- Click Install inside Pinokio. It clones the app, creates a Python 3.12 venv, and installs the PyTorch build that matches your GPU. The 4-5 GB VoxCPM2 model is downloaded on first generation.
- Isolated Python `venv` with Python 3.12 via uv: no system-wide installs
- PyTorch auto-selected by GPU/OS: CUDA 12.8 (NVIDIA x64), CUDA 13.0 (aarch64), DirectML (AMD Win), ROCm 6.3 (AMD Linux), MPS/CPU (macOS), CPU fallback
- Flash-Attention 2 cp312 wheels on NVIDIA Win/Linux (auto-skipped on unsupported GPUs, graceful SDPA fallback)
- Triton + xformers 0.0.31.post1 pinned for torch 2.7.1 compat
- Bundled Node.js + ffmpeg + CUDA from Pinokio's `ai` bundle (no separate downloads)
- Gradio auto-picks next free port via `kernel.port()`: no conflicts
- `NO_AUTO_BROWSER=true` env var: prevents duplicate system Chrome tab (upstream patch)
- Cross-platform env isolation: `HF_HOME`, `TRANSFORMERS_CACHE`, `TORCH_HOME`, `MODELSCOPE_CACHE`, `XDG_CACHE_HOME` all point inside the launcher folder
- Voice pack (~100 voices), VoxCPM2 model (~4-5 GB), Parakeet ASR (~670 MB, lazy) cached under `app/models`
- Cross-platform: Windows / Linux x64 & aarch64 / macOS ARM & Intel
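The cache-isolation item above can be sketched in a few lines of Python. This is only an illustration of the idea, not the launcher's actual script: the variable names come from the list, while the `cache/...` subfolder layout and the `isolated_env` helper are assumptions made for the example.

```python
import os
from pathlib import Path

# Variable names are from the list above; the subfolder layout is an assumption.
CACHE_VARS = {
    "HF_HOME": "cache/huggingface",
    "TRANSFORMERS_CACHE": "cache/huggingface/transformers",
    "TORCH_HOME": "cache/torch",
    "MODELSCOPE_CACHE": "cache/modelscope",
    "XDG_CACHE_HOME": "cache/xdg",
}


def isolated_env(launcher_root, base_env=None):
    """Return a copy of the environment with every cache variable
    redirected to a folder under launcher_root (hypothetical helper)."""
    root = Path(launcher_root).resolve()
    env = dict(os.environ if base_env is None else base_env)
    for name, subdir in CACHE_VARS.items():
        env[name] = str(root / subdir)
    return env


# Example: build an isolated environment on top of a minimal base.
env = isolated_env("launcher", base_env={"PATH": "/usr/bin"})
```

A dictionary like `env` could then be passed to `subprocess.run(..., env=env)` so that model downloads stay inside the launcher folder and nothing is written to the user's home cache.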
| Menu item | What it runs |
|---|---|
| Start | `python app.py` with Gradio on auto-assigned port, full 4-tab UI (TTS / Voice Design / Cloning / LoRA) |
| Open Folder | File explorer at Generated Audio / Models cache / Voice Pack / LoRA Checkpoints / Training Data |
| Update | `git pull` launcher + app, then `uv pip install -r requirements.txt --upgrade` |
| Save Disk Space | Dedup venv libraries via `fs.link` |
| Reset | Wipe `app/` folder (full pre-install state) |
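The Start entry relies on Pinokio assigning a free port. Pinokio does this through its own `kernel.port()`; the sketch below is a standard-library analogue of the same idea, not Pinokio's implementation, and `find_free_port` is a hypothetical helper name.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))            # port 0 lets the OS choose
        return sock.getsockname()[1]    # the port it actually picked


port = find_free_port()
```

In a Gradio app, a port obtained this way could be passed as `demo.launch(server_port=port, inbrowser=False)`. Another process can still claim the port between the check and the launch, so treat it as a best-effort choice.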
| OS | GPU | Status | Acceleration |
|---|---|---|---|
| Windows 10/11 | NVIDIA RTX 40xx–50xx | ✅ tested | CUDA 12.8 + Triton + Flash-Attn 2 cp312 |
| Windows 10/11 | NVIDIA RTX 20xx–30xx | ✅ expected | CUDA 12.8 + Triton + xformers (SDPA fallback) |
| Linux x64 | NVIDIA RTX 20xx–50xx | ✅ expected | CUDA 12.8 + Triton + Flash-Attn 2 cp312 |
| Linux aarch64 | NVIDIA DGX Spark / Jetson | ✅ expected | CUDA 13.0 |
| Windows | AMD RDNA3+ | ✅ expected | DirectML |
| Linux | AMD RDNA3+ | ✅ expected | ROCm 6.3 |
| macOS | Apple Silicon M1–M4 | ✅ expected | MPS |
| macOS | Intel | | legacy torch 2.2.2 (no Intel-Mac wheels in newer torch) |
| Any | CPU only | | CPU |
| Win/Linux | NVIDIA GTX 10xx (Pascal) | | CUDA 12.8 + SDPA only |
Minimum: 8 GB VRAM on NVIDIA for comfortable generation. Recommended: RTX 3060+ with 12 GB VRAM.
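The support matrix above boils down to a small decision function. The sketch below shows one way to express that mapping. It is an illustration under assumptions, not the launcher's real selection logic: the `pick_torch_build` name, its arguments, and the returned labels are all made up for the example.

```python
def pick_torch_build(os_name, arch, gpu_vendor):
    """Map OS, CPU architecture, and GPU vendor to a PyTorch build label.

    Hypothetical helper; the labels mirror the Acceleration column above.
    """
    os_name = os_name.lower()
    arch = arch.lower()
    gpu_vendor = (gpu_vendor or "").lower()

    if gpu_vendor == "nvidia":
        if os_name == "linux" and arch in ("aarch64", "arm64"):
            return "cuda13.0"
        if os_name in ("windows", "linux") and arch in ("x86_64", "amd64"):
            return "cuda12.8"
    if gpu_vendor == "amd":
        if os_name == "windows":
            return "directml"
        if os_name == "linux":
            return "rocm6.3"
    if os_name == "darwin":
        # Apple Silicon gets MPS; Intel Macs stay on CPU (legacy torch 2.2.2).
        return "mps" if arch in ("arm64", "aarch64") else "cpu"
    return "cpu"  # fallback for anything not matched above


# Example: values as returned by platform.system() / platform.machine().
build = pick_torch_build("Windows", "AMD64", "nvidia")
```

The GPU vendor has to come from a separate detection step (Pinokio exposes this to its scripts); the function only encodes the table.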
- TTS in 30 languages: RU / EN / ZH (+9 Chinese dialects) / AR / FR / DE / HI / IT / JA / KO / PT / ES + more
- 48 kHz studio output via AudioVAE V2 super-resolution (16 → 48 kHz)
- Voice Design — create voices from text descriptions (gender, age, tone, emotion, pace, accent), zero-shot
- Voice Cloning — clone from 5-50 sec reference, ~100 voices bundled + 743 extra Russian voices on-demand
- LoRA Auto Pipeline — drop video/podcast → ffmpeg → Parakeet TDT ASR → sentence-aware split → auto-tune → training, one click
- LoRA Manual Mode — upload pre-cleaned clips + transcripts, official OpenBMB defaults
- Hot-swap LoRAs across TTS / Voice Design / Cloning without restart
- MP3 / WAV / FLAC / OGG output (MP3 default via bundled FFmpeg)
- Live-streaming playback — audio starts playing during generation
- i18n RU / EN interface, dark theme
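The "sentence-aware split" step of the LoRA Auto Pipeline is not specified in this README. As a rough illustration of what such a step might do, the sketch below groups timestamped ASR segments into training clips that end on sentence punctuation. The function name, the `(start, end, text)` segment format, and the `max_seconds` limit are all assumptions, not the app's actual code.

```python
def sentence_aware_split(segments, max_seconds=15.0):
    """Group (start, end, text) ASR segments into clips ending on sentence
    punctuation. A clip is also closed early if adding the next segment
    would make it longer than max_seconds. Hypothetical sketch."""
    clips, current = [], []
    for seg in segments:
        start, end, text = seg
        # Close the running clip if this segment would overflow the limit.
        if current and end - current[0][0] > max_seconds:
            clips.append(current)
            current = []
        current.append(seg)
        # Close the clip when the segment finishes a sentence.
        if text.rstrip().endswith((".", "!", "?")):
            clips.append(current)
            current = []
    if current:  # keep any trailing text without final punctuation
        clips.append(current)
    return [(c[0][0], c[-1][1], " ".join(s[2].strip() for s in c)) for c in clips]


segments = [
    (0.0, 1.2, "Hello there."),
    (1.5, 3.0, "This is"),
    (3.0, 4.4, "a test."),
]
clips = sentence_aware_split(segments)
```

Each resulting tuple holds a clip's start time, end time, and transcript, which is the kind of pairing a LoRA training set needs. A single segment longer than `max_seconds` is kept whole, since this sketch never cuts inside a segment.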
Full feature list: App repo → README · Russian
- 🎙 App source / issues — VoxCPM2_portable
- 📰 Changelog — CHANGELOG.md
- 🧠 Base model — VoxCPM2 on HuggingFace · OpenBMB
- 🎧 ASR — Parakeet TDT 0.6B v3 (NVIDIA NeMo)
- 🚀 Pinokio — pinokio.co
- 🎵 Sister launcher — ACE-Step-Studio-pinokio (local AI music generation)
I build software and do research in AI and music/voice generation. Most of what I create is free and open source. Your donations allow me to keep creating and exploring without worrying about where the next meal comes from =)
All donation methods | dalink.to/nerual_dreming | boosty.to/neuro_art
- BTC: `1E7dHL22RpyhJGVpcvKdbyZgksSYkYeEBC`
- ETH (ERC20): `0xb5db65adf478983186d4897ba92fe2c25c594a0c`
- USDT (TRC20): `TQST9Lp2TjK6FiVkn4fwfGUee7NmkxEE7C`
- Nerual Dreming — @timoncool · Telegram · neuro-cartel.com · ArtGeneration.me
MIT — same as the main VoxCPM2 Portable project. Base model VoxCPM2 is MIT-licensed by OpenBMB.