VoxCPM2 Portable — Pinokio launcher

One-click cross-platform installer for VoxCPM2 Portable — multilingual TTS with Voice Design, Cloning & end-to-end LoRA from video/audio. ElevenLabs at home.

This repository is the Pinokio launcher for VoxCPM2 Portable — the actual app lives there. This repo only contains the scripts that Pinokio runs to install, start, update and reset the app in an isolated cross-platform environment.

Install

Download and install Pinokio
Open in Pinokio:
- 🚀 Install (1-click) — direct install URL
- 📂 Browse app page — catalog page on beta.pinokio.co
Click Install inside Pinokio — it will clone the app, create a Python 3.12 venv, install PyTorch (right build for your GPU), pull the 4-5 GB VoxCPM2 model on first generation

What this launcher does

Isolated Python venv with Python 3.12 via uv — no system-wide installs
PyTorch auto-selected by GPU/OS — CUDA 12.8 (NVIDIA x64), CUDA 13.0 (aarch64), DirectML (AMD Win), ROCm 6.3 (AMD Linux), MPS/CPU (macOS), CPU fallback
Flash-Attention 2 cp312 wheels on NVIDIA Win/Linux (auto-skipped on unsupported GPUs, graceful SDPA fallback)
Triton + xformers 0.0.31.post1 pinned for torch 2.7.1 compat
Bundled Node.js + ffmpeg + CUDA from Pinokio's ai bundle (no separate downloads)
Gradio auto-picks next free port via kernel.port() — no conflicts
NO_AUTO_BROWSER=true env var — prevents duplicate system Chrome tab (upstream patch)
Cross-platform env isolation: HF_HOME, TRANSFORMERS_CACHE, TORCH_HOME, MODELSCOPE_CACHE, XDG_CACHE_HOME all point inside the launcher folder
Voice pack (~100 voices), VoxCPM2 model (~4-5 GB), Parakeet ASR (~670 MB, lazy) cached under app/models
Cross-platform: Windows / Linux x64 & aarch64 / macOS ARM & Intel

Launch modes

Menu item	What it runs
Start	`python app.py` with Gradio on auto-assigned port, full 4-tab UI (TTS / Voice Design / Cloning / LoRA)
Open Folder	File explorer at Generated Audio / Models cache / Voice Pack / LoRA Checkpoints / Training Data
Update	`git pull` launcher + app, then `uv pip install -r requirements.txt --upgrade`
Save Disk Space	Dedup venv libraries via `fs.link`
Reset	Wipe `app/` folder (full pre-install state)

Platform support matrix

OS	GPU	Status	Acceleration
Windows 10/11	NVIDIA RTX 40xx–50xx	✅ tested	CUDA 12.8 + Triton + Flash-Attn 2 cp312
Windows 10/11	NVIDIA RTX 20xx–30xx	✅ expected	CUDA 12.8 + Triton + xformers (SDPA fallback)
Linux x64	NVIDIA RTX 20xx–50xx	✅ expected	CUDA 12.8 + Triton + Flash-Attn 2 cp312
Linux aarch64	NVIDIA DGX Spark / Jetson	✅ expected	CUDA 13.0
Windows	AMD RDNA3+	✅ expected	DirectML
Linux	AMD RDNA3+	✅ expected	ROCm 6.3
macOS	Apple Silicon M1–M4	✅ expected	MPS
macOS	Intel	⚠️ CPU-only	legacy torch 2.2.2 (no Intel-Mac wheels in newer torch)
Any	CPU only	⚠️ very slow (minutes per phrase)	CPU
Win/Linux	NVIDIA GTX 10xx (Pascal)	⚠️ Flash-Attn unavailable	CUDA 12.8 + SDPA only

Minimum: 8 GB VRAM on NVIDIA for comfortable generation. Recommended: RTX 3060+ with 12 GB VRAM.

Features (inside the app)

30 languages TTS — RU / EN / ZH (+9 Chinese dialects) / AR / FR / DE / HI / IT / JA / KO / PT / ES + more
48 kHz studio output via AudioVAE V2 super-resolution (16 → 48 kHz)
Voice Design — create voices from text descriptions (gender, age, tone, emotion, pace, accent), zero-shot
Voice Cloning — clone from 5-50 sec reference, ~100 voices bundled + 743 extra Russian voices on-demand
LoRA Auto Pipeline — drop video/podcast → ffmpeg → Parakeet TDT ASR → sentence-aware split → auto-tune → training, one click
LoRA Manual Mode — upload pre-cleaned clips + transcripts, official OpenBMB defaults
Hot-swap LoRAs across TTS / Voice Design / Cloning without restart
MP3 / WAV / FLAC / OGG output (MP3 default via bundled FFmpeg)
Live-streaming playback — audio starts playing during generation
i18n RU / EN interface, dark theme

Full feature list: App repo → README · Russian

Links

🎙 App source / issues — VoxCPM2_portable
📰 Changelog — CHANGELOG.md
🧠 Base model — VoxCPM2 on HuggingFace · OpenBMB
🎧 ASR — Parakeet TDT 0.6B v3 (NVIDIA NeMo)
🚀 Pinokio — pinokio.co
🎵 Sister launcher — ACE-Step-Studio-pinokio (local AI music generation)

Support This Project

I build software and do research in AI and music/voice generation. Most of what I create is free and open source. Your donations allow me to keep creating and exploring without worrying about where the next meal comes from =)

All donation methods | dalink.to/nerual_dreming | boosty.to/neuro_art

BTC: 1E7dHL22RpyhJGVpcvKdbyZgksSYkYeEBC
ETH (ERC20): 0xb5db65adf478983186d4897ba92fe2c25c594a0c
USDT (TRC20): TQST9Lp2TjK6FiVkn4fwfGUee7NmkxEE7C

Author

Nerual Dreming — @timoncool · Telegram · neuro-cartel.com · ArtGeneration.me

License

MIT — same as the main VoxCPM2 Portable project. Base model VoxCPM2 is MIT-licensed by OpenBMB.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github		.github
.gitignore		.gitignore
DONATE.md		DONATE.md
LICENSE		LICENSE
README.md		README.md
icon.jpg		icon.jpg
install.js		install.js
link.js		link.js
open_folder.js		open_folder.js
pinokio.js		pinokio.js
pinokio.json		pinokio.json
reset.js		reset.js
start.js		start.js
torch.js		torch.js
update.js		update.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VoxCPM2 Portable — Pinokio launcher

Install

What this launcher does

Launch modes

Platform support matrix

Features (inside the app)

Links

Support This Project

Author

License

Star History

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VoxCPM2 Portable — Pinokio launcher

Install

What this launcher does

Launch modes

Platform support matrix

Features (inside the app)

Links

Support This Project

Author

License

Star History

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages