timoncool/VoxCPM2_portable-pinokio

VoxCPM2 Portable — Pinokio launcher

One-click cross-platform installer for VoxCPM2 Portable — multilingual TTS with Voice Design, Cloning & end-to-end LoRA from video/audio. ElevenLabs at home.

This repository is the Pinokio launcher for VoxCPM2 Portable; the app itself lives in the main VoxCPM2 Portable repo. This repo contains only the scripts that Pinokio runs to install, start, update, and reset the app in an isolated cross-platform environment.

Install

  1. Download and install Pinokio
  2. Open in Pinokio:
  3. Click Install inside Pinokio. It clones the app, creates a Python 3.12 venv, and installs the PyTorch build that matches your GPU. The 4-5 GB VoxCPM2 model is downloaded on first generation.

What this launcher does

  • Isolated Python venv with Python 3.12 via uv — no system-wide installs
  • PyTorch auto-selected by GPU/OS — CUDA 12.8 (NVIDIA x64), CUDA 13.0 (aarch64), DirectML (AMD Win), ROCm 6.3 (AMD Linux), MPS/CPU (macOS), CPU fallback
  • Flash-Attention 2 cp312 wheels on NVIDIA Win/Linux (auto-skipped on unsupported GPUs, graceful SDPA fallback)
  • Triton + xformers 0.0.31.post1 pinned for torch 2.7.1 compat
  • Bundled Node.js + ffmpeg + CUDA from Pinokio's ai bundle (no separate downloads)
  • Gradio auto-picks next free port via kernel.port() — no conflicts
  • NO_AUTO_BROWSER=true env var — prevents duplicate system Chrome tab (upstream patch)
  • Cross-platform env isolation: HF_HOME, TRANSFORMERS_CACHE, TORCH_HOME, MODELSCOPE_CACHE, XDG_CACHE_HOME all point inside the launcher folder
  • Voice pack (~100 voices), VoxCPM2 model (~4-5 GB), Parakeet ASR (~670 MB, lazy) cached under app/models
  • Cross-platform: Windows / Linux x64 & aarch64 / macOS ARM & Intel
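The cache isolation listed above can be pictured with a short Python sketch. It is illustrative only, not the launcher's actual script: the variable names come from the list, while the `cache` subfolder layout and the `isolated_env` helper are assumptions made for the example.

```python
import os
from pathlib import Path

# Variable names are taken from the list above.
# The subfolder chosen for each one is an assumption, not the launcher's real layout.
CACHE_VARS = {
    "HF_HOME": "huggingface",
    "TRANSFORMERS_CACHE": "huggingface/transformers",
    "TORCH_HOME": "torch",
    "MODELSCOPE_CACHE": "modelscope",
    "XDG_CACHE_HOME": "xdg",
}


def isolated_env(launcher_dir, base_env=None):
    """Return a copy of the environment with every cache variable
    redirected to a folder under launcher_dir (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    cache_root = Path(launcher_dir) / "cache"
    for name, subfolder in CACHE_VARS.items():
        env[name] = str(cache_root / subfolder)
    return env


if __name__ == "__main__":
    for key, value in isolated_env("launcher").items():
        if key in CACHE_VARS:
            print(f"{key}={value}")
```

Passing such a dictionary as the child process environment keeps model downloads and caches out of the user's home directory, which is the effect the list describes.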

Launch modes

| Menu item | What it runs |
| --- | --- |
| Start | python app.py with Gradio on auto-assigned port, full 4-tab UI (TTS / Voice Design / Cloning / LoRA) |
| Open Folder | File explorer at Generated Audio / Models cache / Voice Pack / LoRA Checkpoints / Training Data |
| Update | git pull launcher + app, then uv pip install -r requirements.txt --upgrade |
| Save Disk Space | Dedup venv libraries via fs.link |
| Reset | Wipe app/ folder (full pre-install state) |
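The "auto-assigned port" behaviour of Start can be approximated outside Pinokio. The sketch below is not how kernel.port() is implemented; it only shows the common technique of asking the OS for an unused port. The `pick_free_port` and `launch_env` helpers are hypothetical, and while Gradio documents a `GRADIO_SERVER_PORT` variable, whether app.py relies on it is an assumption.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def launch_env(port):
    """Environment additions for the app process (hypothetical helper).

    NO_AUTO_BROWSER is the variable named in this README.
    GRADIO_SERVER_PORT is a Gradio setting; app.py honoring it is assumed here.
    """
    return {"GRADIO_SERVER_PORT": str(port), "NO_AUTO_BROWSER": "true"}


if __name__ == "__main__":
    port = pick_free_port()
    print(launch_env(port))
```

Another process could claim the port between the probe and the real bind, so this approach reduces conflicts rather than ruling them out.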

Platform support matrix

| OS | GPU | Status | Acceleration |
| --- | --- | --- | --- |
| Windows 10/11 | NVIDIA RTX 40xx–50xx | ✅ tested | CUDA 12.8 + Triton + Flash-Attn 2 cp312 |
| Windows 10/11 | NVIDIA RTX 20xx–30xx | ✅ expected | CUDA 12.8 + Triton + xformers (SDPA fallback) |
| Linux x64 | NVIDIA RTX 20xx–50xx | ✅ expected | CUDA 12.8 + Triton + Flash-Attn 2 cp312 |
| Linux aarch64 | NVIDIA DGX Spark / Jetson | ✅ expected | CUDA 13.0 |
| Windows | AMD RDNA3+ | ✅ expected | DirectML |
| Linux | AMD RDNA3+ | ✅ expected | ROCm 6.3 |
| macOS | Apple Silicon M1–M4 | ✅ expected | MPS |
| macOS | Intel | ⚠️ CPU-only | legacy torch 2.2.2 (no Intel-Mac wheels in newer torch) |
| Any | CPU only | ⚠️ very slow (minutes per phrase) | CPU |
| Win/Linux | NVIDIA GTX 10xx (Pascal) | ⚠️ Flash-Attn unavailable | CUDA 12.8 + SDPA only |

Minimum: 8 GB VRAM on NVIDIA for comfortable generation. Recommended: RTX 3060+ with 12 GB VRAM.
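The matrix above boils down to a small decision rule. The following Python sketch encodes that rule for illustration; the real selection happens in the launcher's Pinokio scripts, and the `pick_backend` function and its return labels are hypothetical names chosen for this example.

```python
def pick_backend(os_name, arch, gpu_vendor=None):
    """Map (OS, CPU architecture, GPU vendor) to the acceleration
    listed in the support matrix. Return labels are made up for this sketch."""
    os_name = os_name.lower()
    arch = arch.lower()
    gpu_vendor = (gpu_vendor or "").lower()
    is_arm = arch in ("aarch64", "arm64")

    if os_name == "macos":
        # Apple Silicon uses MPS; Intel Macs fall back to CPU (legacy torch 2.2.2).
        return "mps" if is_arm else "cpu"
    if gpu_vendor == "nvidia":
        # aarch64 NVIDIA systems (DGX Spark / Jetson) get CUDA 13.0, x64 gets CUDA 12.8.
        return "cuda13.0" if is_arm else "cuda12.8"
    if gpu_vendor == "amd":
        if os_name == "windows":
            return "directml"
        if os_name == "linux":
            return "rocm6.3"
    # No supported GPU detected: CPU fallback.
    return "cpu"


if __name__ == "__main__":
    print(pick_backend("Linux", "aarch64", "NVIDIA"))
```

The sketch covers only the PyTorch build choice. Flash-Attention, Triton, and xformers availability depend on the GPU generation and are not modeled here.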

Features (inside the app)

  • 30 languages TTS — RU / EN / ZH (+9 Chinese dialects) / AR / FR / DE / HI / IT / JA / KO / PT / ES + more
  • 48 kHz studio output via AudioVAE V2 super-resolution (16 → 48 kHz)
  • Voice Design — create voices from text descriptions (gender, age, tone, emotion, pace, accent), zero-shot
  • Voice Cloning — clone from 5-50 sec reference, ~100 voices bundled + 743 extra Russian voices on-demand
  • LoRA Auto Pipeline — drop video/podcast → ffmpeg → Parakeet TDT ASR → sentence-aware split → auto-tune → training, one click
  • LoRA Manual Mode — upload pre-cleaned clips + transcripts, official OpenBMB defaults
  • Hot-swap LoRAs across TTS / Voice Design / Cloning without restart
  • MP3 / WAV / FLAC / OGG output (MP3 default via bundled FFmpeg)
  • Live-streaming playback — audio starts playing during generation
  • i18n RU / EN interface, dark theme
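The "sentence-aware split" step of the LoRA Auto Pipeline can be illustrated with a minimal Python sketch. This is not the app's code: the `Segment` type, the `split_on_sentences` function, and the 15-second clip limit are assumptions, and the input is presumed to be timestamped ASR segments in order.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


SENTENCE_END = (".", "!", "?", "…")


def split_on_sentences(segments, max_seconds=15.0):
    """Group consecutive ASR segments into clips that end on sentence
    punctuation where possible. max_seconds is an assumed limit."""
    groups, current = [], []
    for seg in segments:
        # Close the running clip first if this segment would make it too long.
        if current and seg.end - current[0].start > max_seconds:
            groups.append(current)
            current = []
        current.append(seg)
        # A segment ending in sentence punctuation closes the clip.
        if seg.text.rstrip().endswith(SENTENCE_END):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        Segment(g[0].start, g[-1].end, " ".join(s.text.strip() for s in g))
        for g in groups
    ]
```

Each returned clip carries the start and end time needed to cut the audio, plus the joined transcript that would accompany it in training data.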

Full feature list: App repo → README · Russian

Support This Project

I build software and do research in AI and music/voice generation. Most of what I create is free and open source. Your donations allow me to keep creating and exploring without worrying about where the next meal comes from =)

All donation methods | dalink.to/nerual_dreming | boosty.to/neuro_art

  • BTC: 1E7dHL22RpyhJGVpcvKdbyZgksSYkYeEBC
  • ETH (ERC20): 0xb5db65adf478983186d4897ba92fe2c25c594a0c
  • USDT (TRC20): TQST9Lp2TjK6FiVkn4fwfGUee7NmkxEE7C

License

MIT — same as the main VoxCPM2 Portable project. Base model VoxCPM2 is MIT-licensed by OpenBMB.

