Clone any voice: your own, a colleague's, a character's, or any other speaker's.

| Requirement | Specification |
|---|---|
| Duration | 10-20 seconds of speech |
| Format | WAV |
| Sample rate | 24 kHz or higher |
| Channels | Mono |
| Content | Single speaker, no background noise |
| Speech | Clear articulation, natural pace |
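
The measurable rows of this table (duration, sample rate, channels) can be checked before a recording is used. The following is a minimal sketch using only Python's standard `wave` module; `check_voice_ref` is a hypothetical helper, not part of the plugin, and it assumes a PCM WAV file. It cannot judge background noise or articulation.

```python
import wave

# Thresholds taken from the requirements table above.
MIN_SECONDS = 10.0
MAX_SECONDS = 20.0
MIN_SAMPLE_RATE = 24_000
REQUIRED_CHANNELS = 1  # mono


def check_voice_ref(path):
    """Return a list of problems; an empty list means the file passes.

    Hypothetical helper (an assumption, not shipped with the plugin).
    """
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_rate = wav.getframerate()
            frames = wav.getnframes()
    except (wave.Error, EOFError, OSError) as exc:
        # Covers missing files, truncated files, and non-PCM or non-WAV input.
        return [f"not a readable WAV file: {exc}"]

    problems = []
    duration = frames / sample_rate if sample_rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration is {duration:.1f}s, expected 10-20s")
    if sample_rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate is {sample_rate} Hz, expected 24000 Hz or higher")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

Calling `check_voice_ref("voice_ref.wav")` returns an empty list when all three checks pass, and one message per failed check otherwise.
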

    # Install sox if needed
    brew install sox

    # Record (speak naturally; recording stops after 20 seconds, or press Ctrl+C to stop early)
    rec -r 24000 -c 1 voice_ref.wav trim 0 20

    # Or convert an existing file (Voice Memos, podcast clip, or any audio file)
    ffmpeg -i input.m4a -ar 24000 -ac 1 -t 20 voice_ref.wav

- Open QuickTime Player
- File → New Audio Recording
- Record 10-20 seconds of speech
- Save and convert with ffmpeg:

      ffmpeg -i recording.m4a -ar 24000 -ac 1 voice_ref.wav

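
If you convert recordings often, the ffmpeg commands above can be wrapped in a small script. This is only a sketch: `ffmpeg_convert_cmd` and `convert` are hypothetical helper names, and the flags (`-i`, `-ar`, `-ac`, `-t`) simply mirror the commands shown in this guide.

```python
import shutil
import subprocess


def ffmpeg_convert_cmd(src, dst="voice_ref.wav", sample_rate=24000, max_seconds=20):
    """Build the ffmpeg argument list for a mono reference clip.

    Hypothetical helper; pass max_seconds=None to keep the full recording.
    """
    cmd = ["ffmpeg", "-i", str(src), "-ar", str(sample_rate), "-ac", "1"]
    if max_seconds is not None:
        cmd += ["-t", str(max_seconds)]  # keep only the first max_seconds of audio
    cmd.append(str(dst))
    return cmd


def convert(src, dst="voice_ref.wav"):
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_convert_cmd(src, dst), check=True)
```

For example, `convert("recording.m4a")` would write `voice_ref.wav` in the current directory, trimmed to the first 20 seconds.
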
- Environment: Record in a quiet room. Closets work great for dampening echo.
- Content: Read a paragraph naturally, as if explaining something to a colleague.
- Tone: Avoid whispering, shouting, or exaggerated emotions.
- Quality: Use a decent microphone if available. The built-in Mac microphone works, but an external one is better.
Replace the bundled voice with your recording:

    cp /path/to/your/voice_ref.wav ~/.claude/plugins/marketplaces/claude-mlx-tts/assets/default_voice.wav

If you want faster inference at the cost of quality, edit `scripts/tts-notify.py`:

    # Options (size/quality tradeoff); keep exactly one line uncommented:
    MLX_MODEL = "mlx-community/chatterbox-turbo-fp16"    # ~4 GB, best quality (default)
    # MLX_MODEL = "mlx-community/chatterbox-turbo-8bit"  # ~1 GB, great quality
    # MLX_MODEL = "mlx-community/chatterbox-turbo-4bit"  # ~500 MB, acceptable quality
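
If you would rather switch models without editing the file each time, one option is to read the choice from an environment variable. The sketch below is an assumption, not the plugin's actual behavior: `MLX_TTS_QUANT` and `pick_model` are hypothetical names, and only the three model identifiers come from the list above.

```python
import os

# Model identifiers and approximate sizes as listed above.
MODELS = {
    "fp16": "mlx-community/chatterbox-turbo-fp16",  # ~4 GB, best quality (default)
    "8bit": "mlx-community/chatterbox-turbo-8bit",  # ~1 GB, great quality
    "4bit": "mlx-community/chatterbox-turbo-4bit",  # ~500 MB, acceptable quality
}


def pick_model(env=None):
    """Return the model id selected by MLX_TTS_QUANT (hypothetical variable).

    Falls back to the fp16 default when the variable is unset.
    """
    env = os.environ if env is None else env
    choice = env.get("MLX_TTS_QUANT", "fp16").strip().lower()
    if choice not in MODELS:
        raise ValueError(f"unknown quantization {choice!r}; expected one of {sorted(MODELS)}")
    return MODELS[choice]
```

With this in place, the assignment in the script could become `MLX_MODEL = pick_model()`, and running with `MLX_TTS_QUANT=8bit` set would select the 8-bit model.
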