Voice cloning

Voice cloning takes a short audio sample of a single speaker and returns a voice_id. Pass that voice_id to /ai/text-to-speech in any later request to synthesize speech in the cloned voice.

  1. Capture a clean sample.

    Record 10 seconds of a single speaker in a quiet environment. Save it as WAV, MP3, M4A, or FLAC with a sample rate of 16 kHz or higher. See Audio inputs for full guidance.

  2. Send it to /ai/voice-clone.

    curl -X POST https://api.gocommotion.com/ai/voice-clone \
      -H "X-API-Key: eak_live_your_key_here" \
      -F "file=@sample.wav" \
      -F "name=narrator-en"

    The response includes a stable voice_id (for example, vc_01HXYZ...) that you can reuse in later requests.

  3. Use the voice_id in synthesis.

    curl -X POST https://api.gocommotion.com/ai/text-to-speech \
      -H "X-API-Key: eak_live_your_key_here" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello from your cloned voice.",
        "voice_id": "vc_01HXYZ..."
      }' \
      --output greeting.mp3
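
Step 1's format and sample-rate requirements can be checked locally before uploading. The POSIX shell function below is a minimal sketch; check_sample is a hypothetical helper, not part of the API. It assumes a canonical WAV header in which the fmt chunk comes first, so the sample rate sits at bytes 24-27. For MP3, M4A, and FLAC it only checks the extension, because reading their sample rate needs a decoder such as ffprobe, and it does not measure duration or background noise.

```shell
#!/bin/sh
# Hypothetical pre-upload check for a voice-clone sample; not part of the API.
# Accepts WAV, MP3, M4A, or FLAC by extension. For WAV it also reads the
# sample rate from the header and rejects anything below 16 kHz.

check_sample() {
  f=$1
  if [ ! -f "$f" ]; then
    echo "check_sample: $f: no such file" >&2
    return 1
  fi

  # Lowercase the extension so sample.WAV and sample.wav are treated alike.
  ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    wav) ;;
    mp3|m4a|flac)
      # Reading the rate of compressed formats needs a decoder, so stop here.
      echo "check_sample: $f: .$ext accepted (sample rate not checked)"
      return 0 ;;
    *)
      echo "check_sample: $f: unsupported format .$ext" >&2
      return 1 ;;
  esac

  # Assumes a canonical WAV header: "RIFF" at byte 0, "WAVE" at byte 8, and
  # the fmt chunk first, which puts the sample rate at bytes 24-27.
  riff=$(dd if="$f" bs=1 count=4 2>/dev/null)
  wave=$(dd if="$f" bs=1 skip=8 count=4 2>/dev/null)
  if [ "$riff" != "RIFF" ] || [ "$wave" != "WAVE" ]; then
    echo "check_sample: $f: not a RIFF/WAVE file" >&2
    return 1
  fi

  # Read the four little-endian bytes individually so the result does not
  # depend on the host's byte order.
  set -- $(od -An -t u1 -j 24 -N 4 "$f" 2>/dev/null)
  if [ $# -ne 4 ]; then
    echo "check_sample: $f: header too short" >&2
    return 1
  fi
  rate=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))

  if [ "$rate" -lt 16000 ]; then
    echo "check_sample: $f: ${rate} Hz is below 16 kHz" >&2
    return 1
  fi
  echo "check_sample: $f: WAV at ${rate} Hz"
}
```

Placing `check_sample sample.wav || exit 1` at the top of an upload script stops it before a low-rate or mislabeled WAV file is sent.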
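
Scripts that run step 2 usually need the returned voice_id in a variable. This sketch assumes the response body is JSON with a top-level "voice_id" string field (the page only says the response includes a voice_id) and reads the key from a hypothetical COMMOTION_API_KEY environment variable; extract_voice_id and clone_voice are illustrative names. The sed pattern is a dependency-free shortcut, not a JSON parser; if jq is installed, `jq -r .voice_id` is sturdier.

```shell
#!/bin/sh
# Sketch: upload a sample and keep the returned voice_id in a variable.
# Assumptions: the response is JSON with a top-level "voice_id" string,
# and the key is in a hypothetical COMMOTION_API_KEY environment variable.

# Reads a JSON response on stdin and prints the value of its "voice_id" field.
extract_voice_id() {
  sed -n 's/.*"voice_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# $1 = audio file, $2 = voice name. Prints the voice_id, or nothing on failure:
# with --fail, curl writes no body to stdout when the API returns an HTTP error.
clone_voice() {
  curl -sS --fail -X POST https://api.gocommotion.com/ai/voice-clone \
    -H "X-API-Key: $COMMOTION_API_KEY" \
    -F "file=@$1" \
    -F "name=$2" | extract_voice_id
}

# Example:
#   VOICE_ID=$(clone_voice sample.wav narrator-en)
#   [ -n "$VOICE_ID" ] || echo "voice clone failed" >&2
```

Checking that the variable is non-empty matters here, because the pipeline's exit status comes from the last command, not from curl.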
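
The request body in step 3 is a hand-written JSON literal. If the text comes from a shell variable instead, a double quote or backslash in it would produce invalid JSON. A minimal sketch of escaping before building the body, using hypothetical helpers json_string, tts_body, and speak and an assumed COMMOTION_API_KEY environment variable. The escaping handles only backslashes and double quotes, so it assumes single-line text; for arbitrary input, build the body with a JSON tool such as jq.

```shell
#!/bin/sh
# Sketch: build the text-to-speech JSON body from shell variables safely.
# Hypothetical helpers; COMMOTION_API_KEY is an assumed environment variable.

# Escapes backslashes and double quotes for use inside a JSON string.
# Assumes single-line input with no other control characters.
json_string() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# $1 = voice_id, $2 = text. Prints the request body.
tts_body() {
  printf '{"text": "%s", "voice_id": "%s"}' \
    "$(json_string "$2")" "$(json_string "$1")"
}

# $1 = voice_id, $2 = text, $3 = output file.
speak() {
  curl -sS --fail -X POST https://api.gocommotion.com/ai/text-to-speech \
    -H "X-API-Key: $COMMOTION_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(tts_body "$1" "$2")" \
    --output "$3"
}

# Example:
#   speak "vc_01HXYZ..." 'She said "hello" twice.' quote.mp3
```

printf '%s' is used instead of echo because some shells' echo interprets backslash sequences in its argument, which would corrupt the text before it is escaped.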