Commotion Voice AI is a single gateway to eleven speech and language APIs. Authenticate once with an X-API-Key header, get billed against your Voice AI credit wallet, and call any of the endpoints below.
Jump to the Quickstart to make your first call in under five minutes.
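As a rough illustration of the single-header authentication described above, the sketch below builds (but does not send) a request carrying the X-API-Key header. Only the header name comes from this page; the base URL, the `/speech-to-text` path, and the `build_request` helper are hypothetical placeholders, so check the Quickstart for the real values.

```python
import urllib.request

# Hypothetical placeholder: the real base URL is not stated on this page.
BASE_URL = "https://api.example.com/v1"


def build_request(path: str, api_key: str, body: bytes = b"") -> urllib.request.Request:
    """Build, without sending, a POST request that carries the X-API-Key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=body,
        headers={"X-API-Key": api_key},  # header name taken from the overview
        method="POST",
    )


# "/speech-to-text" is an assumed path used only for illustration.
req = build_request("/speech-to-text", api_key="YOUR_API_KEY")
print(req.full_url)
print(req.header_items())
```

Note that urllib stores header names in a normalized capitalization; HTTP header names are case-insensitive, so the header still reads as X-API-Key to the server. Because the same key is meant to work across all endpoints, a helper like this would only need the path changed per call.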
Speech-to-Text
Transcribe audio files. Returns plain text or per-segment output.
Text-to-Speech
Synthesize natural speech from text. Supports both default and cloned voices.
Voice Clone
Clone a voice from a 10-second audio sample. Returns a reusable voice_id.
Voice Design
Compose new synthetic voices from style and timbre parameters, with no audio sample needed.
Speech Translation
Translate spoken audio across languages while preserving prosody.
Text Translation
Translate text between supported languages with terminology controls.
Reasoning
Long-context reasoning over text, transcripts, and structured input.
Speech-to-Speech
Patent-pending. Convert one speaker’s voice into another while preserving content and emotion.
Streaming Text
Token-streaming text output for low-latency conversational interfaces.
Language Intelligence
Intent, entity, sentiment, and topic detection across speech and text inputs.
Knowledge & Memory
Persistent retrieval over your own corpora and per-user conversation memory.