
Voice AI

Commotion Voice AI is a single gateway to eleven speech and language APIs. Authenticate with one API key sent in the X-API-Key header, get billed against your Voice AI credit wallet, and call any of the endpoints below.
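Here is a minimal sketch of attaching the X-API-Key header to a request using only Python's standard library. The base URL, the "text-to-speech" path, the JSON payload shape, and the COMMOTION_API_KEY environment variable name are hypothetical placeholders, not taken from this page. Check the Quickstart for the real values. The sketch builds the request but does not send it.

```python
import json
import os
import urllib.request

# Hypothetical placeholder: the real base URL is not documented on this page.
BASE_URL = "https://api.example.com/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request authenticated with an X-API-Key header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=body,
        headers={
            "X-API-Key": api_key,  # the same key is used for every endpoint
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical env var name; keeps the key out of source code.
    key = os.environ.get("COMMOTION_API_KEY", "demo-key")
    # Hypothetical path and payload, for illustration only.
    req = build_request("text-to-speech", {"text": "Hello"}, key)
    # urllib normalizes header names with str.capitalize(), so look the header
    # up as "X-api-key"; HTTP header names are case-insensitive on the wire.
    print(req.full_url, req.get_method(), req.get_header("X-api-key"))
    # To actually send it: urllib.request.urlopen(req)  (not done here)
```

Because the key travels in a header rather than the URL, the same build_request helper could serve any of the eleven endpoints by changing only the path and payload.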

Jump to the Quickstart to make your first call in under five minutes.

APIs

Speech-to-Text

Transcribe audio files. Returns plain text or per-segment output.

Text-to-Speech

Synthesize natural speech from text. Both default voices and cloned voices are supported.

Voice Clone

Clone a voice from a 10-second audio sample. Returns a reusable voice_id.

Voice Design

Compose new synthetic voices from style and timbre parameters; no audio sample is needed.

Speech Translation

Translate spoken audio across languages while preserving prosody.

Text Translation

Translate text between supported languages with terminology controls.

Reasoning

Long-context reasoning over text, transcripts, and structured input.

Speech-to-Speech

Convert one speaker’s voice into another’s while preserving content and emotion. Patent-pending.

Streaming Text

Token-streaming text output for low-latency conversational interfaces.

Language Intelligence

Intent, entity, sentiment, and topic detection across speech and text inputs.

Knowledge & Memory

Persistent retrieval over your own corpora and per-user conversation memory.