Streaming speech-to-text
For live calls and dictation, stream audio over a WebSocket and receive partial transcripts as the speaker talks. Latency is typically under 300 ms.
Connection
wss://api.gocommotion.com/ai/speech-to-text/stream
Authenticate by passing your API key as the X-API-Key header during the WebSocket handshake.
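- Open the connection.
    // Assumes a Node.js client such as the "ws" package: the browser WebSocket
    // constructor has no options argument and cannot set custom headers.
    import WebSocket from "ws";
    const ws = new WebSocket(
      "wss://api.gocommotion.com/ai/speech-to-text/stream",
      [],
      { headers: { "X-API-Key": "eak_live_your_key_here" } },
    );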
- Send a config frame, then audio chunks.
The first message is a JSON config frame describing the audio format. Every subsequent message is a binary audio chunk: 16-bit PCM at the sample rate you declared.
    ws.addEventListener("open", () => {
      ws.send(JSON.stringify({
        sample_rate: 16000,
        encoding: "pcm_s16le",
        language: "en",
      }));
      // Then pipe microphone chunks as binary frames.
    });
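Microphone APIs often deliver samples as floats between -1.0 and 1.0, so they need converting before they can be sent as binary frames. The following is a minimal sketch of that conversion for the pcm_s16le encoding declared above. The helper names floatTo16BitPcm and chunkByteLength are hypothetical, and mono audio is an assumption; the service documentation does not state a channel count or a recommended chunk size.

```javascript
// Bytes per sample for 16-bit PCM.
const BYTES_PER_SAMPLE = 2;

// Hypothetical helper: convert float samples (-1.0 to 1.0) to signed
// 16-bit little-endian PCM, matching the "pcm_s16le" encoding.
function floatTo16BitPcm(samples) {
  const buffer = new ArrayBuffer(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input does not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * BYTES_PER_SAMPLE, value, true); // true = little-endian
  }
  return buffer;
}

// Hypothetical helper: size in bytes of one chunk of the given duration.
// Assumes mono audio (one sample per frame).
function chunkByteLength(sampleRate, chunkMs) {
  return Math.round((sampleRate * chunkMs) / 1000) * BYTES_PER_SAMPLE;
}
```

With these helpers, each captured block could be sent with ws.send(floatTo16BitPcm(block)). At 16000 Hz, a 100 ms mono chunk comes to 3200 bytes.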
- Read partial and final transcripts.
The server emits JSON messages of two kinds:
{ "type": "partial", "text": "..." } while a phrase is in progress, and
{ "type": "final", "text": "..." } when a phrase ends.
    ws.addEventListener("message", (event) => {
      const message = JSON.parse(event.data);
      if (message.type === "final") {
        console.log("final:", message.text);
      }
    });
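To show live text while someone is speaking, a client usually keeps the finished phrases and the in-progress phrase separately. The sketch below is one way to do that. The createTranscript helper is hypothetical, and it assumes that each partial message replaces the previous partial for the same phrase; the documentation above does not say whether partials are cumulative or incremental.

```javascript
// Hypothetical helper: tracks committed phrases plus the in-progress one.
function createTranscript() {
  const finals = [];
  let partial = "";
  return {
    // Feed each parsed server message; unknown types are ignored.
    handle(message) {
      if (message.type === "partial") {
        // Assumption: a new partial supersedes the previous one.
        partial = message.text;
      } else if (message.type === "final") {
        finals.push(message.text);
        partial = "";
      }
    },
    // Committed phrases followed by the current partial, if any.
    text() {
      return [...finals, partial].filter(Boolean).join(" ");
    },
  };
}
```

Inside the message listener, this could be used as transcript.handle(JSON.parse(event.data)), with transcript.text() rendered after each message.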
- Close cleanly.
Send { "type": "close" } when you are done so the server flushes the final phrase before closing the socket.
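A client may want to wait for the server to close the socket rather than closing it immediately, so the last final message is not lost. The following is a rough sketch of that pattern. The closeCleanly helper and its 2000 ms default timeout are assumptions for illustration, not values from the service documentation.

```javascript
// Hypothetical helper: ask the server to flush, then wait for it to close.
// Resolves with "closed" if the server closes the socket in time, or with
// "timeout" after force-closing the socket locally.
function closeCleanly(ws, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      ws.close(); // The server never closed, so close from the client side.
      resolve("timeout");
    }, timeoutMs);
    ws.addEventListener("close", () => {
      clearTimeout(timer);
      resolve("closed");
    });
    // Ask the server to flush the final phrase.
    ws.send(JSON.stringify({ type: "close" }));
  });
}
```

Any final transcript flushed by the server still arrives through the existing message listener before the close event fires.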