Streaming speech-to-text

For live calls and dictation, stream audio over a WebSocket and receive partial transcripts as the speaker talks. Latency is typically under 300 ms.

Connection

wss://api.gocommotion.com/ai/speech-to-text/stream

Authenticate by passing your API key as the X-API-Key header during the WebSocket handshake.

Open the connection.

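```
// The three-argument constructor with a headers option matches the Node.js
// `ws` package. The browser WebSocket constructor does not accept custom headers.
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.gocommotion.com/ai/speech-to-text/stream",
  [], // no subprotocols
  { headers: { "X-API-Key": "eak_live_your_key_here" } },
);
```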

Send a config frame, then audio chunks.

The first message is a JSON config frame describing the audio format. Every subsequent message is a binary audio chunk: 16-bit PCM at the sample rate you declared in the config frame.

```
ws.addEventListener("open", () => {
  ws.send(JSON.stringify({
    sample_rate: 16000,
    encoding: "pcm_s16le",
    language: "en",
  }));
  // Then pipe microphone chunks as binary frames.
});
```
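The piping step can be sketched as a small chunker that slices raw PCM into fixed-duration binary frames. This is a minimal sketch under assumptions: the name `pcmChunks`, the 100 ms chunk duration, and mono audio are illustrative choices, not documented requirements of the API.

```javascript
// Split 16-bit little-endian PCM into fixed-duration chunks for streaming.
// Assumptions (not from the API docs): mono audio, 100 ms per chunk.
function* pcmChunks(pcm, sampleRate = 16000, chunkMs = 100) {
  const BYTES_PER_SAMPLE = 2; // pcm_s16le uses two bytes per sample
  const chunkBytes =
    Math.floor((sampleRate * chunkMs) / 1000) * BYTES_PER_SAMPLE;
  if (chunkBytes <= 0) throw new RangeError("chunk duration too short");

  for (let offset = 0; offset < pcm.byteLength; offset += chunkBytes) {
    // subarray returns a view, so no audio bytes are copied.
    yield pcm.subarray(offset, Math.min(offset + chunkBytes, pcm.byteLength));
  }
}

// Hypothetical usage: `ws` is the open socket, `recording` is a Uint8Array
// (or Node.js Buffer) of PCM audio at the declared sample rate.
// for (const chunk of pcmChunks(recording)) ws.send(chunk);
```

At 16 kHz, a 100 ms chunk is 1,600 samples, or 3,200 bytes. Smaller chunks mean more frames per second, while larger chunks delay the first partial transcript, so the right size is worth measuring for your use case.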

Read partial and final transcripts.

The server emits JSON messages of two kinds: { "type": "partial", "text": "..." } while a phrase is in progress, and { "type": "final", "text": "..." } when a phrase ends.

```
ws.addEventListener("message", (event) => {
  const message = JSON.parse(event.data);
  if (message.type === "final") {
    console.log("final:", message.text);
  }
});
```
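To display a live transcript, the two message kinds can be folded into a running state. The sketch below assumes that each partial carries the full text of the phrase so far, so a new partial replaces the previous one. The source does not state this, so check it against real responses. The names `applyTranscriptMessage` and `renderTranscript` are hypothetical.

```javascript
// Fold server messages into a running transcript.
// state.committed: finished phrases; state.pending: the phrase in progress.
// Assumption: a partial holds the whole phrase so far, not just a delta.
function applyTranscriptMessage(state, message) {
  switch (message.type) {
    case "partial":
      // Replace the previous partial for the phrase in progress.
      return { ...state, pending: message.text };
    case "final":
      // The phrase ended: commit its text and clear the pending partial.
      return { committed: [...state.committed, message.text], pending: "" };
    default:
      // Ignore message types this sketch does not know about.
      return state;
  }
}

// Join committed phrases and the current partial into one display string.
function renderTranscript(state) {
  return [...state.committed, state.pending].filter(Boolean).join(" ");
}
```

Inside the message listener, this could be used as `state = applyTranscriptMessage(state, JSON.parse(event.data))`, starting from `{ committed: [], pending: "" }`.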

Close cleanly.

Send { "type": "close" } when you are done so the server flushes the final phrase before closing the socket.
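A shutdown helper might look like the following sketch. It sends the close frame and then waits for the server to close the socket, so the last final transcript can still arrive. The helper name `closeGracefully` and the 5-second timeout are assumptions for illustration, not part of the API.

```javascript
// Send the close frame, then wait for the server to close the socket.
// timeoutMs is an assumed safety net, not a documented value.
function closeGracefully(socket, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      // The server did not close in time, so close from the client side.
      socket.close();
      resolve("timeout");
    }, timeoutMs);

    socket.addEventListener(
      "close",
      () => {
        clearTimeout(timer);
        resolve("closed");
      },
      { once: true },
    );

    // Ask the server to flush the final phrase and close the connection.
    socket.send(JSON.stringify({ type: "close" }));
  });
}
```

Keep the message listener attached until the returned promise resolves, because the flushed final transcript arrives after the close frame is sent.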