Streaming speech-to-text

For live calls and dictation, stream audio over a WebSocket and receive partial transcripts as the speaker talks. Latency is typically under 300 ms.

Connection

wss://api.gocommotion.com/ai/speech-to-text/stream

Authenticate by passing your API key as the X-API-Key header during the WebSocket handshake.

  1. Open the connection.

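    // Assumes the Node.js "ws" package: its constructor accepts an options
    // object with headers. The browser WebSocket constructor cannot set
    // custom headers.
    import WebSocket from "ws";

    const ws = new WebSocket(
      "wss://api.gocommotion.com/ai/speech-to-text/stream",
      [],
      { headers: { "X-API-Key": "eak_live_your_key_here" } },
    );

  2. Send a config frame, then audio chunks.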

    The first message is a JSON config frame describing the audio format. Every subsequent message is a binary audio chunk of 16-bit signed little-endian PCM (pcm_s16le) at the sample rate declared in the config frame.

    ws.addEventListener("open", () => {
      ws.send(JSON.stringify({
        sample_rate: 16000,
        encoding: "pcm_s16le",
        language: "en",
      }));
      // Then pipe microphone chunks as binary frames.
    });
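
    Audio capture APIs often deliver samples as 32-bit floats in the range -1 to 1, so they need converting before they match the declared pcm_s16le encoding. The following is a minimal sketch of that conversion plus chunking. The helper names floatToPcmS16le and chunkPcm are hypothetical, and the 100 ms chunk duration in the usage note is an assumption; the API does not state a required chunk size here.

    ```javascript
    // Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
    // Hypothetical helper, not part of the API.
    function floatToPcmS16le(samples) {
      const buffer = new ArrayBuffer(samples.length * 2);
      const view = new DataView(buffer);
      for (let i = 0; i < samples.length; i++) {
        // Clamp so out-of-range input cannot overflow the 16-bit range.
        const s = Math.max(-1, Math.min(1, samples[i]));
        // Scale negatives by 32768 and positives by 32767 to use the full range.
        const v = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
        view.setInt16(i * 2, v, true); // true = little-endian
      }
      return buffer;
    }

    // Split a PCM buffer into chunks of roughly chunkMs milliseconds each.
    // The last chunk may be shorter. Hypothetical helper.
    function chunkPcm(buffer, sampleRate, chunkMs) {
      // Two bytes per 16-bit sample.
      const bytesPerChunk = Math.floor((sampleRate * chunkMs) / 1000) * 2;
      if (bytesPerChunk <= 0) {
        throw new RangeError("chunk duration is too short for this sample rate");
      }
      const chunks = [];
      for (let offset = 0; offset < buffer.byteLength; offset += bytesPerChunk) {
        chunks.push(buffer.slice(offset, offset + bytesPerChunk));
      }
      return chunks;
    }
    ```

    Once the config frame has been sent, each chunk could be passed to the socket as a binary frame, for example: for (const chunk of chunkPcm(floatToPcmS16le(samples), 16000, 100)) ws.send(chunk);
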
  3. Read partial and final transcripts.

    The server emits JSON messages of two kinds: { "type": "partial", "text": "..." } while a phrase is in progress, and { "type": "final", "text": "..." } when a phrase ends.

    ws.addEventListener("message", (event) => {
      const message = JSON.parse(event.data);
      if (message.type === "final") {
        console.log("final:", message.text);
      }
    });
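
    For a live caption display, partial messages are useful as well as final ones. The sketch below keeps a running transcript from both kinds of message. It assumes that each partial replaces the previous partial for the phrase in progress (rather than carrying only newly recognized words) and that phrases can be joined with a space; neither is stated above. createTranscript is a hypothetical helper name.

    ```javascript
    // Accumulate finished phrases and track the phrase still in progress.
    // Hypothetical helper, not part of the API.
    function createTranscript() {
      const finals = []; // text of completed phrases, in order
      let partial = "";  // latest text for the phrase still in progress

      return {
        // Feed one raw JSON message from the server.
        handle(raw) {
          const message = JSON.parse(raw);
          if (message.type === "partial") {
            // Assumption: a partial supersedes the previous partial.
            partial = message.text;
          } else if (message.type === "final") {
            finals.push(message.text);
            partial = "";
          }
          // Any other message type is ignored.
        },
        // Completed phrases followed by the in-progress phrase, if any.
        text() {
          return [...finals, partial].filter(Boolean).join(" ");
        },
      };
    }
    ```

    It could be wired to the socket with ws.addEventListener("message", (event) => transcript.handle(event.data)); and transcript.text() read whenever the display refreshes.
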
  4. Close cleanly.

    Send { "type": "close" } when you are done so the server flushes the final phrase before closing the socket.
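
    A shutdown helper along these lines might send the close frame and then wait for the server to close the socket, leaving any message listener attached so the last final transcript is still received. This is a sketch only: closeCleanly, its onDone callback, and the 5-second fallback timeout are assumptions, not documented behavior.

    ```javascript
    // Ask the server to flush and close, then report when the socket has closed.
    // Hypothetical helper; ws is any object with the standard WebSocket
    // send, close, and addEventListener methods.
    function closeCleanly(ws, onDone, timeoutMs = 5000) {
      // Fallback: close from the client side if the server never does.
      const timer = setTimeout(() => ws.close(), timeoutMs);

      ws.addEventListener("close", () => {
        clearTimeout(timer);
        onDone();
      });

      // The close frame described above.
      ws.send(JSON.stringify({ type: "close" }));
    }
    ```

    Calling closeCleanly(ws, () => console.log("closed")) after the last audio chunk would then log once the socket has closed.
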