How to stream model responses

LLMs can stream completions as they are generated, so tokens can be displayed before the response is complete. This improves the experience of users interacting with the LLM, since it reduces the idle time spent waiting for an answer.

LLMs from the following providers support streaming out of the box:

  1. OpenAI

  2. Anthropic

  3. Replicate

Inside Stack AI, you can enable streaming in your LLMs and receive a streamed response whenever your interface fetches one. To do so, use a library such as fetch-event-source to read from the following endpoint:

https://www.stack-inference.com/stream_exported_flow?flow_id='YOUR FLOW ID'&org='YOUR Organization'
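The endpoint URL can also be assembled in code rather than by hand. The following is a minimal sketch, assuming the flow ID and organization are plain strings. `buildStreamUrl` is a hypothetical helper name, not part of Stack AI, and `encodeURIComponent` is used so that spaces or special characters in either value do not break the query string:

```javascript
// Hypothetical helper (not part of Stack AI): builds the streaming endpoint URL.
function buildStreamUrl(flowId, org) {
    const base = 'https://www.stack-inference.com/stream_exported_flow';
    // Percent-encode both values so they are safe inside a query string.
    return `${base}?flow_id=${encodeURIComponent(flowId)}&org=${encodeURIComponent(org)}`;
}

// Example call with made-up values.
console.log(buildStreamUrl('abc123', 'my org'));
```

The returned string can be passed as the first argument to the streaming call.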

This endpoint has the following properties:

  1. Requests must be authenticated with your public API key in the Authorization header.

  2. The request body is a JSON object containing the value for each input. Example:

body = {'in-0': '<Value of input 0>', 'in-1': ..., 'in-2': ...}
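The request body above can be built programmatically. The following is a minimal sketch, assuming inputs are numbered from `in-0` in order, as in the example. `buildInputs` is a hypothetical helper name, not part of Stack AI, and the input values are made up:

```javascript
// Hypothetical helper (not part of Stack AI): maps an ordered list of values
// to the { 'in-0': ..., 'in-1': ... } shape shown in the example body.
function buildInputs(values) {
    const inputs = {};
    values.forEach((value, index) => {
        inputs[`in-${index}`] = value;
    });
    return inputs;
}

// Example call with made-up input values; the result is serialized for the request.
const exampleInputs = buildInputs(['What is streaming?', 'Answer briefly.']);
console.log(JSON.stringify(exampleInputs));
```

Serializing with `JSON.stringify` yields the string to send as the request body.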

The endpoint will return error messages if the flow fails to execute. See the example below:

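import { fetchEventSource } from '@microsoft/fetch-event-source';

// Placeholders: replace with your own values.
const flow_id = 'YOUR FLOW ID';
const org = 'YOUR Organization';
const PUBLIC_API_KEY = 'YOUR PUBLIC API KEY';

// `user_id` identifies the end user; `controller` lets the caller cancel the stream.
const streamDataFromStack = async (user_id, controller = new AbortController()) => {
    const inputsAPI = { 'in-0': 'value_0', 'in-1': 'value_1', user_id: user_id };
    let outputs = null;

    await fetchEventSource(`https://www.stack-inference.com/stream_exported_flow?flow_id=${flow_id}&org=${org}`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${PUBLIC_API_KEY}`,
            "Accept": "text/event-stream, text/plain",
        },
        body: JSON.stringify(inputsAPI),
        signal: controller.signal,
        openWhenHidden: true,
        async onopen(res) {
            if (res.ok && res.status === 200) {
                // Connection established; messages will arrive in onmessage.
            }
            else if (
                res.status >= 400 &&
                res.status < 500 &&
                res.status !== 429
            ) {
                console.error("Client side error ", res);
            }
        },
        onmessage(event) {
            // The data of each message is a JSON string.
            outputs = JSON.parse(event.data);
            if (outputs.error) {
                // Stop streaming as soon as the flow reports an error.
                controller.abort();
                return;
            }
        },
        onclose() {
            // The server closed the stream.
        },
        onerror(err) {
            // Rethrow so the library stops retrying.
            throw err;
        },
    }).catch((err) => {
        console.error("There was an error from server", err);
    });

    return outputs;
};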
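
The error check inside `onmessage` can be pulled into a small function so that it is easier to test on its own. The following is a minimal sketch, assuming each message's `data` is a JSON string and that a failed flow sets an `error` field, as in the example above. `parseStreamMessage` is a hypothetical name, the `answer` key is made up for illustration, and treating malformed JSON as an error is a choice made here rather than documented behavior:

```javascript
// Hypothetical helper (not part of Stack AI): parses one streamed message
// and reports whether it carries outputs or an error.
function parseStreamMessage(data) {
    try {
        const outputs = JSON.parse(data);
        if (outputs && outputs.error) {
            // The flow reported a failure in the `error` field.
            return { ok: false, error: outputs.error, outputs: null };
        }
        return { ok: true, error: null, outputs: outputs };
    } catch (err) {
        // Assumption: a message that is not valid JSON is treated as an error.
        return { ok: false, error: `Invalid JSON: ${err.message}`, outputs: null };
    }
}

// Example calls with made-up payloads.
console.log(parseStreamMessage('{"answer":"hi"}').ok);
console.log(parseStreamMessage('{"error":"flow failed"}').error);
```

Inside `onmessage`, the handler could call `parseStreamMessage(event.data)` and abort the request when `ok` is false.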
