Frontend · 16 min read · Jun 2026

AI-Native UX: Streaming, Optimistic UI, and Designing for Uncertainty

LLM features break the request/response habits of the web. Here is how to stream tokens, give instant feedback with rollback, and design honestly for a model that is sometimes wrong.

AI UX · Streaming · Optimistic UI · LLM Apps
Sri Balaji

Founder

Why AI features break your UX habits

You wire up an LLM endpoint, fire a request, and show a spinner. Eight seconds later a wall of text appears. It works in the demo. Then real users arrive, the spinner feels broken, people click again, costs double, and nobody trusts the answer because they cannot tell where it came from. The model is not the problem here; your interaction model is.

Who this is for

Frontend and full-stack engineers shipping their first (or fifth) LLM-powered feature. You know React. You have an [LLM API working](/blog/working-with-the-llm-api). Now you need a UI that handles latency, non-determinism, and the fact that the model is sometimes confidently wrong.

AI-native UX is the set of patterns that make slow, probabilistic, occasionally-wrong responses feel fast, trustworthy, and in the user's control. Three pillars: stream so latency disappears, give optimistic feedback so actions feel instant, and design for uncertainty so people can verify, edit, and undo.

A new mental model: thinking out loud vs the vending machine

A traditional API call is a vending machine: you pay, you wait, you either get the exact item or an error. An LLM is a person thinking out loud: they start talking before they have finished the thought, they might be wrong, and you can interrupt them.
  • A colleague starts answering before they have the full thought, words arriving one at a time → Token streaming over SSE: render each chunk as it arrives instead of waiting for the full response
  • You raise a hand and say 'stop, that is not what I meant' → An abort/stop button that cancels the in-flight stream (and the billing)
  • They cite where they read something so you can check → Showing sources, retrieval chunks, or tool calls alongside the answer
  • They say 'I think it was Tuesday, but double-check' → Confidence signals and edit/undo affordances instead of presenting output as fact
  • A vending machine: pay, wait, get the item or a flashing error → The old request/response model: one blocking call, one final payload, a spinner in between
Stop designing AI features like vending machines. Design them like conversations.

The streaming flow, end to end

Before any code, hold the whole pipeline in your head. The browser opens one request to your own server route. Your route calls the model provider, which streams tokens back. Your route relays those tokens to the browser as Server-Sent Events, and the UI appends each chunk to the screen as it lands.

[Diagram: Chat UI (useChat hook) → API Route (/api/chat) → LLM Provider (streamText) → SSE Stream (text/event-stream) → Progressive Render (append per token). Edge labels: POST messages, prompt, token stream, relay chunks, data events, setState.]

Tokens flow left to right and render progressively; nothing waits for the full answer.

  1. User submits

    The hook POSTs the full message history to your own route. Never call the provider directly from the browser; that leaks your API key.

  2. Route opens a model stream

    streamText starts the provider call and returns immediately with a readable stream, not a finished string.

  3. Tokens relay as SSE

    Each model chunk is forwarded to the browser over a long-lived text/event-stream response. One HTTP request, many events.

  4. UI appends per chunk

    The hook concatenates each delta into the assistant message and re-renders, so words appear as they are generated.

  5. Stream closes

    On the final chunk the connection ends, the stop button flips back to send, and you persist the completed message.
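
Steps 3 and 4 can be sketched without any framework. This is a minimal illustration of SSE framing, not the Vercel AI SDK's actual wire format: `encodeSseData` and `createSseParser` are hypothetical helper names, and the parser assumes one `data:` line per event and `\n\n` separators.

```typescript
// Hypothetical helpers for illustration; not part of any SDK.

// Server side: wrap one token as an SSE event. JSON-encoding the token keeps
// embedded newlines from breaking the line-based framing.
function encodeSseData(token: string): string {
  return `data: ${JSON.stringify(token)}\n\n`;
}

// Client side: feed raw text chunks in, get complete data payloads out.
// Network chunks do not align with event boundaries, so unfinished text is
// buffered until its terminating blank line arrives.
function createSseParser(onData: (payload: string) => void) {
  let buffer = '';
  return function feed(chunk: string): void {
    buffer += chunk;
    let boundary = buffer.indexOf('\n\n');
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      for (const line of rawEvent.split('\n')) {
        if (line.startsWith('data:')) {
          // drop the field name and one optional space after the colon
          onData(line.slice(5).replace(/^ /, ''));
        }
      }
      boundary = buffer.indexOf('\n\n');
    }
  };
}

// Usage: two events arrive split across three arbitrary chunks.
let assistantText = '';
const feedChunk = createSseParser((payload) => {
  assistantText += JSON.parse(payload) as string; // append per token
});
const wire = encodeSseData('Hello') + encodeSseData(', world');
feedChunk(wire.slice(0, 9));
feedChunk(wire.slice(9, 20));
feedChunk(wire.slice(20));
```

The buffering is the part that tends to get skipped: a chunk can end in the middle of an event, so the parser only emits once it has seen the blank line that terminates it.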

Why SSE and not WebSockets

Streaming an answer is one-directional server-to-client text, which is exactly what SSE is built for: it is plain HTTP, reconnects automatically when consumed through the browser's EventSource, and needs no extra infra. Reach for WebSockets only when you need true bidirectional, low-latency exchange. See [realtime APIs: WebSockets, SSE, and long polling](/blog/realtime-apis-websockets-sse) for the full comparison.

A streaming chat UI with the Vercel AI SDK

The Vercel AI SDK collapses the entire pipeline above into two pieces: a useChat hook on the client and a streamText call in the route. The hook manages the message list, the streaming append, the input state, and the loading flag. Here is a complete client component, including the all-important stop button.

app/chat/Chat.tsx
tsx
'use client';

import { useChat } from 'ai/react';

export function Chat() {
  const {
    messages,
    input,
    handleInputChange,
    handleSubmit,
    status,
    stop,
    reload,
    error,
  } = useChat({ api: '/api/chat' });

  const isStreaming = status === 'streaming' || status === 'submitted';

  return (
    <div className="flex flex-col gap-4">
      <div aria-live="polite" aria-atomic="false" className="space-y-3">
        {messages.map((m) => (
          <article key={m.id} data-role={m.role}>
            <span className="text-xs uppercase opacity-60">
              {m.role === 'user' ? 'You' : 'Assistant'}
            </span>
            {/* tokens append here as they arrive */}
            <p className="whitespace-pre-wrap">{m.content}</p>
          </article>
        ))}
      </div>

      {error && (
        <p role="alert" className="text-red-400">
          Something went wrong. <button onClick={() => reload()}>Retry</button>
        </p>
      )}

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything…"
          className="flex-1 rounded border bg-transparent px-3 py-2"
        />
        {isStreaming ? (
          <button type="button" onClick={stop} className="rounded bg-red-600 px-4">
            Stop
          </button>
        ) : (
          <button type="submit" className="rounded bg-amber-600 px-4">
            Send
          </button>
        )}
      </form>
    </div>
  );
}

The route is where tokens are born. streamText kicks off the provider call and hands back a stream you turn straight into a streaming HTTP response. Note it is an Edge-friendly handler returning a Response, not JSON.

app/api/chat/route.ts
typescript
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export const runtime = 'edge';
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: 'You are concise. If unsure, say so and suggest how to verify.',
    messages,
    // abort propagates: if the client disconnects, stop billing
    abortSignal: req.signal,
  });

  // relays tokens to the browser as an SSE-style stream
  return result.toDataStreamResponse();
}

The system prompt is part of your UX

Telling the model to admit uncertainty ("if unsure, say so") changes what the user sees. Pair it with the security posture from [prompt injection and LLM security](/blog/prompt-injection-and-llm-security): never trust message content as instructions, and never render model output as raw HTML.

Optimistic UI with rollback

Streaming hides latency for the answer. Optimistic UI hides it for the side effects: saving a generated draft, applying an AI edit, accepting a suggestion. The pattern: update local state immediately as if the server already said yes, fire the request in the background, and if it fails, roll the state back and tell the user.

app/notes/useOptimisticSave.ts
tsx
'use client';

import { useState } from 'react';

type Note = { id: string; text: string; aiGenerated: boolean };

export function useOptimisticNotes(initial: Note[]) {
  const [notes, setNotes] = useState<Note[]>(initial);
  const [errorId, setErrorId] = useState<string | null>(null);

  async function acceptSuggestion(note: Note) {
    const snapshot = notes; // keep a rollback point
    setErrorId(null);

    // 1. optimistic: show it instantly
    setNotes((prev) => [...prev, note]);

    try {
      const res = await fetch('/api/notes', {
        method: 'POST',
        body: JSON.stringify(note),
      });
      if (!res.ok) throw new Error('save failed');

      // 2. reconcile with the server's canonical record
      const saved: Note = await res.json();
      setNotes((prev) => prev.map((n) => (n.id === note.id ? saved : n)));
    } catch {
      // 3. rollback + surface the failure
      setNotes(snapshot);
      setErrorId(note.id);
    }
  }

  return { notes, errorId, acceptSuggestion };
}

Always keep the snapshot

The single most common optimistic-UI bug is mutating state in place so there is nothing to roll back to. Capture the pre-change value first, then update. React 19's `useOptimistic` hook automates the snapshot/rollback dance when the mutation lives in a transition.
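
One caveat with a whole-list snapshot like the one in the hook above: restoring it also discards any other change made while the request was in flight. A hedged alternative is to undo only the item that failed. The sketch below uses plain functions with hypothetical names (`addOptimistic`, `confirmNote`, `rollbackNote`) and a hypothetical `pending` flag; it is one way to structure the state transitions, not the only one.

```typescript
// Hypothetical shapes for illustration; `pending` marks rows the server has
// not confirmed yet.
type DraftNote = { id: string; text: string; pending: boolean };

// 1. optimistic: append immediately, flagged as pending
function addOptimistic(notes: DraftNote[], id: string, text: string): DraftNote[] {
  return [...notes, { id, text, pending: true }];
}

// 2. reconcile: swap in the server's canonical text and clear the flag
function confirmNote(notes: DraftNote[], id: string, savedText: string): DraftNote[] {
  return notes.map((n) => (n.id === id ? { ...n, text: savedText, pending: false } : n));
}

// 3. rollback: remove only the failed row, leaving later changes intact
function rollbackNote(notes: DraftNote[], id: string): DraftNote[] {
  return notes.filter((n) => n.id !== id);
}

// Usage: two saves overlap; the first fails, the second succeeds.
let draftNotes: DraftNote[] = [];
draftNotes = addOptimistic(draftNotes, 'a', 'first suggestion');
draftNotes = addOptimistic(draftNotes, 'b', 'second suggestion');
draftNotes = rollbackNote(draftNotes, 'a'); // save of 'a' failed
draftNotes = confirmNote(draftNotes, 'b', 'second suggestion (saved)');
```

In a React component each transition would run through a functional update, for example `setNotes((prev) => rollbackNote(prev, id))`, so it always applies to the latest state rather than a stale closure.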

Traditional UX vs AI-native UX

The shift is not cosmetic. Four assumptions baked into classic web UX break the moment an LLM is in the loop.

| Dimension | Traditional | AI-native |
| --- | --- | --- |
| Latency | Sub-second; a spinner is fine | Seconds; stream tokens, never block |
| Determinism | Same input, same output | Same input, varying output; design for variance |
| Errors | 4xx/5xx with clear states | Plausible-but-wrong; needs verification UI |
| Trust | Implicit; the system is right | Earned; show sources, allow edit/undo |

What changes when the response is slow, probabilistic, and fallible.

Designing for uncertainty

The hardest part is not technical. A model that is wrong 5% of the time but presented as authoritative erodes trust faster than one that is wrong 15% of the time but honest about it. Your UI is where that honesty lives.

Three rules for honest AI UX

  • **Show your work**: surface the sources, retrieval chunks, or tool calls behind an answer so users can verify instead of believing.
  • **Make it editable**: every generated artifact should be tweakable and undoable; AI output is a draft, not a verdict.
  • **Never hide that it is AI**: label generated content clearly; disguising the model as a human is both a trust and a compliance risk.

Structured output deserves special paranoia. When you ask the model for JSON to drive your UI, treat that JSON as hostile input: it can be truncated mid-stream, contain extra prose, or simply omit a required field. Validate before you render.

lib/parseStructured.ts
typescript
import { z } from 'zod';

const Suggestion = z.object({
  title: z.string().min(1),
  confidence: z.number().min(0).max(1),
});

// model output is untrusted: never assume the shape
export function parseSuggestion(raw: string) {
  try {
    const json = JSON.parse(raw);
    const result = Suggestion.safeParse(json);
    if (!result.success) {
      return { ok: false as const, reason: 'invalid shape' };
    }
    return { ok: true as const, value: result.data };
  } catch {
    // partial / malformed during stream, show a 'still thinking' state
    return { ok: false as const, reason: 'incomplete' };
  }
}
  • Render a low-confidence result with a visible badge, not silently as fact.
  • If parsing fails mid-stream, keep the skeleton; do not flash an error for a response that is simply not finished.
  • Give the user a one-click way to regenerate when the structured output is unusable.
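
The "extra prose" case deserves its own handling, because `JSON.parse` rejects the whole string if the model wraps the object in chatter. The sketch below is one possible pre-processing step, assuming the payload is a single top-level object; `extractFirstJsonObject` is a hypothetical helper, and it would be fooled by a stray `{` in the prose before the real object.

```typescript
// Hypothetical helper: returns the first complete top-level {...} in `raw`,
// or null if none has closed yet (for example, mid-stream).
function extractFirstJsonObject(raw: string): string | null {
  const start = raw.indexOf('{');
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      // inside a string literal, braces are data, not structure
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) return raw.slice(start, i + 1);
    }
  }
  return null; // object never closed: treat as incomplete
}

// Usage: a chatty response and a truncated one.
const chatty = 'Sure! Here you go: {"title":"Use {braces}","confidence":0.4} Hope that helps.';
const truncated = '{"title":"Use SSE","confi';
const extracted = extractFirstJsonObject(chatty);
const partial = extractFirstJsonObject(truncated);
```

A `null` result maps naturally onto the "still thinking" state, and a non-null result still needs schema validation before it drives the UI.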

Accessibility of streaming content

Text that appears a token at a time is invisible to screen-reader users unless you tell assistive tech to announce it. The tool is aria-live, but the wrong setting is worse than none: assertive on a streaming region interrupts the user on every single token.

Use aria-live="polite", not assertive

Wrap the message list in `aria-live="polite"` with `aria-atomic="false"` so the screen reader queues updates and reads new text after the current utterance, instead of stuttering on every chunk. Announce the start of generation ("Assistant is responding") and completion so users know the boundaries. Keep the stop button keyboard-reachable and clearly labelled at all times.
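
Even a polite region can get noisy if it changes on every token. One option, sketched here under assumptions, is to buffer tokens and update a separate announcement region only at sentence boundaries. `createSentenceAnnouncer` is a hypothetical helper, and its punctuation rule is deliberately naive (a chunk ending in "3." would be flushed before "14" arrives).

```typescript
// Hypothetical helper: buffers streamed tokens and emits whole sentences, so
// a live region is updated a few times per answer rather than once per token.
function createSentenceAnnouncer(announce: (text: string) => void) {
  let pendingText = '';
  return {
    push(token: string): void {
      pendingText += token;
      // flush everything up to the last sentence-ending punctuation mark
      const match = pendingText.match(/^[\s\S]*[.!?](?=\s|$)/);
      if (match) {
        announce(match[0].trim());
        pendingText = pendingText.slice(match[0].length);
      }
    },
    // call when the stream closes to announce any trailing fragment
    finish(): void {
      if (pendingText.trim()) announce(pendingText.trim());
      pendingText = '';
    },
  };
}

// Usage: seven tokens become two announcements.
const spoken: string[] = [];
const announcer = createSentenceAnnouncer((text) => spoken.push(text));
for (const token of ['Stream', 'ing ', 'works', '. ', 'Use ', 'polite', ' regions']) {
  announcer.push(token);
}
announcer.finish();
```

In a real component the `announce` callback would set the text of a visually hidden live region, while the visible message keeps updating token by token.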

Cancellation and cost

Every streamed token is a billed token. If a user navigates away or hits stop, an un-cancelled stream keeps generating, and keeps charging you, until the model finishes or hits the max-tokens limit. Wiring up abort is both a UX feature and a cost control.

app/chat/useAbortableStream.ts
tsx
'use client';

import { useRef } from 'react';

export function useAbortableStream() {
  const controllerRef = useRef<AbortController | null>(null);

  async function run(prompt: string, onToken: (t: string) => void) {
    controllerRef.current?.abort(); // cancel any prior stream
    const controller = new AbortController();
    controllerRef.current = controller;

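    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // the route above reads `messages`, so send that shape
        body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
        signal: controller.signal, // <- propagates the abort to the server
      });
      if (!res.ok || !res.body) throw new Error(`chat request failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // chunks arrive in whatever wire format the route emits; parse it if it is not plain text
        onToken(decoder.decode(value, { stream: true }));
      }
    } catch (err) {
      // fetch() itself rejects with AbortError if stopped before the response starts
      if ((err as Error).name === 'AbortError') return; // expected
      throw err;
    }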
  }

  function stop() {
    controllerRef.current?.abort();
  }

  return { run, stop };
}

Abort must reach the provider

Cancelling the browser fetch only stops rendering; the model keeps generating server-side unless you forward the signal. Pass `req.signal` into `streamText` (as in the route above) so the abort propagates all the way to the provider and the meter stops.
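
To see why the signal has to reach the producer, it helps to model the provider as a generator that checks the signal before emitting each token. This is a toy simulation under stated assumptions: `fakeModelStream` and `collectUntilStopped` are hypothetical names, and a real provider call is cancelled by aborting its HTTP request, not by a loop like this.

```typescript
// Hypothetical stand-in for a provider stream: it checks the signal before
// producing (and "billing") each token.
async function* fakeModelStream(tokens: string[], signal: AbortSignal) {
  for (const token of tokens) {
    if (signal.aborted) return; // abort reached the producer: stop generating
    await Promise.resolve(); // stand-in for network and generation latency
    yield token;
  }
}

// Consumes the stream and aborts after `stopAfter` tokens, like a user
// pressing Stop. Returns the tokens that were actually produced.
async function collectUntilStopped(stopAfter: number): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];
  for await (const token of fakeModelStream(['a', 'b', 'c', 'd', 'e'], controller.signal)) {
    received.push(token);
    if (received.length === stopAfter) controller.abort();
  }
  return received;
}
```

If the generator never looked at `signal`, the consumer could stop reading but all five tokens would still be generated, which is the server-side equivalent of paying for an answer nobody sees.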

Common mistakes that cost hours (and dollars)

  1. A blocking spinner instead of streaming. An 8-second spinner reads as broken; the same 8 seconds with tokens flowing reads as fast. Stream from day one; retrofitting it later means rewriting the data layer.
  2. Trusting structured output blindly. JSON.parse(modelOutput) will crash in production on a truncated or chatty response. Validate with a schema and handle the partial-stream case explicitly.
  3. No stop or cancel. Users feel trapped watching a wrong answer generate, click again out of frustration, and you pay for two full completions. A stop button that aborts the server stream fixes the UX and the bill at once.
  4. Presenting output as fact. With no sources, no edit, and no confidence signal, the first hallucination the user catches destroys trust in the whole feature.
  5. aria-live="assertive" on the stream. It interrupts screen-reader users on every token. Use polite, and announce start and end.

Takeaways

The whole article in seven lines

  • LLM features are conversations, not vending machines: slow, probabilistic, sometimes wrong.
  • Stream tokens over SSE so latency disappears; a blocking spinner is a UX bug.
  • The Vercel AI SDK reduces it to `useChat` on the client and `streamText` in the route.
  • Always ship a stop button, and propagate the abort to the provider to stop the bill.
  • Use optimistic UI with a captured snapshot so failures roll back cleanly.
  • Design for uncertainty: show sources, allow edit/undo, never hide that it is AI.
  • Validate structured output as hostile input, and announce streams politely for screen readers.

Where to go next

AI-native UX sits on top of three layers worth deepening. Get the transport right, get the API call right, and lock down the security boundary; the front-end patterns here only shine when those are solid.
