
How to Stream AI Responses from the OpenAI API in Next.js
Table of Contents
- What Does Streaming AI Responses Mean?
- Should You Use Streaming for Your AI App?
- Setting Up a Basic AI API Route (Without Streaming)
- Step 1: Enable Streaming
- Step 2: Process the Streamed Data
- Full Code: AI Streaming in Next.js
- Key Differences Between Standard and Streaming Responses
- Final Thoughts: When to Use AI Streaming
- Summary: Steps to Add AI Streaming in Next.js
What Does Streaming AI Responses Mean?
When you interact with AI models like ChatGPT, responses can be delivered in two ways:
- Standard Response: The AI generates the entire response and sends it all at once.
- Streaming Response: The AI sends small parts of the response as they are generated.
Streaming improves the user experience by making AI feel more responsive. Instead of waiting for a full answer, users see words appear in real time—similar to how a YouTube video loads while playing rather than waiting for the whole file to download.
Should You Use Streaming for Your AI App?
Streaming is a feature, not a requirement.
Think of it like pizza delivery. When you order a pizza, you expect the whole thing at once; that's a standard response. Now imagine the driver showing up with one slice at a time. In a restaurant, where food arrives course by course, that pacing makes sense, but for a delivery order it would be unnecessary and even annoying. Streaming works the same way: it shines when you want immediate feedback, but sometimes waiting for the whole response is more practical.

Streaming is useful when:
- You want responses to appear instantly, such as in chatbots or AI assistants.
- The response is long, like an AI-generated article or code explanation.
However, streaming may not be necessary if:
- You need the full response before processing it, such as in form validation.
- The response is short and waiting a few milliseconds won’t affect the user experience.
Now, let’s walk through implementing AI streaming in Next.js step by step.
Setting Up a Basic AI API Route (Without Streaming)
We’ll start with a simple function that waits for the full AI response before sending it back. This is the default behavior.
import { OpenAI } from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req) {
  const { prompt } = await req.json();
  if (!prompt) {
    return new Response("Prompt is required", { status: 400 });
  }
  try {
    // Request AI response (no streaming)
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
    });
    // Send full response after it's complete
    return new Response(JSON.stringify(response.choices[0].message), {
      headers: { "Content-Type": "application/json" },
    });
  } catch (error) {
    return new Response("Error fetching AI response", { status: 500 });
  }
}
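To compare against streaming later, it helps to have a client for this route as well. Below is a minimal sketch of how the frontend might call it; the "/api/chat" path is an assumption, and the handler above returns the serialized message object as its body.

```javascript
// Hypothetical client for the non-streaming route. Nothing renders until
// the whole reply has been generated and returned as one JSON payload.
// The "/api/chat" path is an assumption; adjust it to match your route file.
async function askOnce(prompt) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const message = await res.json(); // e.g. { role: "assistant", content: "..." }
  return message.content;
}
```

Note the single await on res.json(): the client blocks until the entire response has arrived, which is exactly the wait that streaming removes.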
Step 1: Enable Streaming
To enable streaming, we add stream: true when making the OpenAI request.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: prompt }],
  stream: true, // Enable streaming
});
Step 2: Process the Streamed Data
Instead of waiting for the full response, we process AI output chunk by chunk.
const encoder = new TextEncoder(); // Converts text into bytes for the stream

return new Response(
  new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const text = chunk.choices[0]?.delta?.content || ""; // Extract text from chunk
        controller.enqueue(encoder.encode(text)); // Send chunk immediately
      }
      controller.close(); // Close stream when done
    },
  }),
  {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  }
);
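The chunk-handling logic above can be exercised without calling OpenAI at all. The sketch below fakes the chunk shape the SDK's streaming iterator yields (choices[0].delta.content) and then reads the resulting stream back, so you can verify the plumbing locally; the fakeChunks data is invented for illustration.

```javascript
// Standalone sketch, no OpenAI call. fakeChunks mimics the shape of the
// chunks the OpenAI SDK yields when streaming, purely for local testing.
const fakeChunks = [
  { choices: [{ delta: { content: "Hello" } }] },
  { choices: [{ delta: { content: ", " } }] },
  { choices: [{ delta: { content: "world!" } }] },
  { choices: [{ delta: {} }] }, // final chunk typically carries no content
];

// Build a ReadableStream the same way the route handler does.
function buildStream(chunks) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) {
        const text = chunk.choices[0]?.delta?.content || "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });
}

// Read the stream back the way a client would.
async function readAll(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let result = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    result += decoder.decode(value, { stream: true });
  }
  return result;
}

readAll(buildStream(fakeChunks)).then((text) => console.log(text)); // "Hello, world!"
```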
Full Code: AI Streaming in Next.js
Here’s the complete implementation of an API route with streaming enabled.
import { OpenAI } from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req) {
  const { prompt } = await req.json();
  if (!prompt) {
    return new Response("Prompt is required", { status: 400 });
  }
  try {
    // Step 1: Call OpenAI with streaming enabled
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
      stream: true, // Enables streaming
    });
    // Step 2: Create a stream to send the response in real time
    const encoder = new TextEncoder();
    return new Response(
      new ReadableStream({
        async start(controller) {
          // Step 3: Process AI output in chunks
          for await (const chunk of response) {
            const text = chunk.choices[0]?.delta?.content || "";
            controller.enqueue(encoder.encode(text)); // Send chunk immediately
          }
          controller.close(); // Close stream when done
        },
      }),
      {
        headers: { "Content-Type": "text/plain; charset=utf-8" },
      }
    );
  } catch (error) {
    return new Response("Error fetching AI response", { status: 500 });
  }
}
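On the frontend, the streamed body has to be read incrementally; calling res.json() would defeat the purpose by waiting for the end. A minimal client sketch follows; the "/api/chat" path and the onChunk callback are assumptions, and in a real app onChunk would append text to React state or the DOM.

```javascript
// Read a streamed Response body piece by piece, calling onChunk per piece.
async function consumeStream(res, onChunk) {
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true })); // partial text arrives here
  }
}

// Send a prompt to the streaming route and render chunks as they land.
// The "/api/chat" path is an assumption; adjust it to match your route file.
async function askAI(prompt, onChunk) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  await consumeStream(res, onChunk);
}
```

Splitting the reader loop into its own function keeps the streaming logic testable separately from the network call.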
Key Differences Between Standard and Streaming Responses
| Feature | Without Streaming | With Streaming |
|---|---|---|
| OpenAI Request | stream: false (default) | stream: true |
| Response Type | Full response at once | Streamed response in chunks |
| Processing Time | Waits until AI finishes | Updates as AI generates |
| Code Complexity | Simpler | Uses a ReadableStream |
Final Thoughts: When to Use AI Streaming
Use Streaming When:
- You’re building an AI chatbot or assistant that needs real-time responses.
- AI-generated content is long, and users benefit from seeing partial results.
- You want a faster, more engaging user experience.
Avoid Streaming When:
- You need the full response to process it before showing anything (e.g., data analysis, summarization).
- The AI response is short and doesn’t impact user experience.
- Your application doesn’t support handling streams properly.
Summary: Steps to Add AI Streaming in Next.js
Streaming AI responses allows data to be sent in smaller parts instead of waiting for the full response to be generated. This enhances real-time experiences like chatbots or AI assistants. However, it is a feature, not a requirement, and should be used when real-time feedback improves user experience.
Here’s how to implement streaming in Next.js:
- Enable streaming in OpenAI’s API call by setting stream: true. This allows the AI to generate responses token by token.
- Create a stream to send responses in real time, using a ReadableStream.
- Convert text into a stream-friendly format with a TextEncoder.
- Extract text from each AI-generated chunk as it arrives.
- Send each chunk immediately to ensure real-time updates on the frontend.
- Close the stream when complete to signal that the response has finished.
By integrating streaming into your Next.js application, you can significantly enhance user experience where it matters most. Whether you’re building an AI assistant or just exploring AI capabilities, understanding when and how to use streaming gives you the flexibility to create more dynamic applications.