
Real-Time AI Voice with Azure OpenAI GPT-4o & Node.js (TypeScript)

In this article, you'll build a real-time application using Microsoft Azure OpenAI's Realtime API for text and audio output.

Microsoft recently added support for real-time audio output via the gpt-4o-mini-realtime-preview model. In this tutorial, you'll learn how to use it from a Node.js TypeScript app, send a prompt, and play back the AI-generated speech in real time.

🛠️ Prerequisites

  • Node.js v18+ (I'm using v22.15.0)

  • TypeScript installed globally

  • Azure OpenAI resource with gpt-4o-mini-realtime-preview deployed

  • .env file with:

    AZURE_OPENAI_ENDPOINT=<Endpoint for your Azure OpenAI resource>
    AZURE_OPENAI_API_KEY=<Key to your API>
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini-realtime-preview
    OPENAI_API_VERSION=<API version>

📦 Setup

  1. Project Initialization

mkdir azure-gpt4o-audio && cd azure-gpt4o-audio
npm init -y
npm pkg set type=module
npm install openai @azure/identity dotenv speaker

Your package.json file might look something like this:
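For reference, here is a sketch (the exact version numbers on your machine will differ), with a start script added for the run step later:

```json
{
  "name": "azure-gpt4o-audio",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "tsc && node dist/index.js"
  },
  "dependencies": {
    "@azure/identity": "^4.0.0",
    "dotenv": "^16.0.0",
    "openai": "^4.0.0",
    "speaker": "^0.5.0"
  }
}
```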
  2. Environment Config

Create a .env file:

AZURE_OPENAI_ENDPOINT=<Endpoint for your Azure OpenAI resource>
AZURE_OPENAI_API_KEY=<Key to your API>
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini-realtime-preview
OPENAI_API_VERSION=<API version>

🎧 TypeScript Code


▶️ Run Your App

You can run the application using the npm run start shorthand configured in the scripts section of package.json. It transpiles the TypeScript into JavaScript and runs the compiled output.

Your speaker will now play back the AI’s voice response in real time 🎧.

📌 Final Thoughts

Once you've built this application, read through the API reference and try to build on this by adding more features (maybe a 'Stop' button to stop the audio response?).
