
Real-Time AI Voice with Azure OpenAI GPT-4o & Node.js (TypeScript)

In this article, you'll build a real-time application using Microsoft Azure OpenAI's Realtime API for text and audio output.

Microsoft recently added support for real-time audio output via the gpt-4o-mini-realtime-preview model. In this tutorial, you'll learn how to use it from a Node.js TypeScript app, send a prompt, and play back the AI-generated speech in real-time.

Prerequisites

  • Node.js v18+ (I'm using v22.15.0)

  • TypeScript installed globally

  • Azure OpenAI resource with gpt-4o-mini-realtime-preview deployed

  • .env file with:

    AZURE_OPENAI_ENDPOINT=<Endpoint for your Azure OpenAI resource>
    AZURE_OPENAI_API_KEY=<Key to your API>
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini-realtime-preview
    OPENAI_API_VERSION=<API version>

Setup

  1. Project Initialization

mkdir azure-gpt4o-audio && cd azure-gpt4o-audio
npm init -y
npm pkg set type=module
npm install openai @azure/identity dotenv speaker

Your package.json file might look something like this:
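
A sketch of the resulting package.json is below. The exact version numbers will depend on when you run npm install, and the start script is an assumption that matches the "Run Your App" section later on (it assumes a globally installed TypeScript compiler emitting to dist/):

```json
{
  "name": "azure-gpt4o-audio",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "tsc && node dist/index.js"
  },
  "dependencies": {
    "@azure/identity": "^4.0.0",
    "dotenv": "^16.0.0",
    "openai": "^4.0.0",
    "speaker": "^0.5.0"
  }
}
```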
  2. Environment Config

Create a .env file:

AZURE_OPENAI_ENDPOINT=<Endpoint for your Azure OpenAI resource>
AZURE_OPENAI_API_KEY=<Key to your API>
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini-realtime-preview
OPENAI_API_VERSION=<API version>

TypeScript Code


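The original code listing isn't included here, so the following is a minimal sketch of what index.ts could look like. It assumes the openai SDK's realtime beta surface (OpenAIRealtimeWS.azure plus session.update / conversation.item.create / response.create events) and the pcm16 output format played through the speaker package at 24 kHz, 16-bit mono; the prompt text and voice name are placeholders:

```typescript
// index.ts — minimal sketch: connect, send one prompt, stream the audio reply to the speaker.
import "dotenv/config";
import { AzureOpenAI } from "openai";
import { OpenAIRealtimeWS } from "openai/beta/realtime/ws";
import Speaker from "speaker";

// The realtime API's pcm16 format is 24 kHz, 16-bit, mono.
const speaker = new Speaker({ channels: 1, bitDepth: 16, sampleRate: 24000 });

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  apiVersion: process.env.OPENAI_API_VERSION,
  deployment: process.env.AZURE_OPENAI_DEPLOYMENT_NAME,
});

// Open a realtime WebSocket session against the Azure deployment.
const rt = await OpenAIRealtimeWS.azure(client);

rt.socket.on("open", () => {
  // Ask for text + audio output, then send a prompt and request a response.
  rt.send({
    type: "session.update",
    session: { modalities: ["text", "audio"], voice: "alloy", output_audio_format: "pcm16" },
  });
  rt.send({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: "Tell me a short joke about TypeScript." }],
    },
  });
  rt.send({ type: "response.create" });
});

// Audio arrives as base64-encoded PCM chunks; decode and pipe them to the speaker.
rt.on("response.audio.delta", (event) => {
  speaker.write(Buffer.from(event.delta, "base64"));
});

rt.on("response.done", () => {
  speaker.end();
  rt.close();
});

rt.on("error", (err) => console.error(err));
```

The top-level await works because the project is configured as an ES module ("type": "module").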
Run Your App

You can run the application using the npm run start shorthand that I've got configured in my package.json. It transpiles the TypeScript into JavaScript and runs the output JS.

Your speaker will now play back the AI’s voice response in real time.

Final Thoughts

Once you've built this application, read through the API reference and try to build on this by adding more features (maybe a 'Stop' button to stop the audio response?).
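
As a starting point for that stop feature, the realtime protocol lets the client cancel an in-progress response. A hypothetical stop helper (the function name is mine; it assumes the rt session and speaker from your main code are passed in) might look like:

```typescript
import type { OpenAIRealtimeWS } from "openai/beta/realtime/ws";
import type Speaker from "speaker";

// Hypothetical helper: ask the server to stop generating and end local playback.
function stopResponse(rt: OpenAIRealtimeWS, speaker: Speaker): void {
  rt.send({ type: "response.cancel" }); // server stops generating the current response
  speaker.end(); // flush and stop local audio playback
}
```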
