OpenAI has released three new voice models for developers through its Realtime API. Each model focuses on a different capability: reasoning, translation, and transcription.
The first model, GPT-Realtime-2, brings GPT-5-class reasoning to live voice conversations. According to OpenAI, it can handle complex requests, manage interruptions, and call external tools while keeping a conversation going naturally.
The second model, GPT-Realtime-Translate, performs live speech translation. It supports 70 input languages and converts speech into 13 output languages in real time, keeping pace with the speaker.
The third model, GPT-Realtime-Whisper, focuses on transcription. OpenAI describes it as a low-latency streaming model that converts spoken words into text as the speaker talks, making it suitable for live captions and meeting notes.
Pricing varies by model. GPT-Realtime-2 is billed per audio token, the other two per minute:
- GPT-Realtime-2: $32 per million audio input tokens, $64 per million audio output tokens
- GPT-Realtime-Translate: $0.034 per minute
- GPT-Realtime-Whisper: $0.017 per minute
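Because the three models use different billing units, a rough cost comparison takes a little arithmetic. The following is a minimal sketch that uses only the list prices above; the function names and the token counts in the example are hypothetical, and the article does not say how many audio tokens correspond to a minute of speech, so token volumes have to be measured from real usage.

```python
# List prices in USD, copied from the figures quoted above.
REALTIME_2_INPUT_PER_MILLION = 32.0    # audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 audio cost for the given token counts."""
    input_cost = input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost for a model billed per minute."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical token counts, chosen only to illustrate the formula.
    print(f"Realtime-2: ${realtime_2_cost(1_000_000, 500_000):.2f}")
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At these rates, an hour of live translation comes to about $2.04 and an hour of transcription to about $1.02, while the GPT-Realtime-2 figure depends entirely on how many audio tokens a session consumes.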
Developers can test all three models in OpenAI’s Playground. OpenAI says the models are intended to help developers build a new class of voice applications.