OpenAI adds advanced reasoning, translation, and transcription to its voice API

OpenAI has released three new voice models for developers through its Realtime API. Each model focuses on a different capability: reasoning, translation, and transcription.

The first model, GPT-Realtime-2, brings GPT-5-class reasoning to live voice conversations. According to OpenAI, it can handle complex requests, manage interruptions, and call external tools while keeping a conversation going naturally.

The second model, GPT-Realtime-Translate, performs live speech translation. It supports 70 input languages and converts speech into 13 output languages in real time, keeping pace with the speaker.

The third model, GPT-Realtime-Whisper, focuses on transcription. OpenAI describes it as a low-latency streaming model that converts spoken words into text as the speaker talks, making it suitable for live captions and meeting notes.

Pricing varies by model:

  • GPT-Realtime-2: $32 per million audio input tokens, $64 per million audio output tokens
  • GPT-Realtime-Translate: $0.034 per minute
  • GPT-Realtime-Whisper: $0.017 per minute
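
Because one model is billed per token and the other two per minute, comparing costs takes a little arithmetic. The sketch below is a rough estimate that uses only the prices listed above. The function names and the example workload (token counts, session length) are hypothetical, chosen for illustration, and are not figures from OpenAI.

```python
# Prices as quoted in the article, in USD.
REALTIME_2_INPUT_PER_MILLION = 32.0    # per million audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # per million audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GPT-Realtime-2 cost from audio token counts."""
    input_cost = input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate cost for the models that are billed per minute."""
    return minutes * rate_per_minute


# Hypothetical workload: a one-hour session, and an assumed token volume.
print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
print(f"Whisper, 60 min:   ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
print(f"Realtime-2, 100k in / 50k out tokens: ${realtime_2_cost(100_000, 50_000):.2f}")
```

How many audio tokens a minute of speech consumes is not stated here, so the GPT-Realtime-2 figure cannot be converted to a per-minute rate without additional information from OpenAI.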

Developers can test all three models in OpenAI’s Playground. OpenAI says the models are intended to help developers build a new class of voice applications.

Sources: OpenAI, 9to5Mac
