Apple’s new speech technology beats OpenAI’s Whisper in transcription speed tests

Apple has introduced new speech recognition technology that significantly outperforms existing transcription tools in processing speed. The company unveiled SpeechAnalyzer and SpeechTranscriber as part of its developer beta releases at WWDC.

John Voorhees from MacStories tested the new Apple framework against popular transcription apps built on OpenAI’s Whisper model. His tests used a 34-minute, 7GB video file to compare processing times across different tools.

Apple’s technology completed the transcription in just 45 seconds. MacWhisper using the Large V3 Turbo model took 1 minute and 41 seconds. VidCap required 1 minute and 55 seconds, while MacWhisper’s Large V2 model needed 3 minutes and 55 seconds.

At 45 seconds against 1 minute and 41 seconds (101 seconds), Apple’s framework ran about 2.2 times faster than the fastest Whisper-based alternative. Voorhees noted that transcription quality remained comparable across all tested tools.
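The arithmetic behind these figures is easy to check. The short Python sketch below uses only the timings and the 34-minute duration reported above; the dictionary labels and function names are illustrative and do not come from any of the tested tools.

```python
# Timings reported in the article, converted to seconds.
timings = {
    "Apple SpeechAnalyzer/SpeechTranscriber": 45,
    "MacWhisper (Large V3 Turbo)": 1 * 60 + 41,
    "VidCap": 1 * 60 + 55,
    "MacWhisper (Large V2)": 3 * 60 + 55,
}

# Length of the test video: 34 minutes.
VIDEO_SECONDS = 34 * 60

APPLE = "Apple SpeechAnalyzer/SpeechTranscriber"


def speedup_over(baseline, other):
    """How many times longer `other` took than `baseline`."""
    return timings[other] / timings[baseline]


def realtime_factor(tool):
    """Seconds of video processed per second of wall-clock time."""
    return VIDEO_SECONDS / timings[tool]


for tool, secs in timings.items():
    print(
        f"{tool}: {secs} s, "
        f"{speedup_over(APPLE, tool):.2f}x Apple's time, "
        f"{realtime_factor(tool):.1f}x real time"
    )
```

The same numbers show that Apple’s framework worked through the video at roughly 45 times real-time speed. This is a single run on a single file, so the ratios are indicative rather than a general benchmark.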

The new Apple framework processes audio and video files directly on the device rather than using cloud-based services. Local processing avoids uploading large files to a server and keeps the user’s data on the device.

All tested applications showed similar challenges with proper names and brand terms like “AppStories.” The tools typically split compound words incorrectly, though this can be corrected through find-and-replace operations.
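The find-and-replace cleanup mentioned above can be scripted. The following Python sketch assumes the transcript is available as plain text and that the mis-split form is “App Stories”; the article does not show the exact output of any tool, so the corrections map, the function name, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical corrections map: mis-split form -> intended spelling.
CORRECTIONS = {
    "App Stories": "AppStories",
}


def fix_terms(transcript, corrections=CORRECTIONS):
    """Replace each mis-split term with its intended spelling."""
    for wrong, right in corrections.items():
        # Match the split form on word boundaries, allowing any run of
        # whitespace between its parts and ignoring case.
        parts = (re.escape(part) for part in wrong.split())
        pattern = r"\b" + r"\s+".join(parts) + r"\b"
        # A function replacement inserts `right` literally.
        transcript = re.sub(
            pattern, lambda _m, r=right: r, transcript, flags=re.IGNORECASE
        )
    return transcript


sample = "Welcome to App Stories, and thanks for listening to app stories."
print(fix_terms(sample))
```

Keeping the corrections in a single map makes it simple to add other names and brand terms as they turn up in transcripts.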

The SpeechAnalyzer and SpeechTranscriber modules work across iPhone, iPad, Mac, and Vision Pro devices. They require the latest developer beta versions of Apple’s operating systems to function.
