The magic wand for sound arrives with Meta’s latest AI model

Meta Platforms is introducing a new artificial intelligence model called SAM Audio that simplifies sound editing through natural-language prompts. The tool lets users isolate or remove specific sounds from complex recordings. Mike Wheatley reports for Silicon Angle that the model is now available through the Segment Anything Playground.

The technology works much like Meta's earlier Segment Anything tools for image and video editing. It enables creators to separate tracks such as vocals or instruments simply by typing a command: a podcaster can eliminate background traffic noise or a barking dog using plain language. The system supports three types of prompts: text descriptions, clicks on objects visible in an accompanying video, and specific time segments.
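Meta has not published a programmatic API for SAM Audio, so the following is only a toy sketch of the idea behind prompt-based separation: a mix is treated as a sum of labeled sound sources, and a text prompt selects which source to remove or isolate. Every name and data structure here is hypothetical.

```python
# Toy illustration of prompt-based sound separation.
# All names are hypothetical; Meta has not released a code API for SAM Audio.

def separate(mix_stems: dict[str, list[float]], prompt: str,
             mode: str = "remove") -> list[float]:
    """Return the mix with the prompted sound removed ("remove")
    or that sound on its own ("isolate")."""
    n = len(next(iter(mix_stems.values())))
    out = [0.0] * n
    for label, samples in mix_stems.items():
        keep = (label != prompt) if mode == "remove" else (label == prompt)
        if keep:
            out = [a + b for a, b in zip(out, samples)]
    return out

# A podcaster's recording modeled as three labeled stems (toy sample values).
stems = {
    "voice":   [0.5, 0.5, 0.5],
    "traffic": [0.2, 0.1, 0.2],
    "dog":     [0.0, 0.3, 0.0],
}

clean = separate(stems, "traffic", mode="remove")   # voice + dog
dog_only = separate(stems, "dog", mode="isolate")   # dog alone
```

In the real model the "stems" are of course not given in advance; identifying which parts of the waveform match the prompt is precisely the hard problem SAM Audio solves.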

At the heart of the model is the Perception Encoder Audiovisual engine. This engine identifies sounds described by the user and slices them out without damaging the rest of the audio file. Meta suggests the tool will be useful for music production, film, and scientific research.

The company is also exploring accessibility applications. It is collaborating with Starkey Laboratories to enhance hearing aids and working with 2gether-International to support disabled founders. While SAM Audio sets a new benchmark in its field, it still has limitations: it cannot yet process audio-based prompts, and it sometimes struggles to separate very similar sounds, such as an individual singer within a large choir. Despite these challenges, the model runs faster than real time and handles large datasets efficiently.
