New AI model combines speech recognition with privacy protection

Israeli startup aiOla has released Whisper-NER, an open-source AI model that transcribes audio while automatically masking sensitive information. As reported by Carl Franzen for VentureBeat, the model builds on OpenAI’s Whisper model and combines automatic speech recognition (ASR) with named entity recognition (NER) to protect private data during transcription. The tool can identify and obscure sensitive details …
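Whisper-NER reportedly handles recognition and masking within a single model, and the sketch below does not reproduce that. As a minimal illustration of the masking idea only, it assumes that a separate, hypothetical NER step has already returned character spans for a finished transcript, and replaces each span with a label placeholder. `EntitySpan`, `mask_entities` and the sample sentence are illustrative inventions, not part of aiOla's release.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySpan:
    """A character span flagged by a hypothetical NER step (not Whisper-NER's output format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "PERSON" or "PHONE"


def mask_entities(transcript: str, spans: List[EntitySpan]) -> str:
    """Replace each flagged span with a placeholder such as [PERSON].

    Spans are processed right to left so that earlier offsets stay valid
    after each replacement changes the string length. Assumes spans do not overlap.
    """
    masked = transcript
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        masked = masked[:span.start] + f"[{span.label}]" + masked[span.end:]
    return masked


if __name__ == "__main__":
    text = "Hi, this is Dana Levi, call me at 555-0142."
    spans = [EntitySpan(12, 21, "PERSON"), EntitySpan(34, 42, "PHONE")]
    print(mask_entities(text, spans))  # Hi, this is [PERSON], call me at [PHONE].
```

Working from the end of the string backwards is a simple way to avoid recomputing offsets after each substitution; a joint model like Whisper-NER would not need this separate pass.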

Meta rebuilds company strategy around open-source AI model Llama

Meta has fundamentally transformed its business strategy by focusing on Llama, its open-source artificial intelligence model. According to Sharon Goldman’s detailed report in Fortune, CEO Mark Zuckerberg made the pivotal decision to release Llama 2 as open-source in July 2023, despite internal concerns about monetization and security risks. The model has since been downloaded over …

AnyChat unifies access to multiple AI language models

AnyChat, a new development tool, gives users access to multiple large language models (LLMs) through a single interface. Developer Ahsen Khaliq, machine learning growth lead at Gradio, created the platform so that users can switch between models such as ChatGPT, Google’s Gemini, Perplexity, Claude, and Meta’s Llama without being restricted to one provider, as reported by …
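How AnyChat is implemented is not described here. As a hypothetical sketch of the general pattern of one interface over interchangeable models, the code below registers stub backends under made-up names and routes each prompt to whichever one is currently active. `UnifiedChat`, `stub_backend` and the `model-a`/`model-b` names are assumptions for illustration; real provider SDKs would replace the stubs.

```python
from typing import Callable, Dict, Optional

# A backend maps a prompt to a reply. These are hypothetical stubs standing in
# for real provider SDKs, whose APIs differ from one another.
Backend = Callable[[str], str]


def stub_backend(name: str) -> Backend:
    """Return a fake backend that tags each prompt with its model name."""
    def reply(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return reply


class UnifiedChat:
    """Route prompts to whichever registered backend is currently active."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._active: Optional[str] = None

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend
        if self._active is None:
            self._active = name  # the first backend registered becomes the default

    def switch(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def send(self, prompt: str) -> str:
        if self._active is None:
            raise RuntimeError("no backend registered")
        return self._backends[self._active](prompt)


if __name__ == "__main__":
    chat = UnifiedChat()
    for model in ("model-a", "model-b"):
        chat.register(model, stub_backend(model))
    print(chat.send("Hello"))  # [model-a] Hello
    chat.switch("model-b")
    print(chat.send("Hello"))  # [model-b] Hello
```

Because callers only ever use `send`, changing models is a single `switch` call rather than a code change, which is the main benefit of this kind of abstraction.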

Microsoft unveils Magentic-One, an open-source framework for managing multi-agent AI systems

Microsoft has released Magentic-One, a new open-source framework that enables a single AI model to manage multiple helper agents working together on complex, multi-step tasks. According to a paper by Microsoft researchers, Magentic-One is a generalist agentic system that can “fully realize the long-held vision of agentic systems that can enhance …
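The toy sketch below illustrates the basic orchestration pattern described above (one lead agent delegating subtasks to helper agents) under heavy simplifying assumptions: the plan is fixed and supplied by the caller, and each helper is a plain Python function. `Orchestrator`, `fetch`, `clean` and `report` are hypothetical names, not Magentic-One's API; a real orchestrator would also draw up and revise the plan itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A hypothetical helper agent takes a subtask description and the previous step's output.
Helper = Callable[[str, str], str]


@dataclass
class Orchestrator:
    """Toy lead agent that delegates each plan step to a named helper."""

    helpers: Dict[str, Helper]
    # (helper name, subtask, result) for every step executed so far
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def run(self, plan: List[Tuple[str, str]]) -> str:
        """Run (helper_name, subtask) steps in order, chaining their outputs."""
        context = ""
        for helper_name, subtask in plan:
            if helper_name not in self.helpers:
                raise KeyError(f"no helper named {helper_name!r}")
            context = self.helpers[helper_name](subtask, context)
            self.log.append((helper_name, subtask, context))
        return context


# Hypothetical helpers; in a real system these might browse, read files or run code.
def fetch(subtask: str, context: str) -> str:
    return "  Raw Data  "


def clean(subtask: str, context: str) -> str:
    return context.strip().lower()


def report(subtask: str, context: str) -> str:
    return f"report: {context}"


if __name__ == "__main__":
    lead = Orchestrator(helpers={"fetch": fetch, "clean": clean, "report": report})
    plan = [("fetch", "collect input"), ("clean", "normalize"), ("report", "summarize")]
    print(lead.run(plan))  # report: raw data
    print(len(lead.log))   # 3
```

Keeping a log of every step gives the lead agent a record it could inspect to judge progress, which is where a fuller system would decide whether to continue or replan.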

OmniGen: First unified model for image generation

Researchers have introduced OmniGen, the first diffusion model capable of unifying various image generation tasks within a single framework. Unlike existing models such as Stable Diffusion, OmniGen does not require additional modules to handle different control conditions, according to the authors Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, et al. The model can perform text-to-image …

Hugging Face releases compact language models for smartphones and edge devices

Hugging Face has released SmolLM2, a new family of compact language models designed to run on smartphones and edge devices with limited processing power and memory. The models, released under the Apache 2.0 license, come in three sizes (135M, 360M and 1.7B parameters) and perform strongly on key benchmarks, outperforming larger models like Meta’s Llama …

Meta makes Llama AI models available for US defense applications

Meta is making its Llama AI models available to U.S. government agencies and contractors working on defense and national security applications. According to a blog post by Meta cited by TechCrunch, the company is partnering with firms like Accenture, Amazon Web Services, and Lockheed Martin to bring Llama to these entities. The move comes after …

Omnivore acquired by ElevenLabs to power new ElevenReader app

Omnivore, a reading app startup, has been acquired by ElevenLabs, an AI audio technology company, to help develop its new ElevenReader app. According to a note from Omnivore’s founders Jackson and Hongbo, the acquisition will enable them to create more accessible reading and listening experiences on a larger platform. Omnivore users are invited to create …

Amphion: open-source toolkit for audio, music and speech generation

Amphion is an open-source toolkit designed to support research and development in audio, music and speech generation. According to the project’s GitHub repository, it offers unique visualizations of classic models and architectures to help junior researchers and engineers better understand them. The toolkit supports various individual generation tasks such as text-to-speech (TTS), singing voice synthesis …

Speech to text: Moonshine is fast and as accurate as OpenAI’s Whisper

Useful Sensors, an AI company focused on improving human-machine communication, has open-sourced Moonshine, a new speech-to-text model that aims to significantly reduce the latency of voice interfaces. According to Useful Sensors founder Pete Warden, Moonshine returns results 1.7 times faster than OpenAI’s Whisper model while matching or exceeding its accuracy. The model’s variable-length input window allows it …