Thinking Machines, the AI startup co-founded by former OpenAI CTO Mira Murati, has announced a research preview of what it calls “interaction models” — AI systems designed to perceive and respond in real time rather than waiting for a user to finish speaking or typing.
Current AI models work in turns: the user sends an input, the model processes it, and only then does it respond. Thinking Machines argues that this structure limits collaboration, because real work often requires ongoing feedback and correction. Its new approach processes audio, video, and text simultaneously in 200-millisecond chunks, allowing the model to listen and respond at the same time.
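To make the 200-millisecond figure concrete, here is a minimal sketch of slicing an audio stream into fixed 200 ms chunks and consulting a model after every chunk rather than at the end of a turn. The 16 kHz sample rate, the `chunk_stream` and `duplex_loop` helpers, and the `step` callback are hypothetical and used only for illustration. Thinking Machines has not published how its model ingests input, and a real system would also interleave video frames and text within each window.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

CHUNK_MS = 200        # chunk length reported for the preview
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def samples_per_chunk(sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples in one chunk (3,200 at 16 kHz)."""
    return sample_rate * chunk_ms // 1000


def chunk_stream(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks; a trailing partial chunk is yielded as-is."""
    size = samples_per_chunk(sample_rate)
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def duplex_loop(
    samples: Sequence[float],
    step: Callable[[Sequence[float]], Optional[str]],
) -> List[Tuple[int, str]]:
    """Feed every chunk to `step` (a hypothetical model call) as it arrives.

    Unlike a turn-based loop, the model is consulted after every chunk, so it
    can emit output, returned as (timestamp_ms, text), while input is still
    streaming in.
    """
    outputs: List[Tuple[int, str]] = []
    for i, chunk in enumerate(chunk_stream(samples)):
        reply = step(chunk)
        if reply is not None:
            outputs.append((i * CHUNK_MS, reply))
    return outputs
```

A turn-based system would call the model once, after the final chunk. Here the model gets a chance to react every 200 ms, which is what lets it interject or adjust mid-utterance.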
The system behind the preview is called TML-Interaction-Small. It is a 276-billion-parameter model, though only 12 billion of those parameters are active at any one time. It uses two components working together:
- An interaction model that stays in constant exchange with the user
- A background model that handles complex reasoning and tasks asynchronously, feeding results back into the conversation
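The division of labor between the two components can be sketched with Python's `asyncio`: a foreground loop answers each incoming chunk immediately, while a slower background task runs concurrently and posts its result to a queue that the foreground checks on every step. The function names, the `task:` prefix, the queue hand-off, and the timings are all hypothetical. Thinking Machines has not described how its two models actually communicate.

```python
import asyncio
from typing import List


async def background_model(task: str, results: asyncio.Queue, delay_s: float = 0.5) -> None:
    """Hypothetical slow reasoning job; posts its answer when done."""
    await asyncio.sleep(delay_s)  # stand-in for multi-step reasoning or tool use
    await results.put(f"result for {task!r}")


async def interaction_model(inputs: List[str], chunk_s: float = 0.2) -> List[str]:
    """Hypothetical foreground loop: responds to every chunk right away and
    splices in background results as soon as they arrive."""
    results: asyncio.Queue = asyncio.Queue()
    transcript: List[str] = []
    pending: List[asyncio.Task] = []
    for chunk in inputs:
        if chunk.startswith("task:"):
            # Delegate the hard work without blocking the conversation.
            pending.append(asyncio.create_task(background_model(chunk[len("task:"):], results)))
            transcript.append("ack: working on it")
        else:
            transcript.append(f"heard: {chunk}")
        # Non-blocking check for finished background work.
        while not results.empty():
            transcript.append(results.get_nowait())
        await asyncio.sleep(chunk_s)  # wait for the next chunk (200 ms by default)
    # Input has ended; collect any background work still in flight.
    await asyncio.gather(*pending)
    while not results.empty():
        transcript.append(results.get_nowait())
    return transcript


if __name__ == "__main__":
    print(asyncio.run(interaction_model(["hi", "task:summarize", "more", "more", "more", "more"])))
```

The design point the sketch illustrates is that the foreground loop never awaits the background task while input is still arriving, so a slow answer does not stall turn-taking; the result simply joins the conversation whenever it is ready.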
On FD-bench, a benchmark that measures interaction quality, TML-Interaction-Small scored 77.8, compared with 46.8 for GPT-realtime-2.0 and 54.3 for Gemini-3.1-flash-live. Its turn-taking latency of 0.40 seconds was also lower than that of both competing systems tested.
The model also demonstrated visual awareness, responding to on-screen events without being prompted — a capability current real-time systems lack, according to Thinking Machines.
The preview is not yet publicly available. The company plans a limited research release before a wider rollout.
Sources: Thinking Machines Blog, VentureBeat