Google’s Gemini gets new eyes and a bigger brain

Gemini 3 Pro, Google’s latest and most capable AI model, brings stronger visual understanding. In a post on the Google Blog, the company outlined how the model processes and reasons about visual information from documents, images, screens, and video.

According to Google, the model performs strongly in several key areas:

  • Document analysis: The model can interpret unstructured documents, including those with handwritten text, complex tables, and scientific notation.
  • Spatial awareness: It can identify objects and their specific locations within an image, a feature applicable to robotics or augmented reality.
  • Screen and video comprehension: Gemini 3 Pro can automate computer tasks by understanding on-screen elements, and it can analyze video at a high frame rate to capture fine details.

Google highlights potential applications in fields like education, medical imaging analysis, and finance.

In a separate announcement, Google made its new “Gemini 3 Deep Think” mode available to paying Google AI Ultra subscribers. The mode is designed to tackle complex problems in math, science, and logic. Google states that it uses advanced parallel reasoning to explore multiple solution paths simultaneously.

Sources: Google Blog, Google Blog
