Google Integrates Computer Use Capability into Gemini 3.5 Flash

Google has built the 'Computer Use' feature directly into its AI model 'Gemini 3.5 Flash,' enabling it to control computers, browsers, and mobile devices autonomously. The model scored 78.4 on 'OSWorld,' a benchmark that evaluates PC operation, a result reported to be on par with OpenAI's GPT-4.5. Developers can access the feature through the Gemini API, and expected applications include software testing automation and office workflow automation.

Google has announced that the 'Computer Use' feature is now integrated directly into its AI model 'Gemini 3.5 Flash.' The model can read what is shown on screen and autonomously operate personal computers, browsers, and mobile devices.

'Computer Use' refers to an AI's ability to operate a computer on a person's behalf. For example, the AI can decide on and carry out a sequence of steps by itself, such as opening a browser and filling out a form, or launching an application and performing a series of operations in it. Previously, this kind of functionality required dedicated software or complex configuration; it is now built into the model itself.
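The observe, decide, act cycle described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `FakeScreen` stands in for a real screen, `scripted_policy` stands in for the model, and none of the names come from the Gemini API or from Google's announcement. It only shows the general shape of such a loop, not how Gemini 3.5 Flash implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done" (hypothetical action set)
    target: str = ""   # name of the UI element the action applies to
    text: str = ""     # text to enter, used only by "type"


@dataclass
class FakeScreen:
    """Stand-in for a real screen: a form with one field and a submit button."""
    fields: Dict[str, str] = field(default_factory=lambda: {"name": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would capture a screenshot here.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: dict) -> Action:
    """Placeholder for the model: chooses the next action from what it sees."""
    if not observation["fields"]["name"]:
        return Action("type", target="name", text="Ada")
    if not observation["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(screen: FakeScreen,
              policy: Callable[[dict], Action],
              max_steps: int = 10) -> List[Action]:
    """Repeat observe -> decide -> act until the policy says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(screen.observe())
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    for step in run_agent(screen, scripted_policy):
        print(step)
```

The `max_steps` cap is a design choice in this sketch: it keeps a policy that never returns "done" from looping indefinitely.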

AI that operates computers autonomously, commonly referred to as 'AI agents,' has become a focal point across the industry in recent years. Major AI companies including OpenAI and Anthropic are developing similar features, and Google's decision to build the capability directly into its model is a strategic move within this competitive landscape.

In terms of performance, the model scored 78.4 on 'OSWorld,' a benchmark widely used to evaluate PC operation tasks. The score is reported to be on par with OpenAI's GPT-4.5. OSWorld measures how accurately an AI can carry out tasks on real operating systems, and higher scores are generally taken to indicate greater reliability in practical use.

Developers can access the feature through the Gemini API, Google's application programming interface (API) for its models. Anticipated use cases include software testing automation and office workflow automation, and the feature is expected to be adopted in tools that improve corporate operational efficiency.

The significance of this integration extends beyond a performance improvement. Because a single model can now both view the screen and complete operations, AI agents can take over many routine PC tasks that previously required human labor. If agents become easier for developers to build, integration into corporate systems may accelerate.

A key question going forward is how stably the system operates in real business environments. Benchmark scores do not necessarily reflect real-world reliability, so validation under a variety of conditions is crucial. Offering the capability through the Gemini API makes it easy for many developers to experiment, which could lay the groundwork for broader adoption.
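One simple way to think about validation under varied conditions is to repeat the same task several times per condition and compare success rates. The Python sketch below assumes a hypothetical record of pass/fail outcomes; the condition names and values are illustrative placeholders, not measurements of Gemini 3.5 Flash or any other model.

```python
from typing import Dict, List, Tuple


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of trials that succeeded."""
    if not outcomes:
        raise ValueError("no trials recorded")
    return sum(outcomes) / len(outcomes)


def summarize(trials: Dict[str, List[bool]]) -> Dict[str, float]:
    """Success rate for each test condition."""
    return {name: success_rate(outcomes) for name, outcomes in trials.items()}


def worst_condition(trials: Dict[str, List[bool]]) -> Tuple[str, float]:
    """The condition with the lowest success rate, i.e. the weakest spot."""
    rates = summarize(trials)
    name = min(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    # Placeholder outcomes for illustration only; not real results.
    example_trials = {
        "default_window": [True, True, True, True],
        "small_window": [True, False, True, False],
    }
    print(summarize(example_trials))
    print(worst_condition(example_trials))
```

Reporting the weakest condition alongside the average reflects the point above: a single headline score can hide conditions under which an agent fails often.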

#GenerativeAI #AIAgent #Gemini #Google #WorkflowAutomation #ComputerUse #LLM

AI issue Staff

This article is an original work independently written and edited by the AI issue editorial team based on factual reporting. © AI issue. Unauthorized reproduction, redistribution, or use for AI training is prohibited.
