Google integrates computer control into Gemini 3.5 Flash, enabling the model to interact with screens
The update lets developers build agents that can see and act across digital environments. The feature is available through the Gemini API and the Gemini Enterprise Agent Platform.
Google has introduced a new capability in Gemini 3.5 Flash that lets the model interact with and control computer screens directly. The model can see, reason, and take action across browser, mobile, and desktop environments. The update marks a significant step in the evolution of AI agents, which can now perform tasks that require visual, interactive engagement with software interfaces.
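The "see, reason, and take action" cycle described above can be pictured as a simple loop: capture the screen, ask the model for the next step, carry it out, and repeat. The following is a minimal sketch of that loop, not Google's implementation and not the Gemini API. Every name in it (`Action`, `FakeScreen`, `run_agent`, `scripted_model`) is hypothetical, and the model is replaced by a scripted stand-in so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical schema, not a real API)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Stand-in for a real browser or desktop; it only records what was done."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real environment would return encoded image bytes here.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


# Assumed signature for the model call: (goal, screenshot, history) -> Action.
ProposeFn = Callable[[str, bytes, List[str]], Action]


def run_agent(goal: str, screen: FakeScreen, propose_action: ProposeFn,
              max_steps: int = 10) -> bool:
    """Observe, decide, act. Returns True if the model reports completion."""
    for _ in range(max_steps):
        frame = screen.screenshot()                               # see
        action = propose_action(goal, frame, list(screen.log))    # reason
        if action.kind == "done":
            return True
        screen.apply(action)                                      # act
    return False  # step budget exhausted before the model signalled "done"


def scripted_model(goal: str, frame: bytes, history: List[str]) -> Action:
    """Placeholder for a real model call: click a search box, then type the goal."""
    script = [Action("click", x=120, y=48), Action("type", text=goal)]
    return script[len(history)] if len(history) < len(script) else Action("done")


if __name__ == "__main__":
    screen = FakeScreen()
    finished = run_agent("gemini", screen, scripted_model)
    print(finished, screen.log)
```

In a real agent, `scripted_model` would be replaced by a call to the model and `FakeScreen` by a browser or device driver. The `max_steps` cap is a design choice in this sketch that keeps the agent from acting on the screen indefinitely.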
Previously, computer use was available only as a standalone model, Gemini 2.5 Computer Use. The functionality is now natively integrated into the main Gemini Flash model, which improves its ability to handle complex tasks such as continuous software testing and knowledge work across professional applications. The feature is already supported in the Gemini API and the Gemini Enterprise Agent Platform, making it accessible to developers and enterprises.
Building computer use into Gemini 3.5 Flash is expected to improve performance on long-horizon and enterprise automation tasks. Because the model can analyze and interact with software interfaces, developers can build more sophisticated agents that handle a wider range of tasks. The advancement is likely to influence how AI-powered tools and applications are developed, since it extends the model's capabilities beyond traditional function calling and tool use.
The change may also have broader implications for the AI industry. It could increase reliance on such models for automation and task execution, which may affect the cost and complexity of developing AI-driven applications. The feature may also raise questions about governance, security, and vendor lock-in as organizations adopt it. Market reaction will likely depend on how effectively the new features are implemented and used.
The impact of the integration will become clearer as the technology matures. Developers and enterprises will need to evaluate how best to use the new capabilities while managing challenges related to integration, security, and scalability. The feature's success will depend on its reliability, its performance, and how well it adapts to the needs of different industries and use cases.
Sources
- https://blog.google/innovation-and-ai/models-and-research/gemini-models/introducing-computer-use-gemini-3-5-flash/
- https://deepmind.google/blog/introducing-computer-use-in-gemini-3-5-flash/
- https://the-decoder.com/google-bakes-computer-control-directly-into-gemini-3-5-flash-letting-the-model-see-and-operate-your-screen/