Google DeepMind has announced a new feature for its Gemini 3.5 Flash model: 'computer use.' This capability enables the AI to directly manipulate graphical user interfaces—clicking buttons, filling forms, navigating menus—as a human user would.
The move marks a significant step beyond text-based or API-only AI agents. By training the model to interpret and act on pixel-level screen information, DeepMind positions Gemini to automate complex, multi-step workflows on existing software without requiring custom integrations.
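To make the idea concrete, a screen-driven agent can be pictured as a loop that captures the screen, asks a model for the next action, and applies it. The sketch below is illustrative only and is not DeepMind's implementation or API: `Action`, `FakeScreen`, `run_agent`, and `scripted_policy` are hypothetical names, the "screen" is an in-memory stand-in, and the scripted policy takes the place of a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One GUI action proposed by the policy (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


# A policy maps (goal, screenshot bytes) to the next action.
# In a real system this would be a call to the model; here it is a plain function.
Policy = Callable[[str, bytes], Action]


class FakeScreen:
    """In-memory stand-in for a desktop, so the loop runs without a real GUI."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real implementation would return pixel data; this encodes a step counter.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        self.log.append(action)


def run_agent(goal: str, screen: FakeScreen, policy: Policy, max_steps: int = 10) -> List[Action]:
    """Observe, decide, act, and repeat until the policy says it is done."""
    for _ in range(max_steps):
        action = policy(goal, screen.screenshot())
        if action.kind == "done":
            break
        screen.apply(action)
    return screen.log


def scripted_policy(goal: str, frame: bytes) -> Action:
    """Toy policy: replays a fixed script keyed on the frame counter."""
    step = int(frame.decode().split("-")[1])
    script = [Action("click", x=120, y=40), Action("type", text="hello")]
    return script[step] if step < len(script) else Action("done")


if __name__ == "__main__":
    for act in run_agent("fill in the form", FakeScreen(), scripted_policy):
        print(act)
```

The `max_steps` cap is a design choice in this sketch: it keeps a confused policy from looping indefinitely.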
For enterprises, this could radically simplify automation of legacy systems and standard desktop tasks. DeepMind has not yet detailed pricing, API availability, or public access timelines for the computer use feature.
The announcement intensifies the race in agentic AI, where Anthropic, OpenAI, and others have also experimented with desktop control. DeepMind’s approach focuses on safety layers that monitor and restrict actions to prevent misuse.
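DeepMind has not published how its safety layers work, so the following is only a generic sketch of what a layer that monitors and restricts actions might look like: each proposed action is checked before it runs and is allowed, blocked, or held for user confirmation. The names `ProposedAction`, `ALLOWED_KINDS`, `SENSITIVE_TARGETS`, and `review` are hypothetical, as are the specific rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take (hypothetical schema)."""
    kind: str         # e.g. "click", "type", "scroll"
    target: str = ""  # label of the UI element the action touches


# Assumed policy for illustration: only these action kinds may ever run.
ALLOWED_KINDS = {"click", "type", "scroll"}

# Assumed policy for illustration: these targets need explicit user confirmation.
SENSITIVE_TARGETS = {"delete account", "confirm payment"}


def review(action: ProposedAction) -> str:
    """Return "block", "confirm", or "allow" for a proposed action."""
    if action.kind not in ALLOWED_KINDS:
        return "block"
    if action.target.strip().lower() in SENSITIVE_TARGETS:
        return "confirm"
    return "allow"


if __name__ == "__main__":
    for proposed in (
        ProposedAction("click", "Search"),
        ProposedAction("click", "Confirm Payment"),
        ProposedAction("run_shell", "rm -rf /"),
    ):
        print(proposed, "->", review(proposed))
```

Checking the action kind before the target means an unknown action type is rejected outright and never reaches the confirmation step.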
Early developer reactions highlight the feature's potential but raise concerns about reliability: the model may misinterpret screen elements or fail on edge cases. DeepMind acknowledges these limitations and promises iterative improvements.