Google has introduced the Gemini Deep Research Agent, built on the Gemini 3 Pro model. Available through the Interactions API, it lets developers integrate autonomous research capabilities into their own applications.
“By leveraging multi-step reinforcement learning for search, the agent can autonomously explore complex information landscapes with remarkable accuracy,” the company stated.
The agent researches iteratively: it formulates a query, reviews the sources it finds, pinpoints gaps in the information, and then searches again to fill them. Google noted that the latest release significantly improves web search, letting the agent explore websites more deeply to extract specific data.
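The formulate, review, find-gaps, re-search loop described above can be sketched as follows. This is a toy illustration, not Google's implementation: the `search` callable, the fact names, and the stopping rule (stop when no gaps remain or after `max_rounds`) are all hypothetical choices made for the example.

```python
from collections.abc import Callable


def iterative_research(
    question: str,
    required_facts: set[str],
    search: Callable[[str], dict[str, str]],
    max_rounds: int = 5,
) -> dict[str, str]:
    """Formulate a query, review results, find gaps, and search again.

    `search` is a hypothetical stand-in for a web search tool: it takes a
    query string and returns a mapping of fact name -> extracted value.
    """
    findings: dict[str, str] = {}
    query = question  # 1. initial query formulation
    for _ in range(max_rounds):
        results = search(query)
        # 2. review the returned sources, keeping required facts not yet found
        for fact, value in results.items():
            if fact in required_facts:
                findings.setdefault(fact, value)
        # 3. pinpoint the information gaps that remain
        gaps = required_facts - set(findings)
        if not gaps:
            break
        # 4. formulate a follow-up query aimed at the gaps and loop again
        query = question + " " + " ".join(sorted(gaps))
    return findings


# Toy usage: a fake search tool that only reveals fact_b when asked for it.
def fake_search(query: str) -> dict[str, str]:
    if "fact_b" in query:
        return {"fact_b": "value B"}
    return {"fact_a": "value A"}


result = iterative_research("toy question", {"fact_a", "fact_b"}, fake_search)
print(result)  # {'fact_a': 'value A', 'fact_b': 'value B'}
```

The first round finds only `fact_a`; the gap check then steers the second query toward `fact_b`, after which no gaps remain and the loop stops early.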
On the Humanity’s Last Exam benchmark, which evaluates AI models on expert-level reasoning and problem-solving across a wide array of academic subjects, the Deep Research Agent scored 46.4%, surpassing OpenAI’s GPT-5 Pro, which scored 38.9%.
On the BrowseComp benchmark, which tests LLMs’ ability to uncover hard-to-find facts, Google’s Deep Research Agent scored 59.2%, narrowly trailing GPT-5 Pro, which scored 59.5%.
Along with the announcement, Google introduced a new benchmark, DeepSearchQA, aimed at evaluating agents’ thoroughness in web research tasks. The Gemini Deep Research Agent achieved a score of 66.1%, surpassing GPT-5 Pro, which scored 65.2%.

DeepSearchQA comprises 900 carefully crafted multi-step tasks spanning 17 fields, in which each step builds on the analysis from the previous one. “Unlike conventional fact-based tests, DeepSearchQA evaluates comprehensiveness, requiring agents to produce exhaustive answer sets. This measures both research accuracy and retrieval recall,” Google explained.
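One common way to score an exhaustive answer set is to compare it with a reference set and compute precision (roughly, how accurate the returned answers are) and recall (how complete the set is), often combined into an F1 score. The article does not give DeepSearchQA's actual grading formula, so the sketch below is an assumption: the function name, the normalization rule, and the example task are all hypothetical.

```python
def set_scores(predicted: set[str], reference: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 of a predicted answer set against a reference set.

    Assumption: answers match after trimming whitespace and lowercasing.
    DeepSearchQA's actual grading procedure may differ.
    """
    pred = {a.strip().lower() for a in predicted}
    ref = {a.strip().lower() for a in reference}
    hits = len(pred & ref)  # answers that are both predicted and correct
    precision = hits / len(pred) if pred else 0.0  # share of returned answers that are correct
    recall = hits / len(ref) if ref else 0.0  # share of reference answers that were found
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical task: the reference set has four items; the agent returns two
# of them plus one wrong answer, so precision = 2/3, recall = 2/4, F1 = 4/7.
scores = set_scores({"France", "Germany", "Spain"}, {"France", "Germany", "Italy", "Portugal"})
print(scores)
```

Under this kind of metric, an agent that stops after finding a single correct answer keeps high precision but loses recall, which is the gap a comprehensiveness-focused benchmark is designed to expose.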
Google stated that the agent will soon be accessible through Google Search, NotebookLM, Google Finance, and the Gemini app.
API pricing matches that of the Gemini 3 Pro model: input tokens cost $2 per million, and output tokens cost $12 per million for prompts of up to 200,000 tokens and $18 per million for longer prompts.
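To make the tiers concrete, here is a small per-request cost estimator that applies the rates exactly as stated above. It is a sketch only: the function and constant names are illustrative, it uses the prompt's token count to pick the output tier, it applies the single quoted input rate to every prompt length, and it does not model how the agent's internal multi-step searches are metered, which the article does not describe.

```python
# Rates as reported in the article, in USD per million tokens.
# Assumption: the article gives a single $2/M input rate, so this sketch
# applies it regardless of prompt length.
INPUT_RATE = 2.00
OUTPUT_RATE_SHORT = 12.00  # prompts up to 200,000 tokens
OUTPUT_RATE_LONG = 18.00   # prompts longer than 200,000 tokens
LONG_PROMPT_THRESHOLD = 200_000


def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the rates above."""
    if prompt_tokens <= LONG_PROMPT_THRESHOLD:
        output_rate = OUTPUT_RATE_SHORT
    else:
        output_rate = OUTPUT_RATE_LONG
    return (prompt_tokens * INPUT_RATE + output_tokens * output_rate) / 1_000_000


# 100k-token prompt, 10k output tokens: $0.20 input + $0.12 output
print(f"${estimate_cost(100_000, 10_000):.2f}")  # $0.32
# 300k-token prompt, 10k output tokens: $0.60 input + $0.18 output
print(f"${estimate_cost(300_000, 10_000):.2f}")  # $0.78
```

Crossing the 200,000-token threshold raises the per-token price of the output, so the same 10,000 output tokens cost 50% more with the longer prompt.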