A diffusion-powered research agent that beats rivals
Google unveiled Test-Time Diffusion Deep Researcher (TTD-DR), an AI research agent that emulates how humans draft, search, and revise to produce long-form, coherent reports. Built on diffusion and evolutionary algorithms, it consistently outperformed OpenAI, Perplexity, and Grok deep research systems across industry-relevant benchmarks. The framework targets enterprise use cases where standard RAG falters, from competitive analysis to market entry planning.
Key points
- Google introduced Test-Time Diffusion Deep Researcher (TTD-DR), an AI framework inspired by human drafting and iterative revision.
- The agent replaces linear pipelines with a diffusion-style process that starts from a “noisy” draft and progressively refines it.
- Two core mechanisms drive the system: “Denoising with Retrieval” (iterative search-informed revisions) and “Self-Evolution” (component-level evolutionary optimization).
- TTD-DR was built using Google’s Agent Development Kit (ADK) with Gemini 2.5 Pro as the core LLM, though models are swappable.
- The system targets enterprise research tasks beyond standard RAG, including competitive analyses and market entry reports across finance, biomedical, recreation, and technology.
- Benchmarks included DeepConsult and the LongForm Research dataset for reports, plus HLE (Humanity’s Last Exam) and GAIA for multi-hop reasoning.
- In long-form head-to-head tests versus OpenAI Deep Research, TTD-DR achieved win rates of 69.1% and 74.5% on two datasets.
- On multi-hop benchmarks, it surpassed OpenAI’s system by 4.8%, 7.7%, and 1.7% depending on the task.
- Comparators included OpenAI Deep Research, Perplexity Deep Research, Grok DeepSearch, and the open-source GPT-Researcher.
- Google research scientist Rujun Han emphasized that component-level evolution improves denoising effectiveness, fluency, and coherence, all key “helpfulness” criteria.
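To make the two mechanisms in the list above concrete, here is a minimal sketch of a draft-then-refine loop. It is not Google's implementation and does not use the ADK API. Every name in it (`denoise_with_retrieval`, `self_evolve`, the `toy_*` helpers, the `[GAP:...]` marker convention, and the toy corpus) is a hypothetical stand-in, and in a real agent the draft, query, revise, and scoring steps would presumably be LLM and search calls.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    text: str
    evidence: List[str] = field(default_factory=list)


def denoise_with_retrieval(
    question: str,
    make_draft: Callable[[str], str],
    make_query: Callable[[str, str], str],
    search: Callable[[str], str],
    revise: Callable[[str, str], str],
    steps: int = 3,
) -> Draft:
    """Start from a rough ("noisy") draft and refine it with search results."""
    draft = Draft(text=make_draft(question))
    for _ in range(steps):
        query = make_query(question, draft.text)
        if not query:  # nothing left to look up, so stop early
            break
        found = search(query)
        draft.evidence.append(found)
        draft.text = revise(draft.text, found)
    return draft


def self_evolve(
    variants: List[str],
    fitness: Callable[[str], float],
    merge: Callable[[List[str]], str],
    keep: int = 2,
) -> str:
    """Assumed simplification: rank variants of one component, merge the best."""
    ranked = sorted(variants, key=fitness, reverse=True)
    return merge(ranked[:keep])


# --- Toy stand-ins so the sketch runs without any model or search API ---
TOY_CORPUS = {
    "pricing": "Rival A charges per seat.",
    "regions": "Rival A sells in two regions.",
}


def toy_draft(question: str) -> str:
    # The "noisy" draft marks what it does not know yet.
    return f"Report on {question}. [GAP:pricing] [GAP:regions]"


def toy_query(question: str, draft_text: str) -> str:
    # Ask about the first unresolved gap in the current draft.
    match = re.search(r"\[GAP:(\w+)\]", draft_text)
    return match.group(1) if match else ""


def toy_search(query: str) -> str:
    return TOY_CORPUS.get(query, "")


def toy_revise(draft_text: str, found: str) -> str:
    # Replace the first gap marker with the retrieved sentence.
    if not found:
        return draft_text
    return re.sub(r"\[GAP:\w+\]", lambda _: found, draft_text, count=1)


if __name__ == "__main__":
    report = denoise_with_retrieval(
        "Rival A", toy_draft, toy_query, toy_search, toy_revise
    )
    print(report.text)
```

The point of the design is that each search is conditioned on the current draft rather than on a fixed plan, which is what separates this loop from a linear plan-search-write pipeline. `self_evolve` would be applied to an individual component, such as several candidate search queries, before its result is passed on.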
Takeaway
Thinking of upgrading your research game? Start where TTD-DR shines: high-stakes reports that need depth, citations, and structure, not “Ctrl+C from the internet.” Pilot it on one or two business-critical analyses, wire it into your existing tools via ADK, and compare outputs against your current stack. And yes, keep an eye on the benchmarks; it’s nicer to claim you picked the winner because of data, not vibes. Your boss will definitely pretend to notice.