
Best of your X follows: June 15
Today: Mollick surfaces a paper showing frontier LLMs outperform dedicated clinical AI tools; LeCun amplifies enterprise pushback against Anthropic's new prompt-retention policy; Anthropic explains why biology agents lag coding agents and how a deterministic retrieval layer fixes it; Google DeepMind launches a European Robotics Accelerator with 15 startups; and Project Genie expands globally to Ultra subscribers.

Today's digest covers posts from June 12 UTC.
Quick scan
| Who | What | Cluster |
|---|---|---|
| emollick | Frontier LLMs beat clinical AI tools in head-to-head study | AI tools / research |
| ylecun | Anthropic's new prompt-retention policy draws enterprise pushback | Society / ethics |
| AnthropicAI | Why AI advanced faster in coding than biology — and how to fix it | Research |
| GoogleDeepMind | Robotics Accelerator launches with 15 European startups | Open source / community |
| GoogleDeepMind | Project Genie expands to Google AI Ultra 5X subscribers globally | Model releases |
AI tools / research
General-purpose LLMs beat dedicated clinical AI tools
Ethan Mollick flagged a paper comparing frontier LLMs against OpenEvidence — a clinical AI tool built specifically for physicians — across three evaluation sets. The paper's finding: frontier models won all three. OpenEvidence performed roughly on par with auto-enabled Google Search AI Overview. 1
The implication isn't that clinical AI tools are useless. It's that the moat built on "domain-specific tuning" narrows quickly when the general frontier keeps advancing. A tool that beat GPT-4 in 2023 may be outpaced by Claude Fable 5 in 2026 without a single update to its own model.
Research
Why biology fell behind software — and a fix
Anthropic's science blog published an essay by Laura Luebbert arguing that the gap between coding agents and biology agents isn't mainly about reasoning capability — it's about infrastructure. 2
The essay's analogy: biological databases are like cities built before cars. The streets weren't designed for modern traffic. Software infrastructure, by contrast, suited agents from the start: version control, documented APIs, machine-readable outputs.
To test the gap concretely, Luebbert's team ran four AI systems (Claude, Biomni OSS, Edison Analysis, GPT) against VirBench, a benchmark of 120 viral-sequence retrieval tasks from NCBI Virus. Results without any extra tooling: accuracy ranged from 16.9% to 91.3%, and the same model often returned wildly different answers on identical queries run three times. In one run, Sonnet 4 retrieved 106 sequences for a query where the correct answer was 266; the next run returned 15; the run after that, 5.
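The run-to-run swing in that Sonnet 4 example can be put into numbers with a small sketch. This is only an illustration: the function name and both measures (retrieved-to-expected ratio, and spread as a fraction of the expected count) are assumptions made here, not the accuracy metric VirBench itself uses.

```python
from statistics import mean

def run_summary(retrieved_counts, expected_count):
    """Summarize how far repeated runs of one query drift from each other.

    Hypothetical helper: ratios compare retrieved vs. expected counts only;
    they say nothing about whether the retrieved sequences were the right ones.
    """
    ratios = [count / expected_count for count in retrieved_counts]
    # Spread: gap between the largest and smallest run, relative to the target.
    spread = (max(retrieved_counts) - min(retrieved_counts)) / expected_count
    return {"ratios": ratios, "mean_ratio": mean(ratios), "spread": spread}

# Counts from the digest: three runs of one query whose correct answer was 266.
summary = run_summary([106, 15, 5], expected_count=266)
for run_number, ratio in enumerate(summary["ratios"], start=1):
    print(f"run {run_number}: {ratio:.1%} of the expected count")
print(f"spread across runs: {summary['spread']:.1%} of the expected count")
```

A perfectly repeatable agent would show a spread of zero, whatever its ratio to the expected count.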
The team then added gget virus — a deterministic retrieval layer that standardizes how agents query NCBI Virus across its multiple underlying APIs. With it, accuracy climbed above 90% for all models, peaking at 99.7% for GPT-5.5. Run-to-run variability nearly disappeared.
The kicker: adding the deterministic layer made model choice much less important. Cheaper models paired with the right tool matched or beat expensive models without it.
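To make the idea of a deterministic retrieval layer concrete, here is a minimal sketch of the general pattern: normalize whatever the agent asks for into one canonical query, then answer it with a fixed, sorted lookup. Everything in it is hypothetical (the `VirusQuery` type, the `canonicalize` and `retrieve` helpers, the toy records); it is not how gget virus is implemented and it does not call NCBI Virus.

```python
from dataclasses import dataclass
from typing import Optional

# Toy records standing in for a sequence database (made up, not NCBI data).
RECORDS = [
    {"accession": "TOY003", "virus": "zika virus", "host": "homo sapiens", "length": 10800},
    {"accession": "TOY001", "virus": "zika virus", "host": "aedes aegypti", "length": 10200},
    {"accession": "TOY002", "virus": "dengue virus", "host": "homo sapiens", "length": 10700},
]

def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so spelling variants map to one form."""
    return " ".join(text.lower().split())

@dataclass(frozen=True)
class VirusQuery:
    virus: str
    host: Optional[str] = None
    min_length: int = 0

def canonicalize(virus, host=None, min_length=0) -> VirusQuery:
    """Turn loosely formatted agent input into a single canonical query."""
    return VirusQuery(_norm(virus), _norm(host) if host else None, int(min_length))

def retrieve(query: VirusQuery) -> list:
    """Return matching accessions in sorted order, so output never depends on call order."""
    hits = [
        record["accession"]
        for record in RECORDS
        if record["virus"] == query.virus
        and (query.host is None or record["host"] == query.host)
        and record["length"] >= query.min_length
    ]
    return sorted(hits)

# Two differently phrased requests resolve to the same query and the same answer.
first = retrieve(canonicalize("  Zika   VIRUS ", host="Homo Sapiens"))
second = retrieve(canonicalize("zika virus", host="homo sapiens", min_length="0"))
print(first, second, first == second)
```

The design point is that the model only has to choose parameters; the query construction and result ordering no longer vary between runs.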

Society / ethics
Anthropic's prompt-retention policy draws enterprise pushback
Yann LeCun amplified a tweet from Naveen Rao (VP of AI at Databricks) calling Anthropic's reported new policy of retaining user prompts and usage data a "red line" for enterprise customers. 3
Rao's tweet read: "My team loves Claude from @AnthropicAI. But this new policy of retaining prompts and usage is a red line... we simply can't." The comment drew attention because Databricks is one of Anthropic's larger enterprise partners.
Enterprises running sensitive internal workflows — code review, legal analysis, customer data pipelines — treat prompt confidentiality as a non-negotiable. If Anthropic has changed how it handles that data, this is a trust issue that goes beyond pricing or capability.
Open source / community
Google DeepMind launches European Robotics Accelerator
Google DeepMind announced its Robotics Accelerator has launched with 15 startups focused on physical AI in Europe. 4 The three-month program gives participants access to DeepMind's AI stack and Gemini Robotics models, plus direct support from DeepMind's teams.
The selection of 15 companies is notable not just for the number but for the geography. Most major AI hardware and robotics accelerators have been US-centric. A DeepMind-backed cohort based in Europe, with access to Gemini Robotics, shifts some of the applied robotics R&D pipeline east of the Atlantic.
Model releases
Project Genie goes global for Google AI Ultra 5X subscribers
Google DeepMind confirmed that Project Genie — Google Labs' world-model and interactive environment generator — is now expanding access to Google AI Ultra 5X subscribers globally, not just in selected markets. 5
Project Genie, first announced in 2024 as a model capable of generating interactive 2D environments from a single image, has been under limited access since its initial rollout. The Ultra 5X subscriber tier is Google's highest consumer AI subscription. Opening Genie to that tier worldwide signals DeepMind is moving it toward a production offering rather than a pure research demo.