Best of your X follows: June 15

Today's digest covers posts from June 12 UTC.

Quick scan

Who	What	Cluster
emollick	Frontier LLMs beat clinical AI tools in head-to-head study	AI tools / research
ylecun	Anthropic's new prompt-retention policy draws enterprise pushback	Society / ethics
AnthropicAI	Why AI advanced faster in coding than biology — and how to fix it	Research
GoogleDeepMind	Robotics Accelerator launches with 15 European startups	Open source / community
GoogleDeepMind	Project Genie expands to Google AI Ultra 5X subscribers globally	Model releases

AI tools / research

General-purpose LLMs beat dedicated clinical AI tools

Ethan Mollick flagged a paper comparing frontier LLMs against OpenEvidence, a clinical AI tool built specifically for physicians, across three evaluation sets. Frontier models won all three, while OpenEvidence performed roughly on par with the AI Overview that Google Search shows automatically. 1

The implication isn't that clinical AI tools are useless. It's that the moat built on "domain-specific tuning" narrows quickly when the general frontier keeps advancing. A tool that beat GPT-4 in 2023 may be outpaced by Claude Fable 5 in 2026 without a single update to its own model.

Research

Why biology fell behind software — and a fix

Anthropic's science blog published an essay by Laura Luebbert arguing that the gap between coding agents and biology agents isn't mainly about reasoning capability — it's about infrastructure. 2

The essay's analogy: biological databases are like cities built before cars, with streets that were never designed for modern traffic. Software infrastructure, by contrast, was agent-friendly almost from the start: version control, documented APIs, and machine-readable outputs.

To test the gap concretely, Luebbert's team ran four AI systems (Claude, Biomni OSS, Edison Analysis, GPT) against VirBench, a benchmark of 120 viral-sequence retrieval tasks from NCBI Virus. Results without any extra tooling: accuracy ranged from 16.9% to 91.3%, and the same model often returned wildly different answers on identical queries run three times. In one run, Sonnet 4 retrieved 106 sequences for a query where the correct answer was 266; the next run returned 15; the run after that, 5.
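
To make the run-to-run variability concrete, here is a minimal Python sketch using only the three counts quoted above. The `count_ratio` helper and the choice of coefficient of variation are assumptions made for illustration; they are not VirBench's actual scoring method, which the essay summary here does not specify.

```python
from statistics import mean, pstdev

def count_ratio(retrieved: int, expected: int) -> float:
    """Share of the expected sequence count that one run returned, capped at 1.0.

    Illustrative measure only (an assumption), not the benchmark's metric.
    """
    return min(retrieved, expected) / expected

# Figures quoted in the digest: three runs of one query, correct answer 266.
expected = 266
runs = [106, 15, 5]

ratios = [count_ratio(r, expected) for r in runs]
spread = max(runs) - min(runs)      # gap between best and worst run
cv = pstdev(runs) / mean(runs)      # coefficient of variation across runs

for r, ratio in zip(runs, ratios):
    print(f"retrieved {r:>3} of {expected}: {ratio:.1%}")
print(f"spread between runs: {spread} sequences")
print(f"coefficient of variation: {cv:.2f}")
```

A coefficient of variation above 1 means the standard deviation across runs exceeds the mean count, which is one way to read "wildly different answers on identical queries."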

The team then added gget virus — a deterministic retrieval layer that standardizes how agents query NCBI Virus across its multiple underlying APIs. With it, accuracy climbed above 90% for all models, peaking at 99.7% for GPT-5.5. Run-to-run variability nearly disappeared.
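
The general idea of a deterministic layer can be sketched in a few lines of Python: loosely formatted requests are collapsed into one canonical form before anything is sent to a database, so the same intent always produces the same query. Everything here is hypothetical, including the `canonical_query` function and its `virus`, `host`, and `min_length` parameters; it is not the gget virus interface and makes no claim about how that tool is implemented.

```python
from typing import Optional

def canonical_query(virus: str, host: Optional[str] = None,
                    min_length: Optional[int] = None) -> str:
    """Build one canonical query string from loosely formatted inputs.

    Hypothetical illustration: normalizes case and whitespace, then sorts
    fields so argument order and formatting cannot change the result.
    """
    fields = {"virus": " ".join(virus.lower().split())}
    if host is not None:
        fields["host"] = " ".join(host.lower().split())
    if min_length is not None:
        fields["min_length"] = str(int(min_length))
    return "&".join(f"{key}={value}" for key, value in sorted(fields.items()))

# Two differently formatted requests with the same intent.
a = canonical_query("  SARS-CoV-2 ", host="Human", min_length=29000)
b = canonical_query("sars-cov-2", min_length=29000, host="human ")
print(a)
print(a == b)
```

Because both calls reduce to the same string, any retrieval built on top of it starts from an identical request, which is the property the essay credits for the drop in variability.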

The kicker: adding the deterministic layer made model choice much less important. Cheaper models paired with the right tool matched or beat expensive models without it.

Figure: Agent accuracy on VirBench before and after adding the gget virus retrieval layer. 2

Society / ethics

Anthropic's prompt-retention policy draws enterprise pushback

Yann LeCun amplified a tweet from Naveen Rao (VP of AI at Databricks) calling Anthropic's reported new policy of retaining user prompts and usage data a "red line" for enterprise customers. 3

Rao's tweet read: "My team loves Claude from @AnthropicAI. But this new policy of retaining prompts and usage is a red line... we simply can't." The comment drew attention because Databricks is one of Anthropic's larger enterprise partners.

Enterprises running sensitive internal workflows (code review, legal analysis, customer data pipelines) treat prompt confidentiality as non-negotiable. If Anthropic has changed how it handles that data, the problem is one of trust, and it goes beyond pricing or capability.

Open source / community

Google DeepMind launches European Robotics Accelerator

Google DeepMind announced its Robotics Accelerator has launched with 15 startups focused on physical AI in Europe. 4 The three-month program gives participants access to DeepMind's AI stack and Gemini Robotics models, plus direct support from DeepMind's teams.

The selection of 15 companies is notable not just for the number but for the geography. Most major AI hardware and robotics accelerators have been US-centric. A DeepMind-backed cohort based in Europe, with access to Gemini Robotics, shifts some of the applied robotics R&D pipeline east of the Atlantic.

Model releases

Project Genie goes global for Google AI Ultra 5X subscribers

Google DeepMind confirmed that Project Genie — Google Labs' world-model and interactive environment generator — is now expanding access to Google AI Ultra 5X subscribers globally, not just in selected markets. 5

Project Genie (first announced in 2024 as a model capable of generating interactive 2D environments from a single image) has been under limited access since its initial rollout. The Ultra 5X subscriber tier is Google's highest consumer AI subscription. Opening Genie to that tier worldwide signals that DeepMind is moving it toward a production offering rather than a pure research demo.

Sources