Deep Reviews

Every AI Agent Leaderboard Is a Lie. Berkeley Has the Receipts.

UC Berkeley researchers scored near-perfect results on eight of the most-cited AI agent benchmarks without solving a single task. SWE-bench, WebArena, OSWorld, GAIA, Terminal-Bench, FieldWorkArena, CAR-bench, and SWE-bench Pro are all gameable. Here's how, and what to do about it.

Apr 13, 2026 · 7 min read