The podcast explores the METR framework, a method for evaluating AI models built around two main components: Model Evaluation, which assesses AI capabilities and real-world performance, and Threat Research, which investigates the potential societal risks posed by AI advances. It introduces the Model Time Horizon chart, a visualization of how model capabilities have progressed over time, measured by the difficulty of tasks and how reliably models complete them relative to human performance. The discussion also covers the challenges of measuring AI performance, the selection of appropriate tasks for evaluation, and how AI advances could reshape developer workflows and productivity.
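As a rough illustration of the time-horizon idea, the sketch below (not METR's actual methodology or code; the task data and numbers are invented) fits a logistic curve of success probability against log task length and reads off the task length at which a model's success rate falls to 50%.

    # Minimal sketch: estimate a "50% time horizon" from hypothetical eval results.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical results: (task length in human-minutes, did the model succeed?)
    results = [
        (2, True), (5, True), (10, True), (15, True), (30, True),
        (30, False), (60, True), (60, False), (120, False), (240, False),
    ]

    # Model success probability as a logistic function of log task length.
    X = np.log([[length] for length, _ in results])
    y = np.array([int(success) for _, success in results])
    clf = LogisticRegression().fit(X, y)

    # The 50% horizon is where the logistic's argument crosses zero:
    # w * log(length) + b = 0  =>  length = exp(-b / w)
    w, b = clf.coef_[0][0], clf.intercept_[0]
    horizon_minutes = np.exp(-b / w)
    print(f"Estimated 50% time horizon: {horizon_minutes:.0f} human-minutes")

Under these made-up numbers the model handles short tasks reliably and fails longer ones, so the fitted horizon lands somewhere between the longest consistent successes and the shortest consistent failures.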
The conversation then turns to concerns about AI safety, the potential for rapid capability growth, and the future of AI autonomy. It addresses the limitations of current benchmarking systems, the role of increasing computational power, and the possibility of AI systems improving themselves without human involvement. The podcast also touches on AI-driven trading and prediction markets, along with the ethical considerations raised by evaluating and deploying AI technologies. Finally, it reflects on the ongoing evolution of AI research and the importance of open-ended evaluation methods for balancing progress with long-term safety and risk management.