

METR's Joel Becker on exponential Time Horizon Evals, Threat Models, and the Limits of AI Productivity

Published 27 Feb 2026

Duration: 56 min 14 s

This episode explores the intersection of AI capabilities and societal risks, discussing the challenges of evaluating and deploying AI systems.

Episode Description

This is a free preview of a paid episode. To hear more, visit www.latent.space.

AIE Europe CFP and AIE World's Fair paper submissions for CAIS peer revie...

Overview

The episode explores the work of METR (Model Evaluation & Threat Research), which is organized around two areas: model evaluation, which assesses AI capabilities and real-world performance, and threat research, which investigates potential societal risks from AI advancements. It discusses METR's time horizon chart, which tracks the exponential growth of model capabilities over time, measured as the length of tasks (in terms of how long they take a skilled human) that a model can complete at a given reliability, such as a 50% success rate. The discussion also covers the challenges of measuring AI performance, how to select appropriate tasks for evaluation, and how AI advancements could affect developer workflows and productivity.

The conversation then turns to AI safety concerns, the potential for rapid capability growth, and the future of AI autonomy. It addresses the limitations of current benchmarks, the role of increasing computational power, and the possibility of AI systems improving themselves without human involvement. The episode also touches on AI-driven trading and prediction markets, as well as the ethical considerations raised by evaluating and deploying AI systems. Finally, it reflects on the ongoing evolution of AI research and the importance of developing open-ended evaluation methods that balance progress against long-term safety and risk management.

Recent Episodes of Latent Space

22 Jun 2026 Red-Teaming after Mythos - Zico Kolter & Matt Fredrikson, Gray Swan

AI security challenges in large language models, such as data leakage and prompt injection, require adversarial testing, red teaming, tools like *Shade* and *Signal*, and structured frameworks to address integration risks, robustness gaps, and enterprise-specific security demands.

3 Jun 2026 Scaling Past Informal AI - Carina Hong, Axiom Math

Formal verification is positioned as a critical tool for advancing AI, ensuring system correctness through mathematical rigor. The discussion covers Axiom Math's achievements, tools like Lean, challenges in AI generalization, and the vision of AI as a "superhuman mathematician" built on verified reasoning.

3 Jun 2026 Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build

Strategic AI development is shifting toward ecosystem-driven frameworks that prioritize value creation. Topics include Microsoft's rigorous model training, agent-driven workflow management, challenges in achieving real-world impact, innovative business models, inclusive participation in AI, and how agentic systems may redefine work.

2 Jun 2026 GitHub's plan for Agents - Kyle Daigle, GitHub

Advanced AI integration in developer workflows leverages tools like GitHub Copilot and agentic systems to automate tasks and boost productivity, while addressing challenges like skill bloat, security, open-source trust issues, and the shift to modular AI capabilities in enterprise and collaborative environments.

1 Jun 2026 Why Video Agent models are next - Ethan He, xAI Grok Imagine

Topics include advancing AI research through community-driven knowledge sharing, challenges in scaling video models, technical innovations such as vision transformers and diffusion models, the integration of language models into generative media, and hurdles in training efficiency and sustainable development.
