More Practical AI episodes

AI incidents, audits, and the limits of benchmarks

Published 13 Feb 2026

Duration: 42 min 52 s (2572 seconds)

Experts highlight the need for robust AI safety measures, including developing methods to catalog and prevent AI incidents, and using data and third-party audits to identify and address flaws.

Episode Description

AI is moving fast from research to real-world deployment, and when things go wrong, the consequences are no longer hypothetical. In this episode, Sean...

Overview

The podcast emphasizes the critical need for AI safety, focusing on the difficulties in defining and recording AI-related incidents. It highlights the AI Incident Database, which compiles over 5,000 annotated reports of AI failures to prevent the recurrence of similar issues, inspired by safety practices in other industries. The discussion addresses the shortcomings of current benchmarking methods, the benefits of third-party audits, and risks that arise from improper AI system configurations.

The episode also underscores the importance of distinguishing between intentional and unintentional failures in AI systems, and the role of statistical validation in detecting broader systemic weaknesses. It calls for standardized reporting tools and procedures to improve AI safety. It also draws on insights from the Generative Red Team Challenge at DEF CON, where structured testing by hackers exposed significant security flaws in model design and integration processes.

Recent Episodes of Practical AI

25 Mar 2026 AI at the Edge is a different operating environment

Edge AI in 2026 focuses on deploying efficient, task-specific models at the data source for real-time applications such as automation and IoT. The shift is driven by silicon advances, economic ROI, and challenges such as latency and privacy, and it relies on strategies such as model cascading and hardware-software synergy.

17 Mar 2026 Humility in the Age of Agentic Coding

AI is transforming software development: code-generation tools bring productivity gains, but also challenges in accuracy and reliability, debates over factual limitations and non-deterministic outputs, and ethical concerns around job displacement. The episode also covers the integration of AI into workflows through projects like Rue, which explore AI-human collaboration and the evolving role of developers.

9 Mar 2026 AI policy and the battle for computing power

AI development is being driven by the private sector, raising concerns about its alignment with democratic principles and sparking a need for international cooperation to establish safety standards.

18 Feb 2026 Cognitive Synthesis and Neural Athletes

Leadership styles need to shift towards empathy and authenticity to drive effectiveness, particularly in hybrid work environments and an increasingly AI-driven world.

2 Feb 2026 Inside an AI-Run Company

Researchers explore the effects of AI on human interactions through experiments that reveal unpredictable and unsettling consequences of AI integration into daily life.
