More Practical AI episodes


AI incidents, audits, and the limits of benchmarks

Published 13 Feb 2026

Duration: 42:52 (2572 seconds)

Experts highlight the need for robust AI safety measures, including developing methods to catalog and prevent AI incidents, and using data and third-party audits to identify and address flaws.

Episode Description

AI is moving fast from research to real-world deployment, and when things go wrong, the consequences are no longer hypothetical. In this episode, Sean...

Overview

The episode emphasizes the critical need for AI safety, focusing on the difficulty of defining and recording AI-related incidents. It highlights the AI Incident Database, which draws on safety practices from other industries and compiles over 5,000 annotated reports of AI failures to help prevent similar failures from recurring. The discussion also addresses the shortcomings of current benchmarking methods, the benefits of third-party audits, and the risks that arise from improperly configured AI systems.

The episode also underscores the importance of distinguishing between intentional and unintentional failures in AI systems, and the role of statistical validation in detecting broader systemic weaknesses. It calls for standardized reporting tools and procedures to improve AI safety. It also draws on insights from the Generative Red Team Challenge at DEF CON, where structured testing by hackers exposed significant security flaws in model design and integration processes.

Recent Episodes of Practical AI

4 Jun 2026 Breaking down the 2026 Stanford AI Index Report

A look at the Stanford AI Index Report's findings on accelerating capabilities, human-level performance in specialized tasks, and impacts on education and work; challenges such as flawed benchmarks and the "jagged frontier"; robotics limitations; U.S.-China leadership dynamics; governance gaps; and broader implications for labor, creativity, and policy.

28 May 2026 Rebooting Enterprise AI with MCP and Kubernetes

The Model Context Protocol (MCP) bridges AI systems with enterprise infrastructure, enabling secure, scalable interactions between LLMs and traditional tools through standardized, governance-focused operational frameworks.

21 May 2026 Hermes Agent: Agents that grow with you

Nous Research's mission to democratize AI through open-source tools such as the Hermes Agent emphasizes efficiency, distributed training, ethical alignment, and agentic systems. The conversation also covers challenges such as monopolization, geopolitical competition, and balancing open-source ideals with commercial interests, along with debates on AI's creative limits and societal impact.

14 May 2026 U.S. Congressman Beyer on AI challenges facing America and the World

AI policy debates, cybersecurity vulnerabilities, economic disruptions, ethical risks, international collaboration, and philosophical questions on AI consciousness and human alignment dominate discussions on balancing innovation with governance and societal impact.

7 May 2026 The Myth of Model Wars: Open vs Closed AI in 2026

Microelectronics that democratize access are driving AI integration into physical systems through embedded technology in retail, manufacturing, and logistics. The discussion emphasizes infrastructure and edge applications over model types, and covers challenges in scalability, tooling, and aligning AI with real-world business needs.
