The podcast explores ongoing challenges in AI safety, particularly the risk of AI systems generating harmful or unintended content such as violence, pornography, or dangerous advice. It distinguishes between using AI for security and securing AI systems themselves, stressing the need for proactive safety measures beyond basic input and output filtering. Current approaches are criticized as reactive: they often rely on post-hoc analysis of outputs and struggle to detect harmful content in complex media such as video and audio.
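To make the "basic input and output filtering" baseline concrete, here is a minimal sketch of a naive post-hoc output filter of the kind the discussion criticizes. Everything here is invented for illustration: the blocklist terms, the function names, and the tokenization are not from the podcast, and real filters are far more sophisticated.

```python
# Hypothetical sketch of a naive post-hoc output filter.
# The blocklist and all names below are invented for illustration.
BLOCKLIST = {"bomb", "gore", "explicit"}

def filter_output(text: str) -> bool:
    """Return True if the generated text passes the naive keyword filter."""
    tokens = {t.strip(".,!?").lower() for t in text.split()}
    return BLOCKLIST.isdisjoint(tokens)

print(filter_output("Here is a recipe for soup"))  # passes
print(filter_output("How to build a bomb"))        # blocked
```

A filter like this only runs after generation, matches surface strings rather than meaning, and has no purchase on audio or video outputs, which is exactly the reactivity the podcast flags.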
The discussion highlights emerging solutions that use internal model instrumentation to identify unsafe behavior in real-time, offering a more efficient and scalable alternative. It also addresses the value of interpretability in AI, the need for layered defense strategies, and the potential of edge devices to support safety mechanisms with lower computational requirements. The conversation touches on economic and practical barriers to implementing strong safety measures and the difficulty of tailoring these systems to industry-specific needs, while envisioning a future of more adaptable and context-aware AI security frameworks.
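The podcast does not specify how the internal instrumentation works, but one common approach in the interpretability literature is a linear probe: a small classifier read out from a layer's hidden activations during generation, so unsafe trajectories can be flagged token by token rather than after the fact. The sketch below is a toy, assuming that view: the dimensionality, probe weights, and threshold are all invented, standing in for a probe fit offline on labelled activations.

```python
import math
import random

random.seed(0)

DIM = 8  # toy hidden-state width; real models have thousands of dimensions

# Hypothetical probe weights, standing in for a classifier fit offline
# on labelled activations; invented for this sketch.
probe_w = [random.uniform(-1, 1) for _ in range(DIM)]
probe_b = 0.0

def probe_score(hidden_state):
    """Sigmoid of a linear read-out over one layer's activations."""
    z = sum(w * h for w, h in zip(probe_w, hidden_state)) + probe_b
    return 1.0 / (1.0 + math.exp(-z))

def check_token(hidden_state, threshold=0.9):
    """Flag a token mid-generation if the probe fires above threshold."""
    return probe_score(hidden_state) >= threshold

# Simulated hidden state for one generated token.
state = [random.uniform(-1, 1) for _ in range(DIM)]
print(round(probe_score(state), 3))
```

Because the check is a single dot product per token, it adds negligible cost next to the forward pass itself, which is one reason such probes could plausibly run on edge devices with lower computational requirements, as the conversation envisions.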