Prompt Injection Might Never Be Solved w/ Paul Vann

Published 28 May 2026

Show Notes: podcasters.spotify.com/pod/show/thesecuredisclosure/episodes/Prompt-Injection-Might-Never-Be-Solved-w-Paul-Vann-e3k12m1

Duration: 00:30:39

The text details AI security threats like prompt injection, jailbreak attacks, and distillation attacks, along with vulnerabilities such as AI bias and autonomous agent risks, highlighting detection challenges, emerging malware, supply chain exploits, and the industry's struggle to keep pace with rapidly evolving AI technologies.

Episode Description

In this episode of Secure Disclosure, host Matt sits down with Paul Vann, CEO and founder of Validia, to explore the frontier of AI security. Instead o...

Overview

The podcast explores critical AI security threats, including prompt injection, which may never be fully solved because it parallels social engineering, and jailbreak attacks that push AI models to bypass ethical constraints or reveal internal processes. Distillation attacks pose significant risks by allowing adversaries to extract reasoning and reward models from AI systems like ChatGPT or Gemini, and the queries involved are often indistinguishable from legitimate user traffic. Biases within AI models are also highlighted as exploitable vulnerabilities, such as an antivirus tool misclassifying malicious code as safe because it resembles game code. The discussion frames such bias as a systemic risk that calls for proactive safeguards against unintended outcomes.

Efforts to secure AI models focus on shifting from traditional firewall approaches to "inside-out" strategies, as advocated by Validia's CEO, Paul Vann. This involves addressing internal vulnerabilities, such as AI bias, and analyzing model behavior during attacks rather than relying on input/output classification. The podcast delves into distillation techniques, where adversaries use prompts to replicate a model's reasoning, enabling malicious applications like malware generation. Evasion tactics, such as proxy usage, complicate real-time detection, while mechanistic interpretability remains limited, with current methods capturing less than 5% of a model's behavior. Researchers also warn of emerging threats like AI-driven penetration testing tools and the potential for a "machine vs. machine" arms race as AI agents become more autonomous.

The discussion underscores the rapid evolution of AI technologies, exemplified by tools like OpenClaw, which transitioned from obscurity to prominence in months. Concerns about unregulated AI tools are raised, including supply chain attacks and polymorphic malware capable of evading detection. While AI holds potential to enhance security once its risks are fully understood, the field remains in a "problem discovery phase," with solutions lagging behind emerging threats. Legal and ethical challenges, such as the controversy around publishing speculative theories on AI-driven malware, further complicate the landscape. The podcast concludes by highlighting the urgency of developing robust detection methods and behavioral analysis frameworks to mitigate the growing risks of adversarial AI systems.

What If

What if you prioritized behavioral analysis to detect prompt injection attacks in real time?
- Move: Develop a lightweight behavioral monitoring module that tracks anomalies in AI model responses, such as unexpected shifts in tone or deviation from training data patterns.
- Why Now?: Traditional input/output classifiers have failed to generalize against novel attacks, and open-source tools like Promptfoo underscore the urgency of adaptive solutions.
- Expected Upside: Early detection of prompt injection attempts could prevent unauthorized access, enhancing your product's security posture before adversaries exploit gaps.
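The monitoring idea above can be sketched in a few lines. This is a minimal illustration, not a method described in the episode: it assumes each response can be reduced to one numeric feature (word count here, purely as a stand-in for something like tone or refusal rate) and flags values that fall far outside a rolling baseline. The class name `ResponseMonitor` and all thresholds are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class ResponseMonitor:
    """Flags responses whose feature value deviates sharply from a rolling baseline."""

    def __init__(self, window: int = 50, threshold: float = 3.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # recent non-anomalous feature values
        self.threshold = threshold           # z-score above which a response is flagged
        self.min_samples = min_samples       # baseline size required before flagging

    @staticmethod
    def feature(response: str) -> float:
        # Hypothetical stand-in feature: word count of the response.
        return float(len(response.split()))

    def observe(self, response: str) -> bool:
        """Return True if the response looks anomalous relative to the baseline."""
        value = self.feature(response)
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        if not anomalous:
            # Only normal responses update the baseline, so an attack
            # cannot gradually drag the baseline toward itself in one step.
            self.history.append(value)
        return anomalous
```

A real deployment would likely swap the word-count feature for richer behavioral signals; the rolling-baseline structure is the only part this sketch is meant to show.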
What if you implemented request anomaly detection to block distillation attacks?
- Move: Integrate a proxy detection system that flags repeated high-volume requests from single IP ranges or patterns mimicking user-generated prompts.
- Why Now?: Distillation attacks are rising, and their stealthy nature makes them ideal for IP theft or malicious model replication without user suspicion.
- Expected Upside: This would reduce the risk of your models being reverse-engineered, protecting your IP and maintaining user trust in your product's integrity.
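One way to picture the request-volume check described above is a sliding-window counter keyed by IP range. The sketch below is an assumption-laden illustration: it groups IPv4 clients by /24 network, and the names `range_key` and `RequestVolumeTracker`, the window size, and the request limit are all hypothetical.

```python
import ipaddress
from collections import defaultdict, deque


def range_key(ip: str) -> str:
    """Map an IPv4 address to its /24 network (assumed grouping granularity)."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


class RequestVolumeTracker:
    """Counts requests per key inside a sliding time window."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps inside the window

    def record(self, key: str, timestamp: float) -> bool:
        """Record one request; return True if the key exceeds the limit."""
        events = self._events[key]
        events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) > self.max_requests
```

As the overview notes, adversaries can spread traffic across proxies, so a per-range counter like this is at best one signal among several rather than a complete defense.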
What if you built a bias-checking layer to mitigate AI bias as a vulnerability?
- Move: Create a post-processing pipeline that evaluates model outputs against predefined fairness metrics and flags decisions that mirror known biases (e.g., false negatives in security tools).
- Why Now?: The Rocket League antivirus example shows how bias can create exploitable weaknesses, and Validia's focus on inside-out strategies aligns with addressing root causes rather than symptoms.
- Expected Upside: Reducing bias-related vulnerabilities could prevent critical failures like false negatives in security tools, improving both reliability and compliance with ethical AI standards.
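A bias check of the kind described above could start by comparing false-negative rates across categories of input. The following sketch assumes labeled evaluation records of the form `(category, is_malicious, flagged)`; the category labels, function names, and the `max_gap` threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def false_negative_rates(records):
    """Return the false-negative rate per category.

    records: iterable of (category, is_malicious, flagged) tuples.
    Only truly malicious samples count toward the rate.
    """
    missed = defaultdict(int)
    total = defaultdict(int)
    for category, is_malicious, flagged in records:
        if is_malicious:
            total[category] += 1
            if not flagged:
                missed[category] += 1
    return {c: missed[c] / total[c] for c in total}


def biased_categories(records, max_gap=0.2):
    """List categories whose miss rate exceeds the best category's by more than max_gap."""
    rates = false_negative_rates(records)
    if not rates:
        return []
    baseline = min(rates.values())
    return sorted(c for c, r in rates.items() if r - baseline > max_gap)
```

For example, if malicious samples that resemble game code are missed far more often than other malicious samples, `biased_categories` would surface that category for review.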

Takeaway

Implement behavior-based detection mechanisms to monitor AI model responses for anomalies, focusing on jailbreaks and prompt injection by analyzing response patterns rather than relying solely on input/output classifiers.
Audit and mitigate AI bias in models by testing for contextual weaknesses (e.g., flagging code as safe when it shouldn't be) and using diverse training data to reduce exploitable gaps in ethical constraints.
Enforce strict query rate limits and anomaly detection for API requests to prevent distillation attacks, which exploit mass queries to extract model knowledge without triggering obvious security alerts.
Adopt "inside-out" security strategies by embedding safeguards directly into AI model architecture (e.g., behavioral "trauma response" analysis) instead of relying on external firewalls or LLM-based prompt analyzers.
Regularly update AI security protocols to address emerging threats like polymorphic malware or adversarial distillation, using tools like mechanistic interpretability and real-time behavioral analysis to generalize protection against novel attacks.

Recent Episodes of The Secure Disclosure

1 Jul 2026 Solving the Supply Chain Security & Malware Crisis w/ John Amaral

Escalating software supply chain threats target open-source ecosystems through credential exploitation, AI-fueled malware, and upstream compromises, with challenges in dependency management and outdated libraries driving AI-driven remediation strategies like automated patching and version pinning, though human oversight remains critical for validating fixes.

25 Jun 2026 AI is an Amplifier: Why Bad Infrastructure Gets Wronger Faster w/ Abdel SGHIOUAR

AI agents in cloud-native ecosystems like Kubernetes offer automation and monitoring benefits but face skepticism, security risks, trust issues from hallucinations, and require safeguards and human oversight to balance innovation with control and ensure reliability.

16 Jun 2026 Your Microphone Became a Keylogger w/ David vonThenen

Machine learning analyzes keystroke acoustic signatures to infer typed characters over remote platforms, highlighting high accuracy with known keyboards, privacy risks from surveillance, and challenges in noise and variability, while proposing defenses and noting AI's dual-use implications.

9 Jun 2026 Understand the Software Supply Chain Chaos w/ Roeland Delrue

Rapidly evolving supply chain security threats, including malicious open-source components and AI-driven malware, demand advanced AI-powered solutions like Aikido Security's self-securing software and tailored tools to address vulnerabilities in developer environments and package repositories.

22 May 2026 AI Broke the Security Ecosystem w/ Chris Hughes

Evolving cybersecurity challenges include supply chain threats, AI vulnerabilities, and outdated tools, highlighting the need for systemic reforms like developer incentives, regulatory clarity, and industry-government collaboration to address gaps in vulnerability management and the dual risks of AI's role in both threat detection and exploitation.
