More The Secure Disclosure episodes

Prompt Injection Might Never Be Solved w/ Paul Vann

Published 28 May 2026

Duration: 00:30:39

The episode covers AI security threats such as prompt injection, jailbreak attacks, and distillation attacks, along with vulnerabilities such as AI bias and autonomous agent risks. It highlights detection challenges, emerging malware, supply chain exploits, and the industry's struggle to keep pace with rapidly evolving AI technologies.

Episode Description

In this episode of Secure Disclosure, host Matt sits down with Paul Vann, CEO and founder of Validia, to explore the frontier of AI security. Instead o...

Overview

The podcast explores critical AI security threats. Prompt injection is presented as a problem that may never be fully solved because it parallels social engineering, and jailbreak attacks push AI models to bypass ethical constraints or reveal internal processes. Distillation attacks pose significant risks by allowing adversaries to extract reasoning and reward models from AI systems like ChatGPT or Gemini, using queries that are often indistinguishable from legitimate user traffic. Biases within AI models are highlighted as exploitable vulnerabilities: in one example, an antivirus tool misclassifies malicious code as safe because it resembles game code. The discussion treats this kind of bias as a systemic risk and emphasizes the need for proactive safeguards against unintended outcomes.

Efforts to secure AI models focus on shifting from traditional firewall approaches to "inside-out" strategies, as advocated by Validia's CEO, Paul Vann. This involves addressing internal vulnerabilities, such as AI bias, and analyzing model behavior during attacks rather than relying on input/output classification. The podcast delves into distillation techniques, where adversaries use prompts to replicate a model's reasoning, enabling malicious applications like malware generation. Evasion tactics, such as proxy usage, complicate real-time detection, while mechanistic interpretability remains limited, with current methods capturing less than 5% of a model's behavior. Researchers also warn of emerging threats like AI-driven penetration testing tools and the potential for a "machine vs. machine" arms race as AI agents become more autonomous.
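
As a rough illustration of watching behavior rather than classifying inputs, the sketch below builds a statistical baseline from known-good responses and flags responses that deviate from it. It is not the inside-out method discussed in the episode, whose internals are not described here; it is a simpler output-level proxy. The feature set, the `BehaviorBaseline` class, and the threshold of 3 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def response_features(text: str) -> dict[str, float]:
    """Crude, hypothetical features describing one model response."""
    words = text.split()
    markers = sum(text.count(m) for m in ("{", "}", ";", "import "))
    return {
        "length": float(len(words)),
        "uppercase_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
        "code_marker_ratio": markers / max(len(words), 1),
    }


class BehaviorBaseline:
    """Per-feature mean and standard deviation learned from known-good responses."""

    def __init__(self, baseline_responses: list[str], threshold: float = 3.0):
        # stdev() needs at least two baseline responses.
        feats = [response_features(r) for r in baseline_responses]
        self.threshold = threshold
        self.stats = {
            name: (mean(f[name] for f in feats), stdev(f[name] for f in feats))
            for name in feats[0]
        }

    def anomalies(self, response: str) -> dict[str, float]:
        """Return the z-score of every feature that exceeds the threshold."""
        flagged = {}
        for name, value in response_features(response).items():
            mu, sigma = self.stats[name]
            # Floor sigma so a constant baseline feature still flags any change.
            z = (value - mu) / max(sigma, 1e-6)
            if abs(z) > self.threshold:
                flagged[name] = z
        return flagged


baseline = BehaviorBaseline([
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius.",
    "There are seven days in a week.",
    "The sun rises in the east.",
])
suspicious = baseline.anomalies("import os; " * 20)
```

Here `suspicious` contains entries for the features that moved far from the baseline. A production system would need far richer signals and a much larger baseline; the point is only that the check compares behavior against an expected profile instead of matching known attack strings.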

The discussion underscores the rapid evolution of AI technologies, exemplified by tools like OpenClaw, which transitioned from obscurity to prominence in months. Concerns about unregulated AI tools are raised, including supply chain attacks and polymorphic malware capable of evading detection. While AI holds potential to enhance security once its risks are fully understood, the field remains in a "problem discovery phase," with solutions lagging behind emerging threats. Legal and ethical challenges, such as the controversy around publishing speculative theories on AI-driven malware, further complicate the landscape. The podcast concludes by highlighting the urgency of developing robust detection methods and behavioral analysis frameworks to mitigate the growing risks of adversarial AI systems.

What If

  • What if you prioritized behavioral analysis to detect prompt injection attacks in real time?

    • Move: Develop a lightweight behavioral monitoring module that tracks anomalies in AI model responses, such as unexpected shifts in tone or deviation from training data patterns.
    • Why Now?: Traditional input/output classifiers have failed to generalize against novel attacks, and open-source tools like promptfoo underscore the urgency of adaptive solutions.
    • Expected Upside: Early detection of prompt injection attempts could prevent unauthorized access, enhancing your product's security posture before adversaries exploit gaps.
  • What if you implemented request anomaly detection to block distillation attacks?

    • Move: Integrate a proxy detection system that flags repeated high-volume requests from single IP ranges or patterns mimicking user-generated prompts.
    • Why Now?: Distillation attacks are rising, and their stealthy nature makes them ideal for IP theft or malicious model replication without user suspicion.
    • Expected Upside: This would reduce the risk of your models being reverse-engineered, protecting your IP and maintaining user trust in your product's integrity.
  • What if you built a bias-checking layer to mitigate AI bias as a vulnerability?

    • Move: Create a post-processing pipeline that evaluates model outputs against predefined fairness metrics and flags decisions that mirror known biases (e.g., false negatives in security tools).
    • Why Now?: The Rocket League antivirus example shows how bias can create exploitable weaknesses, and Validia's focus on inside-out strategies aligns with addressing root causes rather than symptoms.
    • Expected Upside: Reducing bias-related vulnerabilities could prevent critical failures like false negatives in security tools, improving both reliability and compliance with ethical AI standards.
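
The request anomaly detection idea above could start from something as small as the sliding-window counter below. This is a minimal sketch under stated assumptions: the `RequestVolumeMonitor` class, the `client_key` helper, the /24 grouping of IPv4 addresses, and the window and limit values are all hypothetical choices, not anything specified in the episode.

```python
import ipaddress
from collections import defaultdict, deque


def client_key(ip: str, prefix: int = 24) -> str:
    """Group IPv4 addresses by network range (assumed /24) so one range counts as one client."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


class RequestVolumeMonitor:
    """Flags a client that sends more than max_requests within window_seconds."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 100):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, client: str, timestamp: float) -> bool:
        """Record one request and return True if the client is over the limit."""
        events = self._events[client]
        events.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_requests


monitor = RequestVolumeMonitor(window_seconds=10.0, max_requests=3)
key = client_key("203.0.113.57")
results = [monitor.record(key, t) for t in (0.0, 1.0, 2.0, 3.0)]
```

With a limit of 3 requests per 10 seconds, the fourth request in `results` is the first one flagged. Volume counting alone is easy to evade with proxies, which the episode names as a complication, so a counter like this is at best one signal among several.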

Takeaway

  • Implement behavior-based detection mechanisms to monitor AI model responses for anomalies, focusing on jailbreaks and prompt injection by analyzing response patterns rather than relying solely on input/output classifiers.
  • Audit and mitigate AI bias in models by testing for contextual weaknesses (e.g., flagging code as safe when it shouldn't be) and using diverse training data to reduce exploitable gaps in ethical constraints.
  • Enforce strict query rate limits and anomaly detection for API requests to prevent distillation attacks, which exploit mass queries to extract model knowledge without triggering obvious security alerts.
  • Adopt "inside-out" security strategies by embedding safeguards directly into AI model architecture (e.g., behavioral "trauma response" analysis) instead of relying on external firewalls or LLM-based prompt analyzers.
  • Regularly update AI security protocols to address emerging threats like polymorphic malware or adversarial distillation, using tools like mechanistic interpretability and real-time behavioral analysis to generalize protection against novel attacks.
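
One way to make the bias audit in the takeaways concrete is to compare false-negative rates across context groups on a labeled test set. The sketch below assumes a hypothetical record format of `(group, is_malicious, predicted_malicious)` and a hypothetical gap threshold of 0.2; the group names and sample data are invented for illustration and are not results from the episode.

```python
from collections import defaultdict


def false_negative_rates(samples) -> dict[str, float]:
    """Share of truly malicious samples the detector missed, per context group."""
    missed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for group, is_malicious, predicted_malicious in samples:
        if is_malicious:
            total[group] += 1
            if not predicted_malicious:
                missed[group] += 1
    return {group: missed[group] / total[group] for group in total}


def flag_biased_groups(rates: dict[str, float], max_gap: float = 0.2) -> list[str]:
    """Groups whose miss rate exceeds the best-performing group by more than max_gap."""
    if not rates:
        return []
    best = min(rates.values())
    return sorted(g for g, rate in rates.items() if rate - best > max_gap)


# Invented sample data: malicious code that resembles game code is missed more often.
samples = [
    ("generic", True, True), ("generic", True, True),
    ("generic", True, True), ("generic", True, False),
    ("game_like", True, False), ("game_like", True, False),
    ("game_like", True, False), ("game_like", True, True),
    ("game_like", False, False),  # benign sample, ignored by the false-negative rate
]
rates = false_negative_rates(samples)
flagged = flag_biased_groups(rates)
```

In this invented data the detector misses 25% of generic malicious samples but 75% of the game-like ones, so `flagged` singles out the game-like group as a contextual weakness worth investigating.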

Recent Episodes of The Secure Disclosure

22 May 2026 AI Broke the Security Ecosystem w/ Chris Hughes

Evolving cybersecurity challenges include supply chain threats, AI vulnerabilities, and outdated tools, highlighting the need for systemic reforms like developer incentives, regulatory clarity, and industry-government collaboration to address gaps in vulnerability management and the dual risks of AI's role in both threat detection and exploitation.

15 May 2026 PostHog is placing a wild bet on AI Coding w/ James Hawkins

PostHog's open-source analytics platform prioritizes transparency, developer autonomy, and AI integration while critiquing corporate norms, emphasizing price clarity, building in public, and balancing automation with security governance in product development.

6 May 2026 AI Panic is Driving Shadow IT w/ Noora Ahmed-Moshe

AI's impact on employment and cybersecurity risks, driven by shadow AI, phishing, and emerging threats like prompt injection, require balancing workforce skills, security measures, and organizational trust.

29 Apr 2026 When AI Agents Change their Intent w/ Frank Vukovits

AI agents, autonomous non-human entities operating in enterprise systems without human oversight, pose security and governance challenges requiring updated access control frameworks, real-time monitoring, and intent-based governance to address risks like unauthorized access and shadow AI, paralleling historical tech challenges like Y2K.

22 Apr 2026 OWASP Top 10, Vibe Coding, and What Developers Miss w/ Tanya Janca

Gaps in cybersecurity education, persistent vulnerabilities like SQL injection, OWASP data limitations, evolving supply chain risks, high training costs, AI's contextual challenges, and the need for secure-by-design principles and collaboration highlight systemic challenges in addressing evolving cyber threats.
