More MLOps.community episodes

Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale

Published 10 Feb 2026

Duration: 00:57:24

Software engineers must adopt new approaches to validation and reliability as AI tools become increasingly integrated with traditional engineering, requiring systematic prompt management and careful balancing of AI guidance to prevent over-reliance.

Episode Description

Ereli Eran is the Founding Engineer at 7AI, where he's focused on building and scaling the company's agentic AI-driven cybersecurity platform developing...

Overview

The podcast examines how AI tools are increasingly affecting software engineering practices, pointing out the importance of handling AI's unpredictable outputs through sensitivity analysis, thorough testing, and precise prompt engineering. It addresses the evolving landscape where traditional software engineering and data science roles are merging, creating the need for new approaches to ensure the validation and reliability of AI-integrated systems. Key challenges mentioned include the inherent instability of large language models (LLMs), their limited context window, and the complexity of debugging and maintaining consistent behavior in AI applications.
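One simple way to picture the sensitivity analysis mentioned above is to ask the same question in several phrasings and measure how often the answers agree. The sketch below is illustrative only and is not taken from the episode: `agreement_rate`, `fake_model`, and the prompt variants are hypothetical names, and `fake_model` is a deliberately brittle keyword matcher standing in for a real LLM call.

```python
from collections import Counter
from typing import Callable, Iterable


def agreement_rate(model: Callable[[str], str], prompts: Iterable[str]) -> float:
    """Fraction of prompt variants whose output matches the most common output."""
    outputs = [model(p) for p in prompts]
    if not outputs:
        raise ValueError("need at least one prompt variant")
    _, count = Counter(outputs).most_common(1)[0]
    return count / len(outputs)


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call (assumption, not a real API):
    # it keys on a single word, so rephrasing can flip its answer.
    return "malicious" if "suspicious" in prompt.lower() else "benign"


# Paraphrases of one question; a robust system should answer them all alike.
variants = [
    "Is this login suspicious? 40 failed attempts in 1 minute.",
    "Suspicious or not: 40 failed attempts in 1 minute.",
    "Evaluate this login: 40 failed attempts in 1 minute.",
    "Does this look suspicious? 40 failed logins within a minute.",
]

score = agreement_rate(fake_model, variants)
print(f"agreement across paraphrases: {score:.2f}")
```

A low score suggests the answer depends on wording rather than on the underlying facts, which is the kind of fragility a test suite for an AI-integrated system would want to surface.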

Several strategies are discussed to enhance control and efficiency when working with AI, such as progressive disclosure, dynamic prompt injection, and graph-based prompting. The importance of domain knowledge in crafting effective prompts and managing user-generated inputs is emphasized, as well as the need for transparent UX design to build trust and clarity in AI-driven systems. For critical applications like security, the podcast highlights the necessity of prompt versioning, systematic testing, and observability to ensure dependable performance. It also warns against the overuse of LLMs and advocates for a balanced approach that treats prompts with the same care as code, avoiding over-engineering while maintaining necessary guidance.
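As a rough illustration of treating prompts with the same care as code, the sketch below versions each prompt template by a hash of its content, so any edit produces a new, traceable version identifier. It assumes nothing about the tooling discussed in the episode: `PromptVersion`, `PromptRegistry`, and the example templates are hypothetical names introduced for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    """A named prompt template whose version id is derived from its text."""
    name: str
    template: str

    @property
    def version(self) -> str:
        # Content hash: any edit to the template yields a new version id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]

    def render(self, **variables: str) -> str:
        return self.template.format(**variables)


@dataclass
class PromptRegistry:
    """In-memory history of prompt versions (hypothetical helper)."""
    _history: dict[str, list[PromptVersion]] = field(default_factory=dict)

    def register(self, name: str, template: str) -> PromptVersion:
        prompt = PromptVersion(name, template)
        versions = self._history.setdefault(name, [])
        # Skip no-op registrations so the history records only real changes.
        if not versions or versions[-1].version != prompt.version:
            versions.append(prompt)
        return prompt

    def latest(self, name: str) -> PromptVersion:
        return self._history[name][-1]

    def history(self, name: str) -> list[str]:
        return [p.version for p in self._history[name]]


registry = PromptRegistry()
v1 = registry.register("triage", "Classify alert: {alert}")
v2 = registry.register("triage", "Classify this security alert: {alert}")
print(registry.history("triage"))
print(registry.latest("triage").render(alert="40 failed logins"))
```

Logging the version id alongside each model response is one way to connect observability to prompt changes: a regression can then be traced to the exact template that produced it.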

Recent Episodes of MLOps.community

12 May 2026 The Latency Goldilocks Zone Explained

iFood's ILO AI agent leverages a Learning Context Model to deliver hyper-personalized food recommendations by integrating diverse AI techniques, navigating cultural nuances, and balancing familiar and novel choices while addressing multi-channel design, latency, scalability, data alignment, and experimental innovation challenges.

8 May 2026 Building MCP Before MCP Existed: Inside Despegar's Sofia Agent

Sofia, an AI-powered travel concierge built on a multi-agent system with decentralized collaboration, aims to streamline bookings, in-trip services, and personalized experiences through AI-driven automation, chat and voice interfaces, and orchestration layers, while expanding its capabilities and reducing friction in travel processes.

1 May 2026 Voice Agent Use Cases

Designing voice-based AI systems involves balancing user control with automation, managing the trade-off between speech quality and latency, creating intuitive interfaces for non-technical users, and overcoming transcription and turn-taking challenges in real-world environments. It also means integrating hybrid models and domain-specific tuning while using feedback loops to maintain compliance, user trust, and ethical safeguards in applications such as customer support and other dynamic environments.

24 Apr 2026 The Creator of Superpowers: Why Real Agentic Engineering Beats Vibe Coding

The episode discusses using the Greenfield toolset to convert legacy code into structured specifications and the Superpowers framework to enhance AI agents through psychological persuasion techniques, emphasizing task decomposition, subagent roles, challenges in consistency and security, and future trends in agentic problem-solving and ethical AI development.

21 Apr 2026 It's 2026, and We're Still Talking Evals

Evaluations in AI product development must be integrated early, address real-world complexities, use nuanced metrics beyond accuracy, employ user-centric and iterative testing, leverage post-deployment data, and adapt tailored strategies to balance quality, domain-specific metrics, and system reliability.
