

Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale

Published 10 Feb 2026

Duration: 00:57:24

Software engineers must adopt new approaches to validation and reliability as AI tools become increasingly integrated with traditional engineering, requiring systematic prompt management and careful balancing of AI guidance to prevent over-reliance.

Episode Description

Ereli Eran is the Founding Engineer at 7AI, where he's focused on building and scaling the company's agentic AI-driven cybersecurity platform developing...

Overview

The podcast examines how AI tools are changing software engineering practice, highlighting the need to handle AI's unpredictable outputs through sensitivity analysis, thorough testing, and precise prompt engineering. It addresses an evolving landscape in which traditional software engineering and data science roles are merging, creating the need for new approaches to validating AI-integrated systems and keeping them reliable. Key challenges mentioned include the inherent instability of large language models (LLMs), their limited context windows, and the difficulty of debugging AI applications and keeping their behavior consistent.

Several strategies are discussed to enhance control and efficiency when working with AI, such as progressive disclosure, dynamic prompt injection, and graph-based prompting. The importance of domain knowledge in crafting effective prompts and managing user-generated inputs is emphasized, as well as the need for transparent UX design to build trust and clarity in AI-driven systems. For critical applications like security, the podcast highlights the necessity of prompt versioning, systematic testing, and observability to ensure dependable performance. It also warns against the overuse of LLMs and advocates for a balanced approach that treats prompts with the same care as code, avoiding over-engineering while maintaining necessary guidance.

Recent Episodes of MLOps.community

31 Mar 2026 This One Shift Makes Developers Obsolete

Processing live-stream data involves transcription, AI-driven skill categorization, GitHub organization, correlating multimedia with comments, and building knowledge graphs, while dealing with redundancy and AI costs; the episode also covers MLOps trends, AI agent debates, adversarial workflows, security risks, and tooling such as Open Claw and Agent Zero.

30 Mar 2026 Operationalizing AI Agents: From Experimentation to Production // Databricks Roundtable

Deploying AI agents in real-world systems demands robust safety protocols, human oversight, and structured testing to address risks like errors and vulnerabilities, while balancing innovation with responsibility through observability, governance, domain expertise, and tools like MLflow, across use cases from workflow automation to critical system reliability.

27 Mar 2026 arrowspace: Vector Spaces and Graph Wiring

Epiplexity introduces a framework redefining entropy and complexity with structural information, while topological search and graph-based methods enhance semantic accuracy in machine learning by preserving data through high-dimensional embeddings and hybrid geometric-topological analysis, outperforming traditional approaches in retrieval and reasoning tasks.

20 Mar 2026 Agentic Marketplace

AI-driven agent systems in OLX's classifieds marketplace aim to innovate user experiences by overcoming UI constraints through dynamic intent extraction, hybrid chat/UI models, and trust-building in real estate and motors, with future focus on logistics automation, secure transactions, and human-agent integration.

17 Mar 2026 Durable Execution and Modern Distributed Systems

Temporal enhances developer productivity by enabling crash-proof workflows through deterministic programming models, separating business logic from fault tolerance, and simplifying distributed systems with durable execution, workflows, activities, and persistence layers like Cassandra/Postgres.
