Durable Execution and Modern Distributed Systems

Published 17 Mar 2026

Show Notes: podcasters.spotify.com/pod/show/mlops/episodes/Durable-Execution-and-Modern-Distributed-Systems-e3giukm

Duration: 01:00:36

Temporal enhances developer productivity by enabling crash-proof workflows through deterministic programming models, separating business logic from fault tolerance, and simplifying distributed systems with durable execution, workflows, activities, and persistence layers like Cassandra/Postgres.

Episode Description

Johann Schleier-Smith is the Technical Lead for AI at Temporal Technologies, working on reliable infrastructure for production AI systems and long-run...

Overview

The podcast focuses on enhancing developer productivity through tools and frameworks, particularly agentic systems that interact with the world asynchronously, reliably, and durably. It emphasizes durable execution, a methodology ensuring software tasks complete reliably despite failures like cloud outages or rate limits. Key principles include crash-proof software, separation of reliability mechanisms from business logic, and the use of platforms like Temporal, an open-source solution that decouples durability from application code. Temporal employs a programming model distinguishing between workflows (deterministic, business-logic-heavy code) and activities (IO-heavy tasks), enabling deterministic execution and cross-region failover. This approach abstracts complexity in retries and fault tolerance, allowing developers to focus on core logic while ensuring resilience in distributed systems.
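To make the workflow/activity split concrete, here is a minimal sketch of replay-based durable execution in plain Python. It does not use the Temporal SDK; `ToyDurableRunner`, `Context`, `SimulatedCrash`, and the two example activities are hypothetical names invented for illustration, and the in-memory history list stands in for a real persistence layer.

```python
from typing import Any, Callable, List


class SimulatedCrash(Exception):
    """Stands in for a worker process dying mid-workflow (toy example only)."""


class ToyDurableRunner:
    """Hypothetical runner that records each activity result in a history."""

    def __init__(self) -> None:
        self.history: List[Any] = []  # a real system would persist this
        self.calls = 0                # number of activities that completed for real

    def run(self, workflow: Callable[["Context"], Any]) -> Any:
        # Each attempt re-runs the workflow from the top with a fresh cursor.
        return workflow(Context(self))


class Context:
    def __init__(self, runner: ToyDurableRunner) -> None:
        self._runner = runner
        self._position = 0  # index of the next activity call in this attempt

    def activity(self, fn: Callable[..., Any], *args: Any) -> Any:
        history = self._runner.history
        if self._position < len(history):
            result = history[self._position]  # replay: reuse the recorded result
        else:
            result = fn(*args)                # first execution: run, then record
            self._runner.calls += 1
            history.append(result)
        self._position += 1
        return result


crash_once = {"armed": True}


def charge_card(amount: int) -> str:
    return f"charged {amount}"


def send_receipt(charge: str) -> str:
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise SimulatedCrash("worker died before sending the receipt")
    return f"receipt for {charge}"


def order_workflow(ctx: Context) -> str:
    # Deterministic business logic: the same inputs always yield the same calls.
    charge = ctx.activity(charge_card, 42)
    return ctx.activity(send_receipt, charge)


def run_until_complete(runner: ToyDurableRunner, workflow: Callable[[Context], Any]) -> Any:
    while True:
        try:
            return runner.run(workflow)
        except SimulatedCrash:
            continue  # restart; recorded activities are replayed, not re-run
```

In this sketch the workflow function contains no retry or recovery code. After the simulated crash, the second attempt replays the recorded `charge_card` result instead of charging again, which is why the workflow code must be deterministic.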

The discussion extends to challenges and comparisons, such as the learning curve of deterministic workflows and the limitations of legacy systems lacking built-in durability. It contrasts durable execution with checkpointing, noting the latter's limitations in handling complex, concurrent scenarios. Temporal's "continue as new" feature allows long-running agents to maintain state through snapshots, combining checkpointing with durable execution. Use cases span agentic systems (e.g., LLM-driven workflows), transaction management, and serverless architecture, highlighting Temporal's role in simplifying cloud workflows by managing state persistence and recovery internally. Key benefits include abstracting infrastructure complexity, supporting both linear workflows and branched, concurrent processes, and enabling scalability through auto-scaling and serverless integration.
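The "continue as new" idea can be sketched as a loop that hands a snapshot of its state to a fresh run once its history grows too long. The following is a toy model in plain Python, not Temporal's API; `MAX_HISTORY`, `ContinueAsNew`, `agent_run`, and `run_agent` are hypothetical names, and the threshold of 3 is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

MAX_HISTORY = 3  # hypothetical cap on events recorded by a single run


@dataclass
class ContinueAsNew:
    state: int  # snapshot carried over to the next run


def agent_run(state: int, inbox: List[int]) -> Tuple[Union[int, ContinueAsNew], List[str]]:
    """One run of a long-lived agent; its history starts out empty."""
    history: List[str] = []
    while inbox:
        if len(history) >= MAX_HISTORY:
            # History is full: stop and ask for a fresh run with the snapshot.
            return ContinueAsNew(state), history
        msg = inbox.pop(0)
        state += msg
        history.append(f"handled {msg}")
    return state, history


def run_agent(inbox: List[int]) -> Tuple[int, int]:
    """Chains runs together; returns the final state and the number of runs."""
    state = 0
    runs = 0
    while True:
        runs += 1
        outcome, _history = agent_run(state, inbox)
        if isinstance(outcome, ContinueAsNew):
            state = outcome.state  # the old history is dropped; only the snapshot survives
            continue
        return outcome, runs
```

Under these assumptions, `run_agent([1, 2, 3, 4, 5, 6, 7])` finishes with state 28 after three runs, and no single run ever holds more than three history entries.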

The podcast also addresses advanced features like dynamic workflow control (signals, updates, pausing), handling large payloads via external storage, and future developments in streaming and agent interaction. It underscores the importance of separating Temporal's state management from external systems (e.g., databases) while ensuring security through encryption and trusted execution environments. Ultimately, the focus is on Temporal's role in modernizing cloud workflows, reducing developer overhead, and enabling scalable, reliable systems for agentic and non-agentic applications alike.
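One common way to handle large payloads via external storage is to keep small values inline and replace large ones with a reference to a blob store. The sketch below illustrates that general pattern in plain Python; it is an assumption about how such a scheme could look, not a description of Temporal's implementation. `INLINE_LIMIT`, `blob_store`, `encode_payload`, and `decode_payload` are hypothetical, and a dictionary stands in for an object store.

```python
import hashlib
from typing import Dict

INLINE_LIMIT = 16  # bytes; hypothetical threshold for keeping data inline

# Stand-in for an external object store, keyed by content hash.
blob_store: Dict[str, bytes] = {}


def encode_payload(data: bytes) -> dict:
    """Keep small payloads inline; store large ones externally by reference."""
    if len(data) <= INLINE_LIMIT:
        return {"inline": data}
    key = hashlib.sha256(data).hexdigest()
    blob_store[key] = data
    return {"ref": key}


def decode_payload(payload: dict) -> bytes:
    """Resolve a payload back to its bytes, fetching from the store if needed."""
    if "inline" in payload:
        return payload["inline"]
    return blob_store[payload["ref"]]
```

With this layout, the workflow history would carry only the short reference, while the bulk data lives in storage that can be sized and secured separately.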

Recent Episodes of MLOps.community

19 Jun 2026 Sandboxing, Agent Harnesses, and Agent Teamwork

The text examines "Harness" components (prompts, tools, and feedback systems) that balance AI agent autonomy with control through adaptive strategies, human oversight, and iterative testing to improve reliability and alignment with human judgment in dynamic tasks.

16 Jun 2026 MCP Servers Are Becoming the UI for AI Agents

Gateways as proxies for AI via MCP address security, traffic control, and cost management while tackling server development challenges, optimization of tool calls, microservices scaling, protocol tracing limitations, ownership shifts, and the need for unbiased evaluations and agent-driven usability assessments.

12 Jun 2026 MCP, Agents & the $40M Bet on Multiplayer AI

Recommended: Multiplayer Bots as an Action Paradigm

The integration of AI into work practices shifts toward collaborative "multiplayer" systems using flocking-inspired dynamics, addressing challenges like limited AI time horizons, technical tools for shared collaboration, balancing human-AI roles, infrastructure scaling, and the need for adaptive governance and futureproofing.

9 Jun 2026 From Single-Player to Multi-Player: Operating AI Agents at Scale

AI agent infrastructure and governance require control planes for security, compliance, and risk mitigation, addressing operational challenges, productivity gains, and the need for standardized frameworks, modular designs, and transparent collaboration.

5 Jun 2026 The Control-vs-Magic Spectrum Building Agents

iFood Pago leverages AI-driven tools like ChatBank to automate financial services for Brazilian restaurants, balancing automation with personalization while addressing challenges in scaling AI, risk management, and the impact of declining training costs on software accessibility.
