More Software Engineering Radio episodes

Sahaj Garg on Low Latency AI

Published 14 Jan 2026

Duration: 54:47

A key factor in AI applications is reducing latency, which requires a balance between model size, accuracy, and performance, along with ongoing optimization and monitoring.

Episode Description

In this episode, Sahaj Garg, CTO of wispr.ai, joins SE Radio host Robert Blumen to talk about the challenges of building low-latency AI applications....

Overview

The podcast emphasizes the critical role of latency in applications, especially within AI systems, and how it affects user perception and overall experience. It explains that users are sensitive to both average and tail latency, and that acceptable thresholds vary depending on the specific application. The discussion explores various techniques aimed at reducing latency while preserving accuracy, such as speculative decoding and model distillation. It also outlines the challenges involved in managing AI workloads, including network delays, deployment configurations, and the necessity for ongoing monitoring and optimization.

The conversation further examines the trade-offs between model size, accuracy, and latency, highlighting the use of quantization and efficient compute architectures as ways to enhance performance. It underscores the importance of aligning AI outputs with user expectations, considering factors like the balance between response size and usability. The podcast also notes the value of tailoring AI systems to specific application contexts, such as voice dictation, recommendation systems, and interactive interfaces, to ensure optimal performance and user satisfaction.

Recent Episodes of Software Engineering Radio

10 Jun 2026 Jure Leskovec on Relational Graph and Foundational Models

Predictive modeling faces challenges with AI's limitations in structured data, prompting solutions like graph databases and relational deep learning with attention mechanisms to enhance accuracy, scalability, and real-time updates for enterprise applications.

3 Jun 2026 Dave Airlie on Linux Kernel Maintenance

The Linux kernel, the largest global software project, uses a hierarchical maintainer system with 80,150 contributors managing subsystems like DRM through public review, structured development cycles, and evolving practices to address scalability, quality, and integration challenges.

27 May 2026 Dwayne McDaniel on the Engineering Challenges of Secrets Management

Managing secrets such as credentials and API keys in software development carries the risk of leaks that lead to supply chain attacks (e.g., PyPy, Clot, Cisco), driven by secrets sprawl, plaintext storage, and misuse. Mitigations include time-bound credentials, decentralized systems, vault tools (e.g., HashiCorp Vault), credential rotation, and encrypted storage, against a backdrop of over 28.65 million hard-coded secrets in GitHub in 2025.

20 May 2026 Rob Moffat on Risk-First Software Development

Recommended: Risk identification and management is a forgotten art

Software development prioritizes risk management through frameworks like test-driven development and agile, addressing hidden risks, AI deployment challenges, open-source dependencies, and organizational prioritization to balance innovation with safeguards.

13 May 2026 SE Radio 720: Martin Dilger on Understanding Eventsourcing

Recommended: Useful Architectural Pattern.

Event sourcing is a system design approach that records changes as sequential events to ensure historical traceability. The episode covers event modeling for aligning systems with human workflows, the contrast with CRUD architectures, slice-based design, event streams, and practical applications such as legacy modernization and workflow simplification.
