NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" Nader Khalil (Brev), Kyle Kranen (Dynamo)

Published 10 Mar 2026

Duration: 1:23:37 (5017 seconds)

A discussion of how AI agents automate complex tasks and manage compute resources, and of the efficiency and scalability challenges that come with running them.

Episode Description

Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week! Now that AIE Europe tix are ~sold out, our attention turns to Miami and World's Fair! The...

Overview

The podcast discusses advances in AI agent automation, focusing on agents' ability to manage complex tasks and real-world resources, such as configuring compute clusters and provisioning GPUs. A key challenge is managing those resources efficiently, for example by avoiding unnecessary GPU usage. Frameworks like Dynamo enable sub-agent coordination for task delegation, while systems like DGX Spark's model router optimize performance by dynamically routing queries between local and foundation models. Speculative decoding is highlighted as a technique to improve efficiency in long-running tasks by predicting future prompts and prefetching data.

The episode also emphasizes innovations in CLI tools, such as ALECs redesigned CLI for streamlined access to compute resources, and covers the debate between CLIs and APIs with respect to local system interfacing, security, and portability. On GPU performance, the discussion notes that professional GPUs (e.g., Blackwell) offer cost efficiency and high throughput for large-scale tasks, though they may lag behind gaming GPUs in speed. Challenges in AI systems include token cost optimization for long-running tasks, domain-specific efficiency trade-offs, and balancing scalability with economic and architectural goals.

Looking ahead, 2024 is framed as the "Year of System as Model," with a focus on scalable, distributed AI architectures. Innovations like Wide EP (wide expert parallelism) and MoE (mixture-of-experts) models are described as critical for enabling high parallelism and inference efficiency. Long-term goals for AI agents include achieving self-consistent autonomy over extended periods, though efficiency and cost hurdles remain. The episode underscores the interplay between technical innovation, practical implementation, and the evolving landscape of AI and developer tools.

