NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" Nader Khalil (Brev), Kyle Kranen (Dynamo)

Published 10 Mar 2026

Duration: 5017 seconds (about 1 hour 24 minutes)

Advancements in AI agents focus on automating complex tasks, optimizing resource management, and addressing efficiency and scalability challenges.

Episode Description

Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week! Now that AIE Europe tix are ~sold out, our attention turns to Miami and World's Fair! The...

Overview

The podcast discusses advances in AI agent automation, focusing on agents' ability to manage complex tasks and real-world resources, such as configuring compute clusters and provisioning GPUs. Challenges include managing those resources efficiently and avoiding waste such as unnecessary GPU usage. Frameworks like Dynamo enable sub-agent coordination for task delegation, while systems like DGX Spark's model router optimize performance by dynamically routing queries between local and foundation models. Speculative decoding is highlighted as a technique to improve efficiency in long-running tasks by predicting future prompts and prefetching data.

Technical innovations in CLI tools, such as ALEC's redesigned CLI for streamlined access to compute resources, are emphasized, alongside the debate between CLIs and APIs for local system interfacing, security, and portability. The discussion also covers GPU performance, noting that professional GPUs (e.g., Blackwell) offer cost efficiency and high throughput for large-scale tasks, though they may lag gaming GPUs in speed. Challenges in AI systems include token cost optimization for long-running tasks, domain-specific efficiency trade-offs, and balancing scalability with economic and architectural goals.

Looking ahead, 2024 is framed as the "Year of System as Model," with a focus on scalable, distributed AI architectures. Innovations like wide expert parallelism (Wide EP) and mixture-of-experts (MoE) models are described as critical for enabling high parallelism and inference efficiency. Long-term goals for AI agents include achieving self-consistent autonomy over extended periods, though efficiency and cost hurdles remain. The episode underscores the interplay between technical innovation, practical implementation, and the evolving landscape of AI and developer tools.
