Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth

Published 3 Feb 2026

Duration: 01:07:16

Experts discuss the challenges of building and maintaining complex AI infrastructure, including power availability, logistics, and component sourcing, and highlight the need for energy-efficient solutions and standardized management tools.

Episode Description

Kris Beevers is the CEO at NetBox Labs, working on turning NetBox into the system of record and automation backbone for modern and AI-driven infrastru...

Overview

The podcast examines the increasingly complex challenges involved in building and maintaining AI infrastructure, particularly as demand for computational power and specialized hardware such as GPUs continues to rise. It points out that while GPU limitations were once a primary concern, current obstacles are more related to power consumption, logistics, and the availability of necessary components. Energy demands have become a critical factor, with the discussion emphasizing the importance of sourcing reliable power, such as hydroelectric energy, and developing energy-efficient solutions to sustain large-scale AI operations.

The podcast also highlights the intricacies of managing large AI data centers, which require meticulous design, thorough documentation, and tight integration of physical and logical infrastructure elements. It mentions tools like NetBox as support for a structured, data-driven approach to infrastructure management. However, the industry still faces hurdles in standardization, automation, and the adoption of digital twin technologies for lifecycle management. The rapid pace of technological change, combined with a shortage of specialized expertise and limited knowledge sharing, makes scaling and maintaining AI infrastructure harder still.

Recent Episodes of MLOps.community

12 May 2026 The Latency Goldilocks Zone Explained

iFood's ILO AI agent leverages a Learning Context Model to deliver hyper-personalized food recommendations by integrating diverse AI techniques, navigating cultural nuances, and balancing familiar and novel choices while addressing multi-channel design, latency, scalability, data alignment, and experimental innovation challenges.

8 May 2026 Building MCP Before MCP Existed: Inside Despegar's Sofia Agent

Sofia, an AI-powered travel concierge built on a multi-agent system with decentralized collaboration, aims to streamline bookings, in-trip services, and personalized experiences through AI-driven automation, chat and voice interfaces, and orchestration layers, while expanding its capabilities and reducing friction in travel processes.

1 May 2026 Voice Agent Use Cases

Designing voice-based AI systems involves balancing user control with automation, managing trade-offs between speech quality and latency, creating intuitive interfaces for non-technical users, and overcoming transcription and turn-taking challenges in real-world environments. It also involves integrating hybrid models and domain-specific tuning, and using feedback loops to maintain compliance, user trust, and ethical standards in applications such as customer support and other dynamic environments.

24 Apr 2026 The Creator of Superpowers: Why Real Agentic Engineering Beats Vibe Coding

The episode discusses using the Greenfield toolset to convert legacy code into structured specifications and the Superpowers framework to enhance AI agents through psychological persuasion techniques. It emphasizes task decomposition, subagent roles, challenges in consistency and security, and future trends in agentic problem-solving and ethical AI development.

21 Apr 2026 It's 2026, and We're Still Talking Evals

Evaluations in AI product development must be integrated early, address real-world complexities, use nuanced metrics beyond accuracy, employ user-centric and iterative testing, leverage post-deployment data, and adapt tailored strategies to balance quality, domain-specific metrics, and system reliability.
