More MLOps.community episodes

Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth

Published 3 Feb 2026

Duration: 01:07:16

Experts discuss the challenges of building and maintaining complex AI infrastructure, including power availability, logistics, and component sourcing, and highlight the need for energy-efficient solutions and standardized management tools.

Episode Description

Kris Beevers is the CEO at NetBox Labs, working on turning NetBox into the system of record and automation backbone for modern and AI-driven infrastru...

Overview

The podcast examines the increasingly complex challenges involved in building and maintaining AI infrastructure, particularly as demand for computational power and specialized hardware such as GPUs continues to rise. It points out that while GPU limitations were once a primary concern, current obstacles are more related to power consumption, logistics, and the availability of necessary components. Energy demands have become a critical factor, with the discussion emphasizing the importance of sourcing reliable power, such as hydroelectric energy, and developing energy-efficient solutions to sustain large-scale AI operations.

Additionally, the podcast highlights the intricacies of managing extensive AI data centers, which require meticulous design, thorough documentation, and the seamless integration of both physical and logical infrastructure elements. It mentions the use of tools like NetBox to support a structured and data-driven approach to infrastructure management. However, the industry is still facing hurdles in standardization, automation, and the implementation of digital twin technologies to improve lifecycle management. The rapid pace of technological advancement, combined with a shortage of specialized expertise and limited knowledge sharing, adds further complexity to scaling and maintaining AI infrastructure effectively.

Recent Episodes of MLOps.community

19 Jun 2026 Sandboxing, Agent Harnesses, and Agent Teamwork

The text examines "harness" components (prompts, tools, and feedback systems) that balance AI agent autonomy with control through adaptive strategies, human oversight, and iterative testing to improve reliability and alignment with human judgment in dynamic tasks.

16 Jun 2026 MCP Servers Are Becoming the UI for AI Agents

Gateways as proxies for AI via MCP address security, traffic control, and cost management while tackling server development challenges, optimization of tool calls, microservices scaling, protocol tracing limitations, ownership shifts, and the need for unbiased evaluations and agent-driven usability assessments.

12 Jun 2026 MCP, Agents & the $40M Bet on Multiplayer AI

Recommended: Multiplayer Bots as an Action Paradigm

The integration of AI into work practices shifts toward collaborative "multiplayer" systems using flocking-inspired dynamics, addressing challenges like limited AI time horizons, technical tools for shared collaboration, balancing human-AI roles, infrastructure scaling, and the need for adaptive governance and future-proofing.

9 Jun 2026 From Single-Player to Multi-Player: Operating AI Agents at Scale

AI agent infrastructure and governance require control planes for security, compliance, and risk mitigation, addressing operational challenges, productivity gains, and the need for standardized frameworks, modular designs, and transparent collaboration.

5 Jun 2026 The Control-vs-Magic Spectrum Building Agents

iFood Pago leverages AI-driven tools like ChatBank to automate financial services for Brazilian restaurants, balancing automation with personalization while addressing challenges in scaling AI, risk management, and the impact of declining training costs on software accessibility.
