More MLOps.community episodes

Speed and Scale: How Today's AI Datacenters Are Operating Through Hypergrowth

Published 3 Feb 2026

Duration: 01:07:16

Experts discuss the challenges of building and maintaining complex AI infrastructure, including power availability, logistics, and component sourcing, and highlight the need for energy-efficient solutions and standardized management tools.

Episode Description

Kris Beevers is the CEO at NetBox Labs, working on turning NetBox into the system of record and automation backbone for modern and AI-driven infrastru...

Overview

The podcast examines the increasingly complex challenges involved in building and maintaining AI infrastructure, particularly as demand for computational power and specialized hardware such as GPUs continues to rise. It points out that while GPU limitations were once a primary concern, current obstacles are more related to power consumption, logistics, and the availability of necessary components. Energy demands have become a critical factor, with the discussion emphasizing the importance of sourcing reliable power, such as hydroelectric energy, and developing energy-efficient solutions to sustain large-scale AI operations.

Additionally, the podcast highlights the intricacies of managing extensive AI data centers, which require meticulous design, thorough documentation, and the seamless integration of both physical and logical infrastructure elements. It mentions the use of tools like NetBox to support a structured and data-driven approach to infrastructure management. However, the industry is still facing hurdles in standardization, automation, and the implementation of digital twin technologies to improve lifecycle management. The rapid pace of technological advancement, combined with a shortage of specialized expertise and limited knowledge sharing, adds further complexity to scaling and maintaining AI infrastructure effectively.

Recent Episodes of MLOps.community

31 Mar 2026 This One Shift Makes Developers Obsolete

Processing live stream data involves transcription, AI-driven skill categorization, GitHub organization, multimedia-comment correlation, and knowledge graphs, while addressing redundancy and AI costs. Other topics include MLOps trends, AI agent debates, adversarial workflows, security risks, and tooling like Open Claw and Agent Zero.

30 Mar 2026 Operationalizing AI Agents: From Experimentation to Production // Databricks Roundtable

Deploying AI agents in real-world systems demands robust safety protocols, human oversight, and structured testing to address risks like errors and vulnerabilities, while balancing innovation with responsibility through observability, governance, domain expertise, and tools like MLflow, across use cases from workflow automation to critical system reliability.

27 Mar 2026 arrowspace: Vector Spaces and Graph Wiring

Epiplexity introduces a framework redefining entropy and complexity with structural information, while topological search and graph-based methods enhance semantic accuracy in machine learning by preserving data through high-dimensional embeddings and hybrid geometric-topological analysis, outperforming traditional approaches in retrieval and reasoning tasks.

20 Mar 2026 Agentic Marketplace

AI-driven agent systems in OLX's classifieds marketplace aim to innovate user experiences by overcoming UI constraints through dynamic intent extraction, hybrid chat/UI models, and trust-building in real estate and motors, with future focus on logistics automation, secure transactions, and human-agent integration.

17 Mar 2026 Durable Execution and Modern Distributed Systems

Temporal enhances developer productivity by enabling crash-proof workflows through deterministic programming models, separating business logic from fault tolerance, and simplifying distributed systems with durable execution, workflows, activities, and persistence layers like Cassandra/Postgres.