The podcast explores the development and complexities of RAG (Retrieval-Augmented Generation) as a technique for enhancing the performance of large language models (LLMs) without extensive retraining. Initially viewed as a simpler alternative to fine-tuning, RAG has proven to involve significant engineering challenges: cleaning source data, computing embeddings, and building efficient retrieval systems. These complexities can undermine its effectiveness, particularly with unstructured or poor-quality data. As LLMs have evolved with larger context windows, some applications no longer need RAG at all, prompting a shift toward more straightforward methods such as direct prompting, or even fine-tuning in some cases.
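To make the clean-embed-retrieve pipeline concrete, here is a minimal sketch of the retrieval step. It uses a toy bag-of-words similarity purely for illustration; a real system would use a learned embedding model and a vector index, and the document list and query here are invented examples, not from the episode.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-count vector.
    # Real pipelines use a trained embedding model instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Rank cleaned document chunks against the query; the top-k
    # would then be inserted into the LLM's prompt as context.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "refund policy: customers may return items within 30 days",
    "shipping times vary by region and carrier",
    "our office dog is named Biscuit",
]
print(retrieve("what is the refund policy", docs, k=1))
```

Even this toy version shows where the complexity the episode describes comes from: retrieval quality depends entirely on how well the documents were cleaned and chunked before they ever reach the similarity step.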
The episode also emphasizes the importance of evaluating LLM outputs with reliable methods, such as human evaluation and custom scoring tools, to ensure alignment with business objectives, and it underscores the critical role of high-quality, relevant data in achieving good results. The discussion also highlights the difficulty of sanitizing data, especially in sensitive industries like healthcare, where LLMs may mishandle confidential or complex information. Overall, the podcast offers a detailed look at the practical challenges of implementing and optimizing RAG and other LLM customization techniques.
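A custom scoring tool of the kind mentioned above might be sketched as follows. This is an assumed illustration, not a method described in the episode: it scores an output by coverage of required facts and fails it outright if any forbidden term (e.g., a patient identifier in a healthcare setting) leaks through.

```python
def score_output(output, required_facts, forbidden_terms):
    # Custom scorer: reward coverage of required facts, but fail
    # hard on any leaked forbidden term (e.g. confidential data).
    text = output.lower()
    if any(term.lower() in text for term in forbidden_terms):
        return 0.0  # sanitization failure overrides everything
    hits = sum(1 for fact in required_facts if fact.lower() in text)
    return hits / len(required_facts)

answer = "Take the medication twice daily with food."
print(score_output(answer, ["twice daily", "with food"], ["patient id"]))
```

Simple string-matching scorers like this are brittle, which is why the episode pairs automated checks with human evaluation; the value of even a crude scorer is that it encodes business objectives as an explicit, repeatable test.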