Relational Foundation Models for Enterprise Data with Jure Leskovec

Published 21 May 2026

Show Notes: twimlai.com/podcast/twimlai/relational-foundation-models-enterprise-data

Duration: 01:06:21

Relational foundation models and graph-based machine learning, like GNNs, enable accurate predictions on structured data across biomedical research and industries by capturing complex relationships, integrating multi-scale data, and overcoming traditional limitations through automated feature extraction and hybrid modeling.

Episode Description

In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts o...

Overview

The episode discusses advances in relational foundation models that reason over structured relational data and make predictions without task-specific training. The second-generation models are presented as a breakthrough because they analyze databases directly, for example predicting outcomes from multi-table schemas (customers, products, transactions) using graph-based attention mechanisms. The same ideas are applied to biomedical research through projects like the AI Virtual Cell initiative, which builds next-generation foundation models to represent human cells, molecular interactions, and patient-level data, driving discoveries in cancer therapy and molecule design. These models integrate single-cell RNA sequencing with protein models (e.g., AlphaFold) to build "digital twins" of biological systems, emphasizing data-driven insights over predefined knowledge.

A key focus is graph-based machine learning, where data is represented as nodes and relationships (e.g., users, products, transactions), enabling graph neural networks (GNNs) to learn directly from raw relational data without manual feature engineering. This approach improves accuracy in complex, non-linear scenarios and outperforms traditional linear models, though it offers less of an advantage on simple problems. The discussion highlights applications in fraud detection, customer behavior prediction, and link prediction, with real-world deployments at companies like DoorDash and Reddit. Challenges include handling noisy or incomplete data, scalability, and the need for hybrid models combining graph-based embeddings with manual features for interpretability. Relational foundation models also demonstrate efficiency in cold start scenarios and are optimized for deployment as SaaS platforms or cloud-based solutions, though limitations persist in certain use cases like multi-tabular relational problems or traditional analytics requiring pattern detection.
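
To make the graph framing concrete, here is a minimal Python sketch of how rows in three tables could become typed nodes, with foreign keys becoming edges. The table contents, column names, and the helpers `build_graph` and `two_hop_neighbors` are illustrative assumptions, not Kumo's API or the exact method described in the episode. A real GNN would pass learned messages along these edges rather than simply listing neighbors.

```python
# Hypothetical toy tables; the column names are illustrative assumptions.
customers = [{"customer_id": "c1"}, {"customer_id": "c2"}]
products = [{"product_id": "p1"}, {"product_id": "p2"}]
transactions = [
    {"transaction_id": "t1", "customer_id": "c1", "product_id": "p1"},
    {"transaction_id": "t2", "customer_id": "c1", "product_id": "p2"},
    {"transaction_id": "t3", "customer_id": "c2", "product_id": "p1"},
]


def build_graph(customers, products, transactions):
    """Turn rows into typed nodes and foreign keys into undirected edges."""
    adjacency = {}
    for row in customers:
        adjacency.setdefault(("customer", row["customer_id"]), set())
    for row in products:
        adjacency.setdefault(("product", row["product_id"]), set())
    for row in transactions:
        tx_node = ("transaction", row["transaction_id"])
        adjacency.setdefault(tx_node, set())
        # Each foreign key in the transaction row becomes one edge.
        for neighbor in (
            ("customer", row["customer_id"]),
            ("product", row["product_id"]),
        ):
            adjacency.setdefault(neighbor, set())
            adjacency[tx_node].add(neighbor)
            adjacency[neighbor].add(tx_node)
    return adjacency


def two_hop_neighbors(adjacency, node):
    """Nodes reachable in exactly two hops, excluding the start node."""
    reached = set()
    for middle in adjacency[node]:
        reached |= adjacency[middle]
    reached.discard(node)
    return reached


graph = build_graph(customers, products, transactions)
print(two_hop_neighbors(graph, ("customer", "c1")))
```

In this toy data, the two-hop neighborhood of customer `c1` is the set of products reached through that customer's transactions, which is the kind of signal a hand-written "products bought" feature would otherwise have to encode.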

The analysis underscores a shift from traditional machine learning, which relies on human-engineered features and labels, to unsupervised/self-supervised learning that adapts to structured data's inherent relationships. This includes initiatives like Kumo's platform, which employs in-context learning to process historical data and predict outcomes (e.g., fraud detection, purchase sums) with minimal training. While these models show "superhuman accuracy" in fine-tuned scenarios and achieve significant performance gains over state-of-the-art supervised models, challenges remain in operationalizing predictions for decision-making systems and ensuring compatibility with diverse data schemas. The discussion also emphasizes the importance of graph structures in unlocking deep learning's potential for relational data, paralleling breakthroughs in computer vision and NLP, and advocates for standardized benchmarks to evaluate multi-table prediction tasks.

What If

What if you applied a relational foundation model to your own database for real-time predictions without training?
Concrete move: Test the model on your existing relational data (e.g., customer transactions, product inventories) using a platform like Kumo.
Why now: Relational foundation models (e.g., Kumo RFM2) are already available and have been shown to outperform traditional supervised models by 12% in tasks like fraud detection and recommendation systems.
Expected upside: Reduce development time by eliminating the need for manual feature engineering, while gaining actionable insights (e.g., identifying fraud patterns or optimizing inventory) in minutes instead of weeks.

What if you built a hybrid model combining graph-based embeddings with manual business rules for your niche industry?
Concrete move: Integrate Kumo's graph neural network embeddings with domain-specific rules (e.g., pricing thresholds, customer segmentation logic) in your application.
Why now: Hybrid models are reported to enhance explainability and compliance (e.g., Reddit saw a double-digit increase in click-through rates by combining automated signals with manual features).
Expected upside: Achieve higher accuracy than pure ML models (e.g., 12% improvement over baselines) while maintaining interpretability for stakeholders and regulatory compliance.

What if you deployed a graph-based fraud detection system using in-context learning on your transaction data?
Concrete move: Use Kumo's in-context example generation to build a fraud detection model from your historical transaction data, leveraging graph traversal for forward-looking labels.
Why now: The system is already deployed at scale by companies like Coinbase and DoorDash, showing robustness to sparse or noisy data (e.g., cold start fraud detection with minimal historical labels).
Expected upside: Detect fraud with 5-12% higher accuracy than traditional models, even with incomplete data, while reducing manual oversight costs by automating flagging and prioritization.
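
The idea of forward-looking labels can be sketched in a few lines of Python. For a chosen cutoff time, everything before the cutoff is history the model may see, and the label records whether fraud occurs in a window after the cutoff. The transaction fields, the function `make_context_examples`, and the 30-day horizon are hypothetical choices for illustration; this is not Kumo's implementation, only one plausible way to build such examples from a transaction log.

```python
from datetime import datetime, timedelta

# Hypothetical transaction log; field names are illustrative assumptions.
transactions = [
    {"account": "a1", "time": datetime(2026, 1, 5), "amount": 40.0, "fraud": False},
    {"account": "a1", "time": datetime(2026, 1, 20), "amount": 900.0, "fraud": True},
    {"account": "a2", "time": datetime(2026, 1, 8), "amount": 15.0, "fraud": False},
    {"account": "a2", "time": datetime(2026, 1, 25), "amount": 22.0, "fraud": False},
]


def make_context_examples(transactions, cutoff, horizon):
    """Build one (history, label) example per account.

    history: (time, amount) pairs strictly before the cutoff.
    label:   True if any fraud occurs in [cutoff, cutoff + horizon).
    """
    window_end = cutoff + horizon
    examples = {}
    for tx in transactions:
        example = examples.setdefault(
            tx["account"], {"history": [], "label": False}
        )
        if tx["time"] < cutoff:
            # Past data: allowed as model input.
            example["history"].append((tx["time"], tx["amount"]))
        elif tx["time"] < window_end and tx["fraud"]:
            # Future data: used only to derive the label.
            example["label"] = True
    return examples


examples = make_context_examples(
    transactions, cutoff=datetime(2026, 1, 15), horizon=timedelta(days=30)
)
for account, example in sorted(examples.items()):
    print(account, len(example["history"]), example["label"])
```

Keeping the history strictly before the cutoff is the important design choice: it prevents information from the label window leaking into the inputs, so the examples resemble what the model sees at prediction time.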

Takeaway

Leverage pre-trained relational foundation models (e.g., Kumo RFM2) for immediate predictions on structured relational data without task-specific training, reducing development time and resource costs.
Integrate graph neural networks (GNNs) to automate feature extraction from multi-tabular data, eliminating manual feature engineering and improving accuracy in complex, non-linear scenarios like fraud detection or customer behavior prediction.
Deploy hybrid models that combine graph-based neural network embeddings (from platforms like Kumo) with domain-specific manual features to enhance interpretability and align with business rules, ensuring compliance and transparency.
Use in-context learning mechanisms with labeled historical data (e.g., fraud cases) to guide predictions on new, unlabeled transactions, improving performance in cold start scenarios with minimal training data.
Adopt SaaS or cloud-based deployment options for relational deep learning systems to maintain data privacy and scalability, enabling rapid integration into existing workflows without infrastructure overhauls.
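
As a rough illustration of the hybrid approach in the takeaways, the Python sketch below concatenates a learned embedding with named manual features and layers a transparent business rule on top of a model score. The embedding values, feature names, thresholds, and the helpers `combine_features` and `decide` are all assumptions made up for this example; no real platform API is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    flagged: bool
    reasons: list = field(default_factory=list)


def combine_features(embedding, manual):
    """Concatenate a learned embedding with named manual features.

    Returns parallel lists of names and values so that every column
    of the hybrid feature vector stays traceable.
    """
    manual_names = sorted(manual)
    names = [f"emb_{i}" for i in range(len(embedding))] + manual_names
    values = list(embedding) + [manual[name] for name in manual_names]
    return names, values


def decide(model_score, manual, score_threshold=0.8, amount_limit=10_000.0):
    """Flag a transaction if the model score or a business rule fires."""
    reasons = []
    if model_score >= score_threshold:
        reasons.append("model_score")
    if manual["amount"] > amount_limit:
        reasons.append("amount_over_limit")
    return Decision(flagged=bool(reasons), reasons=reasons)


# Placeholder embedding; in practice it would come from a trained graph model.
embedding = [0.1, -0.3]
manual = {"amount": 12_000.0, "account_age_days": 3}

names, values = combine_features(embedding, manual)
decision = decide(model_score=0.2, manual=manual)
print(names, values, decision)
```

Recording the reasons alongside the flag is what keeps the hybrid setup interpretable: a reviewer can see whether the learned score or an explicit rule triggered the decision.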
