More Software Engineering Radio episodes


Jure Leskovec on Relational Graph and Foundational Models

Published 10 Jun 2026

Duration: 01:02:12

Enterprise predictive modeling is held back by AI's weakness on structured data, prompting solutions such as graph databases and relational deep learning with attention mechanisms to improve accuracy, scalability, and real-time updates.

Episode Description

Jure Leskovec, Professor of Computer Science at Stanford University and Chief Scientist at Kumo.ai, speaks with host Sriram Panyam about relational an...

Overview

The episode explores the state of predictive modeling and its challenges in enterprise settings, emphasizing its role in decision-making across sectors like finance, healthcare, and retail. Predictive modeling relies on structured data (e.g., databases, transaction records) to forecast outcomes, yet despite recent AI advances, enterprises still rely mostly on traditional machine learning. Current AI systems, such as large language models, struggle with structured tabular data and produce unreliable results for tasks like fraud detection or customer behavior prediction. Traditional workflows of manual feature engineering, data normalization, and deployment are resource-intensive, requiring significant time and labor. Information leakage, real-time data updates, and adversarial behavior in fraud detection further complicate model development and maintenance. Relational databases, though central to enterprise operations, are poorly suited to AI-driven analysis: their normalized formats spread related entities across many tables, which obscures the relationships between them.
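Information leakage usually happens when a feature is computed from data that would not have existed at prediction time. The Python sketch below is a minimal illustration, assuming a hypothetical `Transaction` record and `point_in_time_features` helper (neither comes from the episode): it aggregates a customer's history only from rows strictly before the prediction cutoff, so nothing from the label period reaches the model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    customer_id: str
    amount: float
    timestamp: datetime


def point_in_time_features(transactions, customer_id, cutoff, window_days=30):
    """Aggregate a customer's history using only rows in [cutoff - window, cutoff).

    Rows at or after `cutoff` are ignored, so the features cannot "see"
    the period the label is computed from.
    """
    window_start = cutoff - timedelta(days=window_days)
    history = [
        t for t in transactions
        if t.customer_id == customer_id and window_start <= t.timestamp < cutoff
    ]
    total = sum(t.amount for t in history)
    return {
        "txn_count": len(history),
        "txn_total": total,
        "txn_mean": total / len(history) if history else 0.0,
    }


txns = [
    Transaction("c1", 20.0, datetime(2026, 1, 5)),
    Transaction("c1", 40.0, datetime(2026, 1, 20)),
    Transaction("c1", 999.0, datetime(2026, 2, 3)),  # after the cutoff: excluded
]
feats = point_in_time_features(txns, "c1", cutoff=datetime(2026, 2, 1))
print(feats)  # {'txn_count': 2, 'txn_total': 60.0, 'txn_mean': 30.0}
```

The same cutoff must be applied consistently when building training examples; computing features over the full table and only afterwards splitting by time reintroduces the leak.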

To address these limitations, the episode proposes graph-based systems as a better alternative to flattened relational tables and traditional AI. Graph representations retain the contextual relationships in data without flattening tables, which streamlines feature engineering and enables real-time updates. A newer approach, relational deep learning, uses attention mechanisms tailored to structured data, letting models attend over cells, rows, and tables rather than over a linear sequence of tokens. This improves accuracy by capturing temporal and relational patterns (e.g., transaction timing, user-product interactions) while avoiding the over-smoothing common in graph neural networks. Unlike large language models, relational foundation models (RFMs) focus on structured, domain-specific tasks and achieve higher accuracy in predictive applications like churn risk assessment and sales lead scoring. They also provide calibrated predictions with uncertainty estimates and rich debugging traces for identifying data anomalies or model biases.
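To make "retaining relationships instead of flattening tables" concrete, here is a minimal pure-Python sketch. It assumes a toy transactions table and two hypothetical helpers, `build_graph` and `attention_pool`: each row becomes a node, each foreign key becomes an edge, and a customer node then attends over its transaction neighbors. This illustrates the general idea only, not the architecture used by Kumo or described in the episode; a real model would learn its query and projection weights, typically with a library such as PyTorch Geometric.

```python
import math
from collections import defaultdict

# Toy "transactions" table (hypothetical data). Its foreign keys point at
# customer and product rows, which become nodes of their own.
transactions = [
    {"id": "t1", "customer_id": "c1", "product_id": "p1", "amount": 10.0},
    {"id": "t2", "customer_id": "c1", "product_id": "p2", "amount": 250.0},
    {"id": "t3", "customer_id": "c2", "product_id": "p1", "amount": 10.0},
]


def build_graph(rows):
    """Map each row to a (table, id) node and each foreign key to edges in both directions."""
    edges = defaultdict(list)
    for row in rows:
        txn = ("transaction", row["id"])
        for neighbor in (("customer", row["customer_id"]), ("product", row["product_id"])):
            edges[txn].append(neighbor)
            edges[neighbor].append(txn)
    return edges


def attention_pool(query, vectors):
    """Scaled dot-product attention of one query over neighbor vectors.

    Keys and values are the same vectors here to keep the sketch short.
    Returns the pooled vector and the softmax attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(len(query))]
    return pooled, weights


graph = build_graph(transactions)
by_id = {row["id"]: row for row in transactions}

# Customer c1 attends over its transaction neighbors instead of a flattened feature row.
neighbors = [node_id for table, node_id in graph[("customer", "c1")] if table == "transaction"]
vectors = [[math.log1p(by_id[n]["amount"]), 1.0] for n in neighbors]
query = [1.0, 0.0]  # hypothetical; a trained model would learn this
pooled, weights = attention_pool(query, vectors)
print(neighbors, [round(w, 3) for w in weights])
```

With this query vector the large purchase receives most of the attention weight; the point of the sketch is that the customer's representation is computed from linked rows rather than from hand-built aggregate columns.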

The episode highlights practical applications of these advances, including in-context learning for dynamic prediction tasks and counterfactual analysis for testing hypothetical scenarios. Tools like KumoRFM and frameworks such as PyTorch Geometric are recommended for experimenting with relational foundation models, which are increasingly being integrated into enterprise platforms like Snowflake. The discussion underscores the growing importance of understanding structured relational data and urges data scientists and business units to adopt relational deep learning to close gaps in modern AI. The aim is to bridge traditional predictive modeling and emerging AI techniques with scalable solutions for complex, high-stakes predictive tasks.
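In-context learning and counterfactual questions can both be pictured with a toy, training-free predictor. The Python sketch below is an assumption for illustration: the helper `in_context_predict` and the labeled examples are hypothetical, and KumoRFM's actual mechanism is far richer. A query entity attends over labeled example entities, and a counterfactual ("What if we offer a discount?") is answered by editing the query's features and predicting again.

```python
import math


def in_context_predict(query, context, temperature=1.0):
    """Return P(label = 1) for `query` using labeled context examples.

    Each context item is (feature_vector, label) with label 0 or 1. The query
    attends over the examples with softmax(-squared_distance / temperature)
    and returns the attention-weighted mean label. Nothing is trained.
    """
    scores = [
        -sum((q - x) ** 2 for q, x in zip(query, feats)) / temperature
        for feats, _ in context
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return sum(w * label for w, (_, label) in zip(weights, context)) / sum(weights)


# Hypothetical context: [months since last purchase, discount offered (0/1)] -> churned?
context = [
    ([0.1, 0.0], 0),
    ([0.2, 1.0], 0),
    ([1.5, 0.0], 1),
    ([1.2, 1.0], 0),
    ([1.8, 0.0], 1),
]

base = in_context_predict([1.4, 0.0], context)
# Counterfactual: same customer, but "what if we offer a discount?"
what_if = in_context_predict([1.4, 1.0], context)
print(f"churn risk {base:.2f} -> {what_if:.2f} with discount")
```

The drop in predicted risk here is purely a property of the made-up context examples; the design point is that changing the question or the context requires no retraining.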

What If

  • What if you adopt a graph-based architecture to replace manual feature engineering?

    • Move: Use a graph-based platform (e.g., Kumo) to represent data as interconnected nodes and relationships instead of flattening relational tables. Apply attention mechanisms over rows, columns, and cross-table relationships for predictive modeling.
    • Why Now?: Traditional feature engineering is time-consuming and error-prone, while graph-based systems reduce manual effort and retain contextual data by preserving entity relationships (e.g., user-product-transaction links). Modern GPUs make this scalable for real-time updates.
    • Expected Upside: 10-20% higher model accuracy than traditional pipelines, with reduced development time and fewer data management bottlenecks.
  • What if you implement a relational foundation model for in-context predictive tasks?

    • Move: Train or use a pre-built relational foundation model (e.g., KumoRFM) to answer ad-hoc predictive questions (e.g., "Predict churn as no purchase in 30 days") without task-specific retraining. Leverage attention-based explanations for debugging.
    • Why Now?: Enterprises are stuck with rigid pipelines for high-stakes tasks (e.g., fraud detection), but foundation models offer 20-30% higher accuracy in domain-specific predictions. Public SDKs (e.g., relationalfoundationmodel.ai) are accessible now.
    • Expected Upside: Eliminate the need for 2 full-time employees per model by automating prediction, explanation, and debugging workflows. Enable rapid counterfactual testing of hypotheses (e.g., "What if we offer a 10% discount?").
  • What if you replace ETL pipelines with GPU-optimized graph processing?

    • Move: Skip ETL and use a GPU-accelerated graph-based platform (e.g., Kumo) to process raw structured data for predictive modeling. Focus on local graph neighborhoods during inference to avoid over-smoothing.
    • Why Now?: Current ETL pipelines are CPU-bound and slow, while GPU-optimized systems reduce training time and improve accuracy (e.g., 10-20% better results). Graph-based platforms like Kumo are designed for AI workloads.
    • Expected Upside: 30% faster data-to-insight cycles, with lower operational costs and fewer errors from manual data cleaning. Enable real-time fraud detection or dynamic pricing models without complex feature stores.
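The example query in the second scenario, "churn as no purchase in 30 days", is at bottom a label definition over a time window. A minimal Python sketch, assuming a hypothetical `churn_labels` helper and purchases given as (customer, timestamp) pairs; it does not reproduce KumoRFM's query syntax.

```python
from datetime import datetime, timedelta


def churn_labels(purchases, customer_ids, cutoff, horizon_days=30):
    """Label each customer 1 (churned) if they make no purchase in
    [cutoff, cutoff + horizon_days), else 0.

    `purchases` is an iterable of (customer_id, timestamp) pairs.
    """
    horizon_end = cutoff + timedelta(days=horizon_days)
    active = {cid for cid, ts in purchases if cutoff <= ts < horizon_end}
    return {cid: 0 if cid in active else 1 for cid in customer_ids}


purchases = [
    ("c1", datetime(2026, 3, 5)),
    ("c2", datetime(2026, 2, 20)),  # before the cutoff: does not count
    ("c3", datetime(2026, 4, 15)),  # after the 30-day horizon: does not count
]
labels = churn_labels(purchases, ["c1", "c2", "c3"], cutoff=datetime(2026, 3, 1))
print(labels)  # {'c1': 0, 'c2': 1, 'c3': 1}
```

Pairing labels computed after the cutoff with features computed strictly before it is what keeps such a query free of leakage.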

Takeaway

  • Adopt graph-based platforms like Kumo, or integrate them with existing systems, to streamline feature engineering: avoid manual data flattening and keep the contextual relationships in tabular data (e.g., user-product-transaction links), reducing reliance on time-consuming ETL pipelines.
  • Leverage relational foundation models (RFMs) such as KumoRFM for structured predictive tasks (e.g., sales lead scoring, churn prediction) to achieve 20-30% higher accuracy than general-purpose LLMs or manually engineered models, using the open-source PyTorch Geometric library or Snowflake integrations.
  • Implement GPU-based models that process raw structured data directly, skipping ETL workflows and enabling faster training (up to 10-20x speedups) while retaining relational context.
  • Use attention mechanisms tailored for structured data (row/column/cross-table attention) via frameworks like PyTorch Geometric to capture complex dependencies in relational data, avoiding over-smoothing issues and improving accuracy in sparse/noisy environments.
  • Automate real-time feature updates by designing models that adapt to new data (e.g., incoming transactions) without manual recalibration, addressing challenges in fraud detection and customer behavior prediction.
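Updating features as transactions arrive, instead of recomputing them in batch, can be sketched with running aggregates. This is an illustrative Python sketch under stated assumptions: the `RunningStats` and `StreamingFeatureStore` classes are hypothetical, amounts are assumed non-negative, and state lives in memory. It shows incremental updates only, not how Kumo or any particular feature store implements them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Per-customer aggregates updated in O(1) as each transaction arrives."""
    count: int = 0
    total: float = 0.0
    max_amount: float = 0.0  # assumes non-negative amounts

    def update(self, amount: float) -> None:
        self.count += 1
        self.total += amount
        self.max_amount = max(self.max_amount, amount)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class StreamingFeatureStore:
    """Hypothetical in-memory store; a real system would persist and expire state."""

    def __init__(self):
        self._stats = defaultdict(RunningStats)

    def ingest(self, customer_id: str, amount: float) -> None:
        self._stats[customer_id].update(amount)

    def features(self, customer_id: str) -> dict:
        # .get avoids creating state for customers we have never seen.
        s = self._stats.get(customer_id, RunningStats())
        return {"count": s.count, "mean": s.mean, "max": s.max_amount}


store = StreamingFeatureStore()
for cid, amt in [("c1", 12.0), ("c1", 30.0), ("c2", 5.0), ("c1", 18.0)]:
    store.ingest(cid, amt)
print(store.features("c1"))  # {'count': 3, 'mean': 20.0, 'max': 30.0}
```

Each event touches only one customer's state, so features stay current at prediction time without a batch recomputation step.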

Recent Episodes of Software Engineering Radio

3 Jun 2026 Dave Airlie on Linux Kernel Maintenance

The Linux kernel, the largest global software project, uses a hierarchical maintainer system with 80,150 contributors managing subsystems like DRM through public review, structured development cycles, and evolving practices to address scalability, quality, and integration challenges.

27 May 2026 Dwayne McDaniel on the Engineering Challenges of Secrets Management

Managing secrets such as credentials and API keys is risky: secrets sprawl, plaintext storage, and misuse lead to leaks that enable supply chain attacks (e.g., PyPy, Clot, Cisco). Solutions include time-bound credentials, decentralized systems, vault tools (e.g., HashiCorp Vault), credential rotation, and encrypted storage, against a backdrop of over 28.65 million hard-coded secrets found on GitHub in 2025.

20 May 2026 Rob Moffat on Risk-First Software Development

Recommended: Risk identification and management is a forgotten art

Software development prioritizes risk management through frameworks like test-driven development and agile, addressing hidden risks, AI deployment challenges, open-source dependencies, and organizational prioritization to balance innovation with safeguards.

13 May 2026 SE Radio 720: Martin Dilger on Understanding Eventsourcing

Recommended: Useful Architectural Pattern.

Event sourcing is a system design approach that records changes as sequential events to ensure historical traceability, uses event modeling for aligning systems with human workflows, contrasts with CRUD architectures, and emphasizes slice-based design, event streams, and practical applications like legacy modernization and workflow simplification.

6 May 2026 Birol Yildiz on Building an Agentic AI SRE

AI agents in SRE leverage autonomous decision-making, agentic search, and lightweight architectures to replace static runbooks, balancing autonomy with reliability challenges, context management, and human oversight in dynamic environments.
