More Latent Space episodes

Artificial Analysis: The Independent LLM Analysis House with George Cameron and Micah Hill-Smith

Published 9 Jan 2026

Duration: 4694 seconds (about 1 hour 18 minutes)

Artificial Analysis, an independent AI benchmarking platform, provides standardized evaluations and reports to address the lack of comprehensive and impartial AI benchmarks.

Episode Description

Don't miss George's AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk From launching a side project in a Sydney basement to becoming the independent...

Overview

The podcast outlines the development of Artificial Analysis, an independent benchmarking platform launched in January 2024 to evaluate AI models and hosting providers. Initially a side project, it has evolved into a service offering public and private benchmarking, standardized reports, and custom evaluations for enterprises and AI companies. The platform aims to address the lack of comprehensive and impartial AI benchmarks, focusing on challenges such as model accuracy, cost, and performance in AI development. It operates through a free website, generating revenue from enterprise subscriptions and private evaluations.

The discussion touches on the evolution of benchmarking methodologies, the difficulties in evaluating AI models, and the introduction of new metrics like the Omniscience Index. It also highlights the increasing importance of measuring hallucination rates and model openness. Other topics include trends in AI model costs and performance, the emergence of agentic workflows, and the growing complexity and diversity of the AI ecosystem.

Recent Episodes of Latent Space

22 Jun 2026 Red-Teaming after Mythos - Zico Kolter & Matt Fredrikson, Gray Swan

AI security challenges in large language models, such as data leakage and prompt injection, require adversarial testing, red teaming, tools like *Shade* and *Signal*, and structured frameworks to address integration risks, robustness gaps, and enterprise-specific security demands.

3 Jun 2026 Scaling Past Informal AI - Carina Hong, Axiom Math

Formal verification is positioned as a critical tool for advancing AI by ensuring system correctness through mathematical rigor, exemplified by Axiom Math's achievements, tools like Lean, challenges in AI generalization, and the vision of AI as a "superhuman mathematician" through verified reasoning.

3 Jun 2026 Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build

Strategic AI development shifts to ecosystem-driven frameworks prioritizing value creation, covering Microsoft's rigorous model training, agent-driven workflow management, real-world impact challenges, innovative business models, inclusive AI participation, and redefining work through agentic systems.

2 Jun 2026 GitHub's plan for Agents - Kyle Daigle, GitHub

Advanced AI integration in developer workflows leverages tools like GitHub Copilot and agentic systems to automate tasks and boost productivity, while addressing challenges like skill bloat, security, open-source trust issues, and the shift to modular AI capabilities in enterprise and collaborative environments.

1 Jun 2026 Why Video Agent models are next - Ethan He, xAI Grok Imagine

The episode covers advancements in AI research through community-driven knowledge sharing, challenges in scaling video models, technical innovations such as vision transformers and diffusion models, and the integration of language models in generative media, along with hurdles in training efficiency and sustainable development.
