The podcast outlines the development of Artificial Analysis, an independent benchmarking platform launched in January 2024 to evaluate AI models and hosting providers. Initially a side project, it has grown into a service offering public and private benchmarking, standardized reports, and custom evaluations for enterprises and AI companies. The platform aims to fill the gap left by the lack of comprehensive, impartial AI benchmarks, addressing challenges such as measuring model accuracy, cost, and performance. It operates as a free website and generates revenue from enterprise subscriptions and private evaluations.
The discussion covers the evolution of benchmarking methodologies, the difficulties of evaluating AI models, and the introduction of new metrics such as the Omniscience Index. It also highlights the growing importance of measuring hallucination rates and model openness. Other topics include trends in AI model cost and performance, the emergence of agentic workflows, and the increasing complexity and diversity of the AI ecosystem.