The Age of Async Agents: Cognition's Walden Yan & OpenInspect's Cole Murray

Published 28 May 2026

Duration: 01:08:02

AI agent development is shifting toward autonomous workflows, with tools like Devin for code generation and OpenInspect for cloud agent management. The conversation covers growth, infrastructure challenges, security, scalability, enterprise adoption, open-source initiatives, diverse non-engineering use cases, and the role of human oversight in AI-native coding.

Episode Description

The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend; get your tickets booked ASAP as t...

Overview

The podcast explores the evolution of AI agent development, emphasizing a shift from manual model management to autonomous, agent-driven workflows. Key advancements include tools like Devin (for autonomous code generation) and OpenInspect (for cloud agent management), alongside improved models such as Sonnet 3.7 and GPT-5.2, which let agents carry out tasks like pull request generation with minimal human input. Growth metrics include a surge in Devin's usage (its share of PR contributions rising from 16% to 80%) and growing interest in cloud agent tools. Scaling agent infrastructure remains hard: cloud VMs have limitations, which pushes teams toward custom solutions. The guests also weigh the architectural trade-offs between in-box agents, where the agent loop runs inside the execution environment, and out-of-box agents, where it runs outside and drives that environment remotely, with a preference for separating logic from execution to balance security and flexibility.
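The out-of-box pattern, where the agent's logic lives apart from the machine that executes its commands, can be sketched in a few lines of Python. This is an illustrative model only, not Cognition's or OpenInspect's implementation: `Controller`, `Sandbox`, the `scopes` policy, and the secret names are all hypothetical, and `Sandbox.run` records commands instead of executing them. It assumes the controller owns the full secret store and hands each sandbox only the secrets scoped to it.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """The "hands": a stand-in for a remote VM or container (hypothetical).

    `run` only records commands here; a real sandbox would execute them.
    It receives just the secrets scoped to it when it is provisioned.
    """
    name: str
    env: dict[str, str]
    executed: list[str] = field(default_factory=list)

    def run(self, command: str) -> str:
        self.executed.append(command)
        return f"[{self.name}] ok: {command}"


@dataclass
class Controller:
    """The "brain": decides what to run and owns the full secret store."""
    secrets: dict[str, str]
    # Hypothetical policy: which secret names each sandbox may receive.
    scopes: dict[str, set[str]]

    def provision(self, name: str) -> Sandbox:
        # Copy only the secrets allowed for this sandbox; unknown names get none.
        allowed = self.scopes.get(name, set())
        scoped = {k: v for k, v in self.secrets.items() if k in allowed}
        return Sandbox(name=name, env=scoped)

    def run_task(self, sandbox: Sandbox, steps: list[str]) -> list[str]:
        # The agent loop stays here, outside the sandbox; only commands cross over.
        return [sandbox.run(step) for step in steps]


if __name__ == "__main__":
    ctl = Controller(
        secrets={"NPM_TOKEN": "npm-123", "PROD_DB_URL": "postgres://prod"},
        scopes={"web-app-sandbox": {"NPM_TOKEN"}},
    )
    sb = ctl.provision("web-app-sandbox")
    print(sb.env)  # {'NPM_TOKEN': 'npm-123'}
    print(ctl.run_task(sb, ["npm ci", "npm test"]))
```

Because the sandbox never holds the production credential, a compromised or misbehaving sandbox can only misuse what its scope allowed, while the controller can be restarted or swapped without rebuilding the environment.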

The discussion also addresses infrastructure and deployment complexities, including the preference for VMs over Docker for agent execution and the need for sandbox strategies that keep testing environments consistent. Memory systems for agents remain a challenge, with ongoing efforts to refine how auto-generated memories are managed and to organize them for file-system-like navigation, which scales better. On enterprise adoption, the guests highlight the role of companies like Cognition in onboarding teams, though challenges persist, such as AI literacy gaps and fitting agents into existing workflows. Open-source projects like OpenDevin and OpenInspect are explored as alternatives to proprietary solutions, and the conversation raises questions about monetization and about where agent systems sit in the gray area between infrastructure and service offerings. Finally, the podcast touches on broader challenges, including code quality in AI-generated workflows, the risk of reward hacking, and the growing use of AI in non-engineering tasks like competitor research and SRE triage.
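File-system-like memory navigation can be pictured as memories stored as small markdown files in a directory tree, which the agent browses by listing a folder and opening only what looks relevant, rather than loading every memory into its context at once. The sketch below assumes that layout; `FileMemory` and the example paths are hypothetical and do not describe how Devin actually stores memories.

```python
import tempfile
from pathlib import Path


class FileMemory:
    """Memories kept as markdown files in a directory tree (hypothetical layout).

    The agent browses it like a repo: list a folder, then read only the
    files that look relevant, instead of loading all memories up front.
    """

    def __init__(self, root: Path):
        self.root = root

    def write(self, rel_path: str, text: str) -> None:
        path = self.root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def ls(self, rel_dir: str = ".") -> list[str]:
        # Like `ls`: immediate children only, directories marked with "/".
        base = self.root / rel_dir
        return sorted(p.name + ("/" if p.is_dir() else "") for p in base.iterdir())

    def read(self, rel_path: str) -> str:
        return (self.root / rel_path).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mem = FileMemory(Path(tmp))
        mem.write("repos/web-app/pr-conventions.md", "- Open PRs as ready for review\n")
        mem.write("repos/web-app/testing.md", "- Run the test suite before pushing\n")
        mem.write("user/preferences.md", "- Prefers small PRs\n")
        print(mem.ls())                 # ['repos/', 'user/']
        print(mem.ls("repos/web-app"))  # ['pr-conventions.md', 'testing.md']
        print(mem.read("user/preferences.md"), end="")
```

Plain files also keep the memory user-editable: a person can open `pr-conventions.md` and fix a wrong rule with any editor.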

What If

  • What if, as a solo developer, you shifted your development workflow entirely to cloud-native agent systems like OpenInspect, bypassing local environments?

    • Move: Migrate all development tasks (PR generation, testing, debugging) to OpenInspect or similar platforms, using pre-configured sandboxes with automated snapshot restoration.
    • Why Now?: The surge in Devin's PR contributions (80% in March) and the reduced reliability of local development environments ("IDE weeds") point toward cloud-first workflows.
    • Expected Upside: Faster onboarding, consistent environments across tasks, and reduced dependency on local infrastructure, enabling parallel development of multiple projects.
  • What if you built a hybrid agent architecture that combines out-of-box security with in-box simplicity for solo operations?

    • Move: Use OpenDevin as the "brain" (controller) and isolated execution environments (e.g., Docker containers or Firecracker microVMs) as the "hands" (sandbox), isolating secrets and state from the central agent.
    • Why Now?: Enterprise adoption challenges (e.g., security concerns with in-box agents) and the rise of custom infra like blockdiff file systems show demand for secure yet scalable agent setups.
    • Expected Upside: Mitigate security risks while maintaining simplicity, enabling safe experimentation with autonomous code generation and testing without compromising on development speed.
  • What if you implemented a file-system-like memory system for your agent to track priorities and recurring tasks in real time?

    • Move: Create a custom "memory.md" file structure for your agent, using markdown files to document project-specific tasks, priorities, and user preferences (e.g., draft vs. open PR status).
    • Why Now?: Devin's struggles with auto-generated memory overload (95% of memories come from user interactions) and rigid behaviors (e.g., sticking to "open as a draft PR") highlight the need for user-editable, structured memory.
    • Expected Upside: Improved task contextualization for agents, allowing them to better align with your priorities and reduce errors from misinterpreted or outdated memory entries.
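One way to get the "pre-configured sandboxes with automated snapshot restoration" from the first move is to key each snapshot by a fingerprint of the repository and its setup commands, so the slow setup runs once and later tasks restore the result. A minimal sketch, assuming an in-memory dictionary stands in for real VM disk snapshots; `SnapshotStore` and `setup_key` are hypothetical names, not an OpenInspect or Devin API.

```python
import hashlib
import json
from dataclasses import dataclass, field


def setup_key(repo: str, setup_commands: list[str]) -> str:
    """Fingerprint of everything that determines the environment's contents."""
    payload = json.dumps({"repo": repo, "setup": setup_commands}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


@dataclass
class SnapshotStore:
    """Hypothetical snapshot cache; a real one would store VM disk images."""
    snapshots: dict[str, str] = field(default_factory=dict)
    builds: int = 0

    def get_sandbox(self, repo: str, setup_commands: list[str]) -> str:
        key = setup_key(repo, setup_commands)
        if key not in self.snapshots:
            # Cold path: run the slow setup once, then snapshot the result.
            self.builds += 1
            self.snapshots[key] = f"snapshot-{key}"
        # Warm path: every later task restores the same snapshot.
        return f"sandbox restored from {self.snapshots[key]}"


if __name__ == "__main__":
    store = SnapshotStore()
    store.get_sandbox("org/web-app", ["npm ci"])
    store.get_sandbox("org/web-app", ["npm ci"])
    store.get_sandbox("org/web-app", ["npm ci", "npx playwright install"])
    print(store.builds)  # 2
```

Changing any setup command changes the key, which forces a fresh build instead of silently reusing a stale environment.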

Takeaway

  • Adopt cloud-native agent tools like Devin for PR automation: Use Devin or similar agents to generate pull requests from specifications, and integrate them into your workflow to minimize repetitive tasks. In the usage metrics discussed, Devin accounts for up to 80% of PR contributions.
  • Use OpenInspect for GitHub code review with manual oversight: Set up OpenInspect to review code and handle alerts, but note that it requires manual tagging for actions like resolving merge conflicts. Vet and trigger critical tasks yourself to maintain control.
  • Optimize environment setup with existing dev infrastructure: Avoid redundant setups by reusing existing development environments (e.g., dev boxes) and enforcing scoped secrets per machine. This reduces security risks and streamlines agent sandboxing.
  • Tune agent memory systems to avoid rigid behaviors: Regularly audit and adjust auto-generated memories (e.g., changing "draft PRs" to "open PRs") so agents don't develop unintended, inflexible rules by over-weighting specific past details.
  • Prioritize environment consistency with Docker/VMs: Use Docker to keep agent infrastructure aligned with developer workflows, but reserve full VMs for agent execution when required (e.g., OS-specific tasks like iOS development). Pre-snapshotted sandboxes can speed up setup and testing.
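A memory audit like the one in the takeaways could start with a simple pass that surfaces auto-generated memories phrased as absolute rules, so a person can confirm or rewrite them. A sketch under stated assumptions: the `Memory` record, its `source` field, and the keyword list are hypothetical, and a keyword match is only a crude proxy for an over-generalized rule.

```python
import re
from dataclasses import dataclass

# Assumed keyword list: words that tend to turn one past interaction into a standing rule.
RIGID_PATTERN = re.compile(r"\b(always|never|must|only)\b", re.IGNORECASE)


@dataclass
class Memory:
    text: str
    source: str  # "auto" (agent-generated) or "user" (written or confirmed by a person)


def flag_for_review(memories: list[Memory]) -> list[Memory]:
    """Return auto-generated memories phrased as absolute rules."""
    return [m for m in memories if m.source == "auto" and RIGID_PATTERN.search(m.text)]


def apply_correction(memories: list[Memory], old: str, new: str) -> list[Memory]:
    """Replace a memory's text and mark it as user-confirmed, as in a manual audit."""
    return [Memory(new, "user") if m.text == old else m for m in memories]


if __name__ == "__main__":
    mems = [
        Memory("Always open PRs as drafts", "auto"),
        Memory("Repo uses pnpm", "auto"),
        Memory("Never push to main", "user"),
    ]
    print([m.text for m in flag_for_review(mems)])  # ['Always open PRs as drafts']
    mems = apply_correction(mems, "Always open PRs as drafts", "Open PRs as ready for review")
    print(flag_for_review(mems))  # []
```

Corrected entries are marked as user-written, so later audits skip them instead of flagging the same rule again.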

Recent Episodes of Latent Space

27 May 2026 ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

ESMC leverages transformer-based models trained on 6.8 billion protein sequences to predict structures, design functional proteins, and uncover evolutionary patterns through scalable, data-driven approaches, while balancing evolutionary constraints with interpretability and addressing limitations in data diversity and model generalizability.

21 May 2026 Giving Agents Computers - Ivan Burazin, Daytona

A company evolved from pre-Docker browser-based IDEs and developer events to modern sandboxing platforms prioritizing AI agent infrastructure, leveraging bare-metal compute for scalability and addressing market demands with open-source strategies, spiky workloads, and future AI Cloud expansion amid GPU shortages.

20 May 2026 Railway: The Agent-Native Cloud - Jake Cooper

Railway streamlines app deployment with AI-driven tools, environment cloning, and parallel testing, leveraging kernel patching and custom storage while addressing challenges like compute scarcity and AI agent coordination, alongside critiques of Git/GitHub and traditional software lifecycle practices.

5 May 2026 Doing Vibe Physics - Alex Lupsasca, OpenAI

AI is advancing theoretical physics by rapidly solving complex problems like quantum field theory calculations and simulating models such as SYK, though it still relies on human collaboration for original insights and contextual validation, reshaping research methodologies and education.