More Software Engineering Radio episodes


Eric Tschetter on Decoupling Observability

Published 23 Apr 2026

Recommended: Telemetry is important; avoiding vendor lock-in is even more important.

Duration: 01:00:13

Observability in microservices favors decoupled architectures over traditional, tightly integrated frameworks as a way to address vendor lock-in, data interoperability, and scalability challenges, while balancing the management of unstructured telemetry, query language standardization, and cross-team collaboration.

Episode Description

In this episode, host Amey Ambade sits down with Eric Tschetter, co-founder of Apache Druid and Chief Architect at Imply, to dissect the critical move towa...

Overview

The interview explores the concept of observability, emphasizing its role in monitoring and diagnosing complex systems, particularly in microservices architectures. Key components include logs, metrics, traces, and alerting infrastructure, with emerging technologies like AI influencing modern practices. The discussion highlights the challenges of managing distributed systems, where tight coupling of observability tools (e.g., Splunk, ELK Stack) often leads to vendor lock-in and fragmented infrastructure across teams. In contrast, decoupled systems, inspired by the evolution of business intelligence (BI), separate data collection, storage, and visualization, promoting flexibility and interoperability. However, observability tools remain largely vertically integrated, constrained by specialized query languages and the lack of a unified schema for unstructured data like logs.

The text underscores organizational and technical hurdles in scaling observability, such as conflicting tool adoption by different teams, data silos from vendor ecosystems, and the complexity of consolidating disparate systems. It draws parallels between observability and BI's transition from proprietary systems to decoupled layers, suggesting that open standards (e.g., OpenTelemetry) and query language interoperability could enhance flexibility. Challenges include maintaining trace consistency across systems, managing data portability, and balancing cost-efficient storage (e.g., cloud object stores) with the performance needs of real-time monitoring versus historical investigations. The discussion also touches on governance in shared data environments and the trade-offs between sampling data for cost savings and retaining full records for compliance.
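The tension between sampling and trace consistency can be made concrete. The following is a minimal sketch, assuming each service makes its own head-sampling decision from a trace ID string: deriving the keep/drop decision deterministically from the trace ID means every service that sees the same trace at the same rate reaches the same answer, so traces are not left half-sampled. It is similar in spirit to OpenTelemetry's TraceIdRatioBased sampler but is not its API or its exact algorithm; the function `should_sample` and its `must_retain` override (standing in for data that must be kept in full, e.g. for compliance) are hypothetical.

```python
import hashlib


def should_sample(trace_id: str, ratio: float, must_retain: bool = False) -> bool:
    """Deterministic head-sampling decision for one trace (hypothetical helper).

    trace_id    -- the trace identifier, e.g. a 32-character hex string
    ratio       -- fraction of traces to keep, between 0.0 and 1.0
    must_retain -- hypothetical override for data that must be kept in full,
                   for example because of a compliance requirement
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if must_retain:
        return True
    # Hash the ID so the decision does not depend on how IDs were generated,
    # then read the first 8 bytes of the digest as an integer in [0, 2**64).
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # The same trace ID always lands in the same bucket, so every service
    # applying this rule at the same ratio keeps or drops the whole trace.
    return bucket < ratio * 2**64
```

Because the decision is a pure function of the trace ID, raising the ratio only adds traces: every trace kept at 10% is also kept at 30%, so a store sampled at a lower rate holds a strict subset of one sampled at a higher rate.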

Finally, the interview emphasizes the need for a decoupled observability architecture to reduce vendor lock-in, enable cross-team collaboration, and support future-proof infrastructure. This approach requires incremental migration strategies, prioritizing centralized data access without disrupting existing workflows. It also highlights the importance of caching, indexing strategies, and query language standardization to address latency and scalability issues. While decoupling offers potential solutions to fragmentation and complexity, it necessitates careful governance, technical expertise, and a shift away from proprietary, tightly integrated systems toward modular, interoperable frameworks.
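To make the layering concrete, here is a minimal sketch, assuming an append-only, object-store-style storage layer. The `ObjectStore` contract, the dataset/hour key layout, `ingest`, and `QueryEngine` are all hypothetical and not taken from the episode or any product. Ingestion and query meet only at the `ObjectStore` interface, so a real cloud object store could replace the in-memory stand-in without touching either layer, and any dashboard (the visualization layer) would simply call `select`. The query layer's cache relies on the assumption that written objects are never modified.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical storage-layer contract shared by the other layers."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list_keys(self, prefix: str) -> List[str]: ...


class InMemoryObjectStore:
    """Stand-in for a cloud object store, for illustration only."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def ingest(store: ObjectStore, dataset: str, events: Iterable[dict]) -> None:
    """Ingestion layer: append newline-delimited JSON, one object per hour batch."""
    by_hour: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        by_hour[event["ts"][:13]].append(event)  # "2026-04-23T10:15:00Z" -> "2026-04-23T10"
    for hour, batch in by_hour.items():
        prefix = f"{dataset}/{hour}/"
        # Objects are never overwritten: each write gets the next part number.
        key = f"{prefix}part-{len(store.list_keys(prefix)):05d}.jsonl"
        body = "\n".join(json.dumps(event) for event in batch)
        store.put(key, body.encode("utf-8"))


class QueryEngine:
    """Query/compute layer: reads through the store and caches immutable objects."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self._cache: Dict[str, List[dict]] = {}
        self.store_reads = 0  # counts round trips to the storage layer

    def _load(self, key: str) -> List[dict]:
        # Safe to cache indefinitely because ingestion never rewrites an object.
        if key not in self._cache:
            self.store_reads += 1
            raw = self.store.get(key).decode("utf-8")
            self._cache[key] = [json.loads(line) for line in raw.splitlines()]
        return self._cache[key]

    def select(self, dataset: str, time_prefix: str, **equals: object) -> List[dict]:
        """Return events under the time prefix whose fields equal the given values."""
        rows: List[dict] = []
        for key in self.store.list_keys(f"{dataset}/{time_prefix}"):
            rows.extend(
                row for row in self._load(key)
                if all(row.get(name) == value for name, value in equals.items())
            )
        return rows
```

Because the key encodes dataset and hour, a query over a narrow time range lists only a few objects. That key layout is a crude stand-in for the indexing a production query engine would need, and the cache shows how repeated queries can avoid round trips to cheap but slower object storage.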

Final Notes

This extensive text provides valuable insights into observability, its challenges, and the benefits of decoupling observability stacks. Here are some key takeaways and their relevance:

  1. Definition of Observability: The text clarifies the concept of observability, emphasizing the importance of monitoring a system's internal operations to diagnose issues and take corrective actions. This concept is relevant to organizations that rely on complex software systems and need to ensure their reliability and performance.

  2. Microservices and Observability: The discussion highlights the challenges of observability in microservices architectures, where distributed components introduce complexity. This is a crucial concept for organizations that have adopted microservices or are considering it as a development approach.

  3. Tightly Coupled vs. Decoupled Observability Frameworks: The text explains the trade-offs between tightly coupled and decoupled observability frameworks, highlighting how decoupling improves flexibility and scalability and reduces vendor lock-in. This concept is essential for organizations that seek to avoid vendor lock-in and ensure long-term flexibility in their observability infrastructure.

  4. Examples of Observability Tools: The discussion provides an overview of various observability tools, including Splunk, ELK Stack, Grafana, and others. This information is relevant for organizations that need to select the most suitable observability tools for their needs.

  5. Challenges in Modern Observability: The text identifies the challenges of modern observability, including the need to balance complexity with actionable insights in distributed systems. This concept is critical for organizations that need to navigate the complexities of modern software systems.

  6. Organizational Challenges with Scaling: The discussion highlights the organizational challenges of scaling, including managing conflicting observability tools, vendor lock-in, and differing data storage and processing practices. This concept is essential for organizations that need to ensure scalability and flexibility in their observability infrastructure.

  7. Path Toward Decoupling in Observability: The text suggests that moving toward decoupled observability stacks could address fragmentation, vendor lock-in, and scalability issues. This concept is relevant for organizations that seek to avoid vendor lock-in and ensure long-term flexibility in their observability infrastructure.

  8. Layered Architecture in Data Systems: The discussion explains the benefits of a layered architecture in data systems, including the decoupling of data ingestion, storage, query/compute, and visualization layers. This concept is essential for organizations that need to ensure the scalability and flexibility of their data systems.

  9. Observability and Security as Analogous to BI: The text highlights the similarities between observability, security, and BI, emphasizing the need for standardization and query language evolution in observability and security. This concept is critical for organizations that need to ensure effective observability and security practices.

  10. Query Language Flexibility: The discussion emphasizes the importance of query language flexibility in observability and security, highlighting the need for a unified query language to interface with data layers and visualization tools. This concept is essential for organizations that need to ensure effective observability and security practices.
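The appeal of a shared query layer is easier to see with a toy example. The sketch below makes the loose assumption that a query can be reduced to a source plus AND-ed equality filters, expresses one vendor-neutral query, and compiles it into two existing dialects: parameterized SQL and a Lucene query string. `NeutralQuery`, `to_sql`, and `to_lucene` are hypothetical; a real standardized language would also need aggregations, time windows, and joins, which this sketch omits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NeutralQuery:
    """Hypothetical vendor-neutral query: a source plus AND-ed equality filters."""
    source: str
    equals: Tuple[Tuple[str, str], ...] = ()


def _check_identifier(name: str) -> str:
    # Field and source names are spliced into query text, so restrict them
    # to plain identifiers; values are always parameterized or escaped.
    if not name.isidentifier():
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def to_sql(query: NeutralQuery) -> Tuple[str, List[str]]:
    """Compile to parameterized SQL using '?' placeholders (as in sqlite3)."""
    sql = f"SELECT * FROM {_check_identifier(query.source)}"
    if query.equals:
        sql += " WHERE " + " AND ".join(
            f"{_check_identifier(name)} = ?" for name, _ in query.equals
        )
    return sql, [value for _, value in query.equals]


def to_lucene(query: NeutralQuery) -> str:
    """Compile to a Lucene query string; the source selects the index, so it is omitted."""
    def quote(value: str) -> str:
        # Escape backslash and double quote (assumed sufficient inside a quoted phrase).
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    clauses = [f"{_check_identifier(name)}:{quote(value)}" for name, value in query.equals]
    return " AND ".join(clauses) if clauses else "*:*"
```

For example, `NeutralQuery("logs", (("service", "checkout"), ("level", "error")))` compiles to `SELECT * FROM logs WHERE service = ? AND level = ?` with parameters `["checkout", "error"]`, and to `service:"checkout" AND level:"error"` for a Lucene-based backend. The point is that tools and users write against one representation while each backend keeps its native language.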

The text provides valuable insights into the challenges and benefits of observability, decoupling, and layered architecture in data systems. It offers practical advice for organizations that need to navigate the complexities of modern software systems and ensure effective observability and security practices.

Actionable Recommendations:

  1. Assess Observability Needs: Evaluate current observability practices and tools to identify areas for improvement and potential decoupling opportunities.
  2. Implement Decoupling: Adopt decoupling gradually, starting with specific datasets and expanding to unlock access to more data.
  3. Standardize Query Languages: Prioritize query language standardization to ensure flexibility, interoperability, and efficient querying across multiple systems.
  4. Develop Layered Architecture: Design data systems in layers, decoupling data ingestion, storage, query/compute, and visualization for scalability and flexibility.
  5. Monitor and Evaluate: Continuously monitor and evaluate observability practices, adjusting to address emerging challenges and ensure long-term flexibility in observability infrastructure.
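One way to read "start with specific datasets" is as a per-dataset migration switch. The sketch below assumes a dual-write phase, a common migration technique that the episode summary does not itself prescribe, so each dataset can be checked in both systems before its legacy copy is retired. `MigrationRouter`, `Stage`, and the sink callables are hypothetical stand-ins for real ingestion clients.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Stage(Enum):
    LEGACY = "legacy"          # only the existing tool receives the dataset
    DUAL_WRITE = "dual_write"  # both systems receive it; queries still go to legacy
    MIGRATED = "migrated"      # only the decoupled store receives and serves it


# A sink is any callable that accepts (dataset, event); a stand-in for a real client.
Sink = Callable[[str, dict], None]


@dataclass
class MigrationRouter:
    """Hypothetical per-dataset switch for an incremental migration."""
    legacy: Sink
    decoupled: Sink
    stages: Dict[str, Stage] = field(default_factory=dict)

    def stage_of(self, dataset: str) -> Stage:
        # Datasets that nobody has started migrating stay on the legacy path.
        return self.stages.get(dataset, Stage.LEGACY)

    def write(self, dataset: str, event: dict) -> None:
        stage = self.stage_of(dataset)
        if stage in (Stage.LEGACY, Stage.DUAL_WRITE):
            self.legacy(dataset, event)
        if stage in (Stage.DUAL_WRITE, Stage.MIGRATED):
            self.decoupled(dataset, event)

    def query_backend(self, dataset: str) -> str:
        # Reads switch over only once a dataset is fully migrated.
        return "decoupled" if self.stage_of(dataset) is Stage.MIGRATED else "legacy"
```

Moving one dataset from `DUAL_WRITE` to `MIGRATED` is the point at which its legacy copy can be retired, while workflows built on every other dataset keep running unchanged.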

Recent Episodes of Software Engineering Radio

3 Jun 2026 Dave Airlie on Linux Kernel Maintenance

The Linux kernel, the largest global software project, uses a hierarchical maintainer system with 80,150 contributors managing subsystems like DRM through public review, structured development cycles, and evolving practices to address scalability, quality, and integration challenges.

27 May 2026 Dwayne McDaniel on the Engineering Challenges of Secrets Management

Managing secrets such as credentials and API keys in software development risks leaks that enable supply chain attacks (e.g., PyPI, Clot, Cisco) through secrets sprawl, plaintext storage, and misuse, prompting solutions like time-bound credentials, decentralized systems, vault tools (e.g., HashiCorp Vault), credential rotation, and encrypted storage, amid more than 28.65 million hard-coded secrets found on GitHub in 2025.

20 May 2026 Rob Moffat on Risk-First Software Development

Recommended: Risk identification and management is a forgotten art.

Software development prioritizes risk management through frameworks like test-driven development and agile, addressing hidden risks, AI deployment challenges, open-source dependencies, and organizational prioritization to balance innovation with safeguards.

13 May 2026 SE Radio 720: Martin Dilger on Understanding Eventsourcing

Recommended: Useful Architectural Pattern.

Event sourcing is a system design approach that records changes as sequential events to ensure historical traceability, uses event modeling for aligning systems with human workflows, contrasts with CRUD architectures, and emphasizes slice-based design, event streams, and practical applications like legacy modernization and workflow simplification.

6 May 2026 Birol Yildiz on Building an Agentic AI SRE

AI agents in SRE leverage autonomous decision-making, agentic search, and lightweight architectures to replace static runbooks, balancing autonomy with reliability challenges, context management, and human oversight in dynamic environments.
