More Software Engineering Radio episodes

Eric Tschetter on Decoupling Observability

Published 23 Apr 2026

Recommended: Telemetry is important; avoiding vendor lock-in is even more important.

Duration: 01:00:13

Observability in microservices emphasizes decoupled architectures over traditional frameworks to address vendor lock-in, data interoperability, and scalability challenges, while balancing unstructured telemetry management, query language standardization, and cross-team collaboration.

Episode Description

In this episode, host Amey Ambade sits with Eric Tschetter, co-founder of Apache Druid and Chief Architect at Imply, to dissect the critical move towa...

Overview

The interview explores the concept of observability, emphasizing its role in monitoring and diagnosing complex systems, particularly in microservices architectures. Key components include logs, metrics, traces, and alerting infrastructure, with emerging technologies like AI influencing modern practices. The discussion highlights the challenges of managing distributed systems, where tight coupling of observability tools (e.g., Splunk, ELK Stack) often leads to vendor lock-in and fragmented infrastructure across teams. In contrast, decoupled systems, inspired by the evolution of business intelligence (BI), separate data collection, storage, and visualization, promoting flexibility and interoperability. However, observability tools remain largely vertically integrated, constrained by specialized query languages and the lack of a unified schema for unstructured data like logs.
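The separation of collection, storage, and visualization described above can be sketched in a few lines. This is a minimal illustration, not something presented in the episode: `Event`, `Store`, `InMemoryStore`, `collect`, `count_by_level`, and `render` are all hypothetical names, and the in-memory store merely stands in for a real backend.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Event:
    service: str
    level: str
    message: str


class Store(Protocol):
    """Storage layer: anything that can append and scan events."""

    def append(self, event: Event) -> None: ...

    def scan(self) -> Iterable[Event]: ...


class InMemoryStore:
    """Stand-in for a real backend (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def scan(self) -> Iterable[Event]:
        return iter(self._events)


def collect(store: Store, events: Iterable[Event]) -> None:
    """Collection layer: knows only the Store interface."""
    for event in events:
        store.append(event)


def count_by_level(store: Store) -> dict[str, int]:
    """Query layer: also depends only on the Store interface."""
    counts: dict[str, int] = {}
    for event in store.scan():
        counts[event.level] = counts.get(event.level, 0) + 1
    return counts


def render(counts: dict[str, int]) -> str:
    """Visualization layer: consumes query results, not storage details."""
    return "\n".join(f"{level}: {n}" for level, n in sorted(counts.items()))
```

Because each layer depends only on the `Store` interface, swapping `InMemoryStore` for another backend would not require changes to the collection, query, or rendering code, which is the property a decoupled stack is after.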

The discussion underscores organizational and technical hurdles in scaling observability, such as conflicting tool adoption by different teams, data silos from vendor ecosystems, and the complexity of consolidating disparate systems. It draws parallels between observability and BI's transition from proprietary systems to decoupled layers, suggesting that open standards (e.g., OpenTelemetry) and query language interoperability could enhance flexibility. Challenges include maintaining trace consistency across systems, managing data portability, and balancing cost-efficient storage (e.g., cloud object stores) with performance needs for real-time monitoring versus historical investigations. The discussion also touches on governance in shared data environments and the trade-offs between sampling data for cost savings and retaining full records for compliance.
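One common way to reconcile sampling with trace consistency is to make the keep-or-drop decision a pure function of the trace ID, so that every system handling the same trace reaches the same answer. The sketch below assumes this hash-based approach; it is not described in the episode, and `keep_trace` and its parameters are hypothetical names.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide from the trace ID alone, so every service makes the same choice."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the trace when its bucket falls below the sampled fraction.
    return bucket < int(sample_rate * 2**64)
```

Records that must be retained in full for compliance would bypass a function like this entirely; sampling of this kind only suits data where losing a fraction of traces is acceptable.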

Finally, the interview emphasizes the need for a decoupled observability architecture to reduce vendor lock-in, enable cross-team collaboration, and support future-proof infrastructure. This approach requires incremental migration strategies, prioritizing centralized data access without disrupting existing workflows. It also highlights the importance of caching, indexing strategies, and query language standardization to address latency and scalability issues. While decoupling offers potential solutions to fragmentation and complexity, it necessitates careful governance, technical expertise, and a shift away from proprietary, tightly integrated systems toward modular, interoperable frameworks.
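To illustrate the caching point, here is a minimal sketch of a time-limited result cache placed in front of a slow query path, such as a scan over object storage. The `QueryCache` class, its parameters, and the idea of keying on the raw query string are assumptions made for illustration, not details from the episode.

```python
import time
from typing import Callable


class QueryCache:
    """Serve repeated queries from memory until their entry expires."""

    def __init__(
        self,
        run_query: Callable[[str], list],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # the slow path, e.g. an object-store scan
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, list]] = {}

    def get(self, query: str) -> list:
        now = self._clock()
        cached = self._entries.get(query)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: skip the slow path
        result = self._run_query(query)
        self._entries[query] = (now, result)
        return result
```

A short expiry fits dashboards that refresh often, while historical investigations can usually tolerate a longer one because older data rarely changes.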

Final Notes

The episode covers what observability is, where it becomes difficult, and what decoupling the observability stack can offer. Key takeaways and their relevance:

  1. Definition of Observability: The text clarifies the concept of observability, emphasizing the importance of monitoring a system's internal operations to diagnose issues and take corrective actions. This concept is relevant to organizations that rely on complex software systems and need to ensure their reliability and performance.

  2. Microservices and Observability: The discussion highlights the challenges of observability in microservices architectures, where distributed components introduce complexity. This is a crucial concept for organizations that have adopted microservices or are considering it as a development approach.

  3. Tightly Coupled vs. Decoupled Observability Frameworks: The text explains the trade-offs between tightly coupled and decoupled observability frameworks, highlighting the benefits of decoupling for flexibility, scalability, and avoiding vendor lock-in. This concept is essential for organizations that want to keep their observability infrastructure flexible over the long term.

  4. Examples of Observability Tools: The discussion provides an overview of various observability tools, including Splunk, ELK Stack, Grafana, and others. This information is relevant for organizations that need to select the most suitable observability tools for their needs.

  5. Challenges in Modern Observability: The text identifies the challenges of modern observability, including the need to balance complexity with actionable insights in distributed systems. This concept is critical for organizations that need to navigate the complexities of modern software systems.

  6. Organizational Challenges with Scaling: The discussion highlights the organizational challenges of scaling, including conflicting observability tools across teams, vendor lock-in, and divergent data storage and processing practices. This concept is essential for organizations that need to ensure scalability and flexibility in their observability infrastructure.

  7. Path Toward Decoupling in Observability: The text suggests that moving toward decoupled observability stacks could address fragmentation, vendor lock-in, and scalability issues. This concept is relevant for organizations that seek to avoid vendor lock-in and ensure long-term flexibility in their observability infrastructure.

  8. Layered Architecture in Data Systems: The discussion explains the benefits of a layered architecture in data systems, including the decoupling of data ingestion, storage, query/compute, and visualization layers. This concept is essential for organizations that need to ensure the scalability and flexibility of their data systems.

  9. Observability and Security as Analogous to BI: The text highlights the similarities between observability, security, and BI, emphasizing the need for standardization and query language evolution in observability and security. This concept is critical for organizations that need to ensure effective observability and security practices.

  10. Query Language Flexibility: The discussion emphasizes the importance of query language flexibility in observability and security, highlighting the need for a unified query language to interface with data layers and visualization tools. This concept is essential for organizations that need to ensure effective observability and security practices.
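The query-language points in the list above can be made concrete with a small sketch: one backend-neutral query description rendered into two different syntaxes. Everything here is hypothetical. `CountQuery`, `to_sql`, and `to_pipe` are illustrative names, and the pipe-style output is an invented syntax rather than the grammar of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountQuery:
    """Backend-neutral request: count rows in `source` where `field` == `value`."""

    source: str
    field: str
    value: str


def to_sql(q: CountQuery) -> str:
    # Naive quoting for illustration only; real code should use bound parameters.
    escaped = q.value.replace("'", "''")
    return f"SELECT COUNT(*) FROM {q.source} WHERE {q.field} = '{escaped}'"


def to_pipe(q: CountQuery) -> str:
    # Hypothetical pipe-style syntax, not the grammar of any real product.
    escaped = q.value.replace('"', '\\"')
    return f'from {q.source} | where {q.field} == "{escaped}" | count'
```

A visualization tool that emits something like `CountQuery` would not need to know which backend answers the query, which is the kind of interoperability a shared query language aims for.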

The text provides valuable insights into the challenges and benefits of observability, decoupling, and layered architecture in data systems. It offers practical advice for organizations that need to navigate the complexities of modern software systems and ensure effective observability and security practices.

Actionable Recommendations:

  1. Assess Observability Needs: Evaluate current observability practices and tools to identify areas for improvement and potential decoupling opportunities.
  2. Implement Decoupling: Adopt decoupling gradually, starting with specific datasets and expanding as more data becomes accessible.
  3. Standardize Query Languages: Prioritize query language standardization to ensure flexibility, interoperability, and efficient querying across multiple systems.
  4. Develop Layered Architecture: Ensure a layered architecture in data systems, decoupling data ingestion, storage, query/compute, and visualization layers for scalability and flexibility.
  5. Monitor and Evaluate: Continuously monitor and evaluate observability practices, adjusting to address emerging challenges and ensure long-term flexibility in observability infrastructure.
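The gradual, dataset-by-dataset adoption recommended above could look something like the following sketch. It assumes a dual-write approach, which the episode summary does not spell out: every record still goes to the existing tool, and only datasets on an allow-list are also written to the shared store. `make_router`, `Writer`, and the parameter names are hypothetical.

```python
from typing import Callable

# A writer takes a dataset name and one record.
Writer = Callable[[str, dict], None]


def make_router(legacy: Writer, shared: Writer, migrated: set[str]) -> Writer:
    """Build a writer that dual-writes only the datasets listed in `migrated`."""

    def route(dataset: str, record: dict) -> None:
        legacy(dataset, record)  # existing workflows keep receiving everything
        if dataset in migrated:
            shared(dataset, record)  # migrated datasets also land in the shared store

    return route
```

Expanding the migration then means adding dataset names to `migrated`, with no change to the code that produces the records.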

Recent Episodes of Software Engineering Radio

15 Apr 2026 Martin Kleppmann on Local-First Software

Local-first software combines local data storage with cloud collaboration to enable offline access, real-time editing, and seamless syncing via Automerge and CRDTs, prioritizing user control, privacy, and decentralized workflows with future focus on open standards and AI integration.

8 Apr 2026 Sahaj Garg on Designing for Ambiguity in Human Input

Ambiguity in language and speech, arising from context, phrasing, and incomplete information, poses challenges for AI systems due to their limited context processing, while humans resolve it through contextual cues, tone, and prior knowledge, with strategies focusing on contextual prompts, audio training, data augmentation, and balancing AI efficiency with human-like adaptability in multilingual and ethical contexts.

1 Apr 2026 Costa Alexoglou on Remote Pair Programming

A discussion on pair programming's collaborative advantages, remote pairing challenges, AI's role in coding, the development of Hopp, and future remote work tools, highlighted by a five-month platform refactor case study and lessons in balancing performance, security, and user needs.

25 Mar 2026 Hector Ramon Jimenez on Building a GUI library in Rust

Iced is a Rust-based UI toolkit inspired by Elm's architecture that uses message passing to separate state, updates, and views. It evolved from a game library module into a standalone, functionally oriented tool with Winit/WGPU rendering and cross-platform goals. The episode covers challenges in dependency stability, state-driven design, community development, and planned improvements in rendering efficiency, accessibility, and multi-platform support.

18 Mar 2026 Dan Lorenc on Sigstore

Software supply chain attacks exploit vulnerabilities in development tools and open-source components, exemplified by the Shai-Hulud npm breach, with Sigstore proposed as a cryptographic solution to verify software integrity, though challenges like enforcement and privacy persist in securing open-source ecosystems.
