
Observability - suggested revisions to HoEN

  • Writer: Michelle Casey
  • Jun 12
  • 4 min read

Updated: 6 days ago

Channel markers and a lighthouse seen from a boat at night.
Lights in the night or signals used to navigate?

The challenge with models

Do you ever debate things internally with your colleagues? Do you discuss, agree and disagree about how an idea should be represented, or about how a shared cognitive artefact represents a complex subject?


Our Hierarchy of Engineering Needs (HoEN) is a model designed to help organisations assess and address constraints across various aspects of engineering. HoEN offers a powerful way to see the engineering system and ensure stable foundations are in place before chasing higher order practices and capabilities. 


All of that being said, in the wise words of George Box, ‘all models are wrong, but some are useful’. Models are simplifications of reality rather than perfect representations of it; their value lies in the understanding and insight they provide, not in absolute accuracy.


Observability, like security and quality, is a practice that is present at all levels of the HoEN, and in the model we call out specific needs on which it depends at each level. In this blog we offer an alternative, more detailed view of observability than the one presented in our current version of the HoEN.



What is observability?

There are a variety of definitions, but we’re going to go with this one. Observability is the ability to measure and interpret the state of a system based on the data it generates. It is not enough to just have observability data in the form of logs, metrics and traces; you must also be able to make sense of this data.


Knowing everything that should be done to effectively observe complex distributed systems in production is not easy, and neither is knowing what steps to take to make forward progress. Common observability anti-patterns include focusing on technology choices over quality of instrumentation, observability as an afterthought, and failing to mature practices and behaviours around observability in line with advancing your monitoring and instrumentation. 


The following revisions to the HoEN reflect a considered approach to maturing your observability, ensuring foundational practices and capabilities are established before progressing to more advanced methods.


HoEN Model Revisions

Five suggested edits to the HoEN model to better represent Observability
Five proposed Observability changes to HoEN v7

📝 Basic Needs - Logging and Monitoring (Update)

We have high level awareness of production health and adequate tooling and instrumentation to investigate issues. 

  • Infrastructure and APM monitoring is in place and transaction tracing is available through an APM tool.

  • Structured logs are implemented for all components according to a unified standard.

  • Dashboards are in place for core signals such as the Google SRE Golden Signals - latency, errors, traffic and saturation. 

  • Basic alerting is in place for latency and error rate metrics.
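
As a rough illustration of what "structured logs according to a unified standard" could look like in practice, the sketch below uses Python's standard `logging` module to emit one JSON object per log line. The field names (`timestamp`, `level`, `service`, `logger`, `message`), the `fields` extra key and the `checkout` service name are assumptions made for this example; they are not part of the HoEN.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with a fixed set of keys."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields passed via extra={"fields": {...}} (assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr for the given service."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")  # hypothetical service name
    log.info("order placed", extra={"fields": {"order_id": "A-1001", "latency_ms": 42}})
```

If every component shares one formatter like this, the same queries and dashboards can be applied to all logs without per-service parsing rules.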


Suggested update to Observability


📝 Managed Work - Proactive Alerting (Update)

Alerting is trusted to inform us of abnormal system behaviour, and distributed tracing and correlation enable effective troubleshooting and debugging.

  • Critical components of the system are monitored and alerted on where appropriate.

  • Browser synthetic tests are set up for core customer journeys.

  • Distributed tracing is enabled for all components.

  • There is correlation across signals for faster debugging.


Suggested update to Alerting


📝 Effective Ownership - User Centric Observability (Update)

We have a clear view of product health from the user or customer perspective, user actions can be traced end to end through the system and there is good hygiene around errors and alerts.

  • User journeys are defined and prioritised.

  • SLOs are set up for critical user journeys and are reviewed at regular intervals.

  • SLO breaches are alerted on and there are established practices for responding to them.

  • Dashboards include specific metrics that show user centric product health.

  • Distributed tracing is enabled for services outside of your control, and traces are sampled appropriately.

  • Noisy errors and alerts are actively managed and reduced.
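
To make the SLO bullets above more concrete, here is a minimal sketch of how an SLO for a user journey might be evaluated from counts of good and total events in a window. The `Slo` class, the function names and the example numbers are assumptions for illustration only. What counts as a "good" event, for example a checkout that completes within some latency bound, is for each team to define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # e.g. 0.99 means 99% of events in the window must be good


def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: fraction of good events (1.0 when there is no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def is_breached(slo: Slo, good_events: int, total_events: int) -> bool:
    """True when the measured SLI has fallen below the SLO target."""
    return sli(good_events, total_events) < slo.target


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    checkout = Slo(name="checkout journey", target=0.99)  # hypothetical journey and target
    print(sli(9950, 10000))
    print(is_breached(checkout, 9950, 10000))
    print(error_budget_remaining(checkout, 9950, 10000))
```

Expressing the result as remaining error budget instead of a raw percentage gives product and engineering a shared number to review at regular intervals.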


Suggested update to SLIs, SLOs


Sustainability - Proactive Quality of Service (New)

Observability informs release planning and roadmap prioritisation, we are proactively aware when the system state deviates from what is typical, and observability is continuously improved.

  • Increased SLO error budget consumption is alerted on prior to SLO breach.

  • SLO breaches inform release planning and lead to roadmap changes to prioritise operational work over feature delivery.

  • Instrumentation and monitoring enable us to be aware of current system state compared to historic norms.

  • Observability is constantly iterated on and improved.
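
One common way to alert on error budget consumption before an SLO is breached is to track the burn rate: the observed error rate divided by the error rate the SLO allows. The sketch below is one possible shape for such a check, not a prescribed implementation. The function names are hypothetical, and the default threshold of 14.4 is an assumption borrowed from the multi-window approach described in the Google SRE Workbook, where it corresponds to spending 2% of a 30-day budget in one hour. Tune it to your own SLO window.

```python
from typing import Tuple

Window = Tuple[int, int]  # (bad_events, total_events) observed in a time window


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / budget


def should_alert(long_window: Window, short_window: Window,
                 slo_target: float, threshold: float = 14.4) -> bool:
    """Alert only when both a long and a short window are burning too fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, which reduces alerts on recovered issues.
    """
    long_rate = burn_rate(long_window[0], long_window[1], slo_target)
    short_rate = burn_rate(short_window[0], short_window[1], slo_target)
    return long_rate >= threshold and short_rate >= threshold


if __name__ == "__main__":
    # Hypothetical counts: last hour and last five minutes for a 99.9% SLO.
    print(should_alert(long_window=(20, 1000), short_window=(2, 100), slo_target=0.999))
```

A check like this fires while budget remains, which gives the team time to act before the breach described in the previous level.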



♻️ Flow - Integrated Observability and Resilience (Replace)

Observability and Resilience Engineering are integrated into the engineering system and broadly inform prioritisation, systematic improvements, innovation and customer needs.

  • SLOs are a key consideration for system design.

  • Product and engineering are equally invested in SLOs.

  • Observability data can be queried and interrogated to answer any question.

  • The socio-technical system has the capacity to adapt to emergent and surprising incidents.

  • Resilience helps us understand what forms of robustness to implement, which in turn contributes to improving our reliability.


Suggested replacement to Chaos/Game Days


References


My thinking on observability has been heavily influenced by the following materials and their authors. 



Site Reliability Engineering (Google SRE book) edited by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Murphy


Observability Engineering: Achieving Production Excellence by Charity Majors, Liz Fong-Jones and George Miranda



 
 
 


©2024 by Wires Uncrossed Engineering
