Know the three-phase approach to Observability

Hardik Shah
7 min read · Nov 7, 2022

Observability is one of the most important aspects of a software project. It involves observing what is going on in the system and making it accessible to stakeholders and developers.

Observability helps you understand how your application works and find errors so that they can be fixed before they reach users. However, as you add more features to your application, observability gets harder because of the added complexity. This is why it’s important to have an observability strategy that works well for your team as well as your product.

Phase 1 [Reactive Phase]: Collect, analyze and store data

Phase 1 of the observability process is where you collect, analyze and store data. The goal of this phase is to gain insight into how your system is performing. It involves collecting metrics about your infrastructure and applications so that you can make changes based on what you see.

This first phase is also important for making sure all stakeholders have access to the same information about the state of their systems. Without a central repository for storing metrics, accessing this data can be complicated because it may be stored in different places across multiple environments (production or development).

At this point, an observability platform like Datadog helps drive consistency by providing a single place to store metrics from any number of sources, such as AWS CloudWatch or Google Cloud Monitoring (formerly Stackdriver).
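To make the idea of a single place for metrics concrete, here is a minimal sketch in Python. It assumes two made-up sources with different payload shapes and converts both into one common record. The payload fields, the `MetricPoint` record and the `CentralStore` class are all hypothetical; they are not the actual response formats or APIs of CloudWatch, Google's monitoring service or Datadog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    """Common record that every source is converted into."""
    source: str        # where the reading came from
    environment: str   # e.g. "production" or "development"
    name: str          # metric name, e.g. "cpu_utilization"
    value: float
    timestamp: float   # seconds since the Unix epoch


def from_source_a(payload: dict, environment: str) -> MetricPoint:
    # Hypothetical shape: {"MetricName": ..., "Value": ..., "TimestampMs": ...}
    return MetricPoint(
        source="source_a",
        environment=environment,
        name=payload["MetricName"].lower(),
        value=float(payload["Value"]),
        timestamp=payload["TimestampMs"] / 1000.0,
    )


def from_source_b(payload: dict, environment: str) -> MetricPoint:
    # Hypothetical shape: {"metric": ..., "point": [timestamp_seconds, value]}
    ts, value = payload["point"]
    return MetricPoint("source_b", environment, payload["metric"], float(value), float(ts))


class CentralStore:
    """In-memory stand-in for a shared metrics repository."""

    def __init__(self) -> None:
        self._points = []

    def add(self, point: MetricPoint) -> None:
        self._points.append(point)

    def by_name(self, name: str) -> list:
        return [p for p in self._points if p.name == name]


store = CentralStore()
store.add(from_source_a(
    {"MetricName": "CPU_Utilization", "Value": "71.5", "TimestampMs": 1667800000000},
    "production",
))
store.add(from_source_b(
    {"metric": "cpu_utilization", "point": [1667800060, 12.0]},
    "development",
))
```

Once both readings share a name, a unit of time and an environment label, `store.by_name("cpu_utilization")` returns them side by side, regardless of where they came from.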

Phase 2 [Proactive Phase]: Ensure the data is queryable and is actionable

This phase is about how you store your data. It can live in a database, in a file system, or even in a simple text file. The important thing is that it’s easy to query and process.

For the data gathered from your observability tools to be useful, it needs to be stored in a format that can easily be queried and processed by various tools. To accomplish this, there are three things you should focus on when storing your data:

  • Store it in some sort of database
  • Store the data using standard schema
  • Make sure each column has an understandable name (i.e., don’t just call it “column”)
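The three points above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table name, column names and sample rows are invented for the example, and an in-memory SQLite database stands in for whatever store your team actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only

# Descriptive column names and explicit types make the data self-explanatory.
conn.execute(
    """
    CREATE TABLE request_metrics (
        service_name TEXT    NOT NULL,
        environment  TEXT    NOT NULL,
        recorded_at  TEXT    NOT NULL,  -- ISO 8601 timestamp, UTC
        response_ms  REAL    NOT NULL,
        http_status  INTEGER NOT NULL
    )
    """
)

# Hypothetical sample rows.
rows = [
    ("checkout", "production", "2022-11-07T10:00:00Z", 120.0, 200),
    ("checkout", "production", "2022-11-07T10:00:05Z", 480.0, 500),
    ("search", "production", "2022-11-07T10:00:07Z", 35.0, 200),
]
conn.executemany("INSERT INTO request_metrics VALUES (?, ?, ?, ?, ?)", rows)


def average_response_ms(connection):
    """Average response time per service, as {service_name: milliseconds}."""
    query = """
        SELECT service_name, AVG(response_ms)
        FROM request_metrics
        GROUP BY service_name
    """
    return dict(connection.execute(query).fetchall())


averages = average_response_ms(conn)
```

Because the columns are named for what they hold, the query reads almost like the question it answers, and any tool that speaks SQL can ask it.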

Phase 3 [Data-Driven Phase]: Enable developers to gain insight into all the data

Connecting the dots between the business and IT is the goal of phase 3. When you’re building an observability solution, think about how you can ensure that your business users and developers have access to all the data they need in order to make informed decisions. That’s why it’s important to consider the following points when deciding how best to deliver this access:

Make sure it’s accessible and understandable. You want your engineers and analysts working with this information rather than getting lost in it, so make sure that whatever tool or system you introduce has a simple interface that will allow them easy access without getting bogged down by complex charts or confusing text reports.

Also consider giving people an option for viewing multiple sources of information at once. This will help keep things streamlined while still providing a wealth of detail at a glance so anyone can get answers quickly without having to dig through mountains of data first.

This shouldn’t be seen as a one-time effort but a process that requires continuous improvement.

As with any process, it’s important to keep an eye on the results of your observability efforts. You should focus on improving over time, rather than simply meeting a goal once.

For example, if you are new to observability and have just started collecting data from your systems, one thing you could do is make sure that your data is actionable. This means that the information being collected provides insight into what needs improving (or doing). It also means that there are clear takeaways from each review of the data, so you can see how things have improved over time.

By looking at trends in response times and error rates, for example, you can figure out which parts of the system need more attention than others to run smoothly.
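As a rough illustration of turning response times and error rates into something actionable, the sketch below summarizes a handful of made-up request records per service and flags the ones that exceed an error-rate threshold. The records, the 5% threshold and the function names are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request records: (service, response time in ms, HTTP status)
requests = [
    ("checkout", 120.0, 200),
    ("checkout", 480.0, 500),
    ("checkout", 150.0, 200),
    ("search", 35.0, 200),
    ("search", 40.0, 200),
]


def summarize(records):
    """Return {service: {"mean_ms": ..., "error_rate": ...}}."""
    grouped = defaultdict(list)
    for service, response_ms, status in records:
        grouped[service].append((response_ms, status))

    summary = {}
    for service, samples in grouped.items():
        # Count server-side failures (HTTP 5xx) as errors.
        errors = sum(1 for _, status in samples if status >= 500)
        summary[service] = {
            "mean_ms": mean(ms for ms, _ in samples),
            "error_rate": errors / len(samples),
        }
    return summary


def needs_attention(summary, max_error_rate=0.05):
    """Services whose error rate exceeds the assumed threshold, worst first."""
    flagged = [s for s, stats in summary.items() if stats["error_rate"] > max_error_rate]
    return sorted(flagged, key=lambda s: summary[s]["error_rate"], reverse=True)


report = summarize(requests)
attention = needs_attention(report)
```

Running the same summary on each day's or week's data, and comparing the results, is one simple way to see whether a service is getting better or worse over time.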

Observability best practices:

Get observability basics right.

As an engineer, your job is to build software that works. But you don’t always know how well your software is working until it breaks. That’s why observability is so important: it helps you understand exactly what’s happening inside your code, so you can fix issues before they affect users or customers.

But observability isn’t just something for engineers; it’s also a tool that product managers and business stakeholders can use to track the performance of their applications as they scale with traffic and usage.

In fact, observability shouldn’t just be a part of engineering culture — it should be part of every team’s culture in order to ensure everyone has access to key insights into application health at any time.

Don’t treat logs, metrics and traces as separate entities.

Logs, metrics and traces are three different ways to collect data. However, they’re often treated as separate entities that don’t need to be analyzed together. In fact, all three offer complementary insights into your application’s behavior.

Logs record discrete events in an app’s lifecycle, such as errors and exceptions. They are a useful way to debug code, but their volume makes them difficult to interpret, so it can be hard to see what’s going on unless you know exactly what kind of information you should look for in there.

Metrics are good for measuring system performance; they give you a baseline against which you can analyze changes over time (or between environments). But metrics alone aren’t enough: metrics tell us how fast something is happening; logs give us insight into why it’s happening at all.

Traces add the third piece: the path a single request took through the system. Together, these types of data provide more complete information about our applications and give us a better understanding of how different parts interact with each other at runtime.
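Here is a small sketch of what analyzing these signals together can look like. A metric (errors per minute) shows when something went wrong, and the log events behind that minute explain why, with trace IDs pointing to the affected requests. The event fields and values are hypothetical.

```python
from collections import Counter

# Hypothetical structured log events; all field names are assumptions.
log_events = [
    {"minute": "10:00", "trace_id": "a1", "level": "INFO", "message": "order placed"},
    {"minute": "10:01", "trace_id": "b2", "level": "ERROR", "message": "payment gateway timeout"},
    {"minute": "10:01", "trace_id": "c3", "level": "ERROR", "message": "payment gateway timeout"},
    {"minute": "10:02", "trace_id": "d4", "level": "INFO", "message": "order placed"},
]

# Metric derived from the same events: error count per minute.
errors_per_minute = Counter(e["minute"] for e in log_events if e["level"] == "ERROR")


def explain_spikes(metric, events, threshold=1):
    """For each minute whose error count exceeds threshold,
    list the (trace_id, message) pairs of the errors behind it."""
    explanation = {}
    for minute, count in metric.items():
        if count > threshold:
            explanation[minute] = [
                (e["trace_id"], e["message"])
                for e in events
                if e["minute"] == minute and e["level"] == "ERROR"
            ]
    return explanation


spikes = explain_spikes(errors_per_minute, log_events)
```

The metric alone says "10:01 was bad", the logs say "the payment gateway timed out", and the trace IDs tell you which requests to look up in a tracing tool.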

Instrument your code.

Instrumentation is one of the most important parts of observability. It’s what allows us to measure how well our application is performing and troubleshoot when things go wrong.

It’s easy to understand why instrumentation is so important: if we can’t see what’s happening inside our code, it’s impossible to diagnose issues that occur in production.

Fortunately, instrumenting your code isn’t difficult at all; there are many libraries out there that will allow you to do it with minimal effort. Some examples include Librato and Prometheus.
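To show the basic idea without depending on any particular library, here is a minimal sketch that uses only the Python standard library. The `instrumented` decorator and the in-process dictionaries are hypothetical stand-ins; a real setup would hand these numbers to a metrics library or backend instead.

```python
import functools
import time
from collections import defaultdict

# In-process registry; a real setup would export these to a metrics backend.
call_counts = defaultdict(int)
error_counts = defaultdict(int)
total_seconds = defaultdict(float)


def instrumented(func):
    """Record call count, error count and cumulative duration for func."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            error_counts[name] += 1
            raise  # instrumentation must not swallow the error
        finally:
            call_counts[name] += 1
            total_seconds[name] += time.perf_counter() - start

    return wrapper


@instrumented
def parse_quantity(raw: str) -> int:
    return int(raw)


parse_quantity("3")
try:
    parse_quantity("three")
except ValueError:
    pass  # the failure is still counted in error_counts
```

After these two calls, `call_counts["parse_quantity"]` is 2 and `error_counts["parse_quantity"]` is 1, which is already enough to compute an error rate for the function.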

Use distributed tracing.

Distributed tracing is a way to see the flow of requests through your application. It can be used to identify performance bottlenecks, or to see how long requests take to complete.

Distributed tracing is also a great tool for monitoring how your application behaves in production: if you’re experiencing problems with slow response times, distributed tracing can help pinpoint which part of the application is causing the issue and why.

Observe your observability tools.

So, you’re using observability tools. That’s great! But how well are they working for you? Are they telling you what you need to know, or just giving confusing and often contradictory data?

The best way to find out is to actually monitor the tool itself. If it’s not being used properly (or if there are bugs), then that can cause problems too. If the data isn’t correct, then your team will have an inaccurate view of reality when it comes time for decision making.

The observability tools themselves should be monitored by a qualified engineer who understands their limitations as well as their strengths, and who can recognize when a factor outside the tool, such as network latency, is distorting its output.
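One simple way to watch the watcher is a freshness check: if a source has not reported for a while, the dashboards built on it may be showing stale data. The sketch below assumes you can obtain the timestamp of each source's latest datapoint; the source names, timestamps and the 120-second limit are made up for the example.

```python
def stale_sources(last_seen, now, max_age_seconds=120.0):
    """Return sources whose most recent datapoint is older than max_age_seconds."""
    return sorted(
        source
        for source, timestamp in last_seen.items()
        if now - timestamp > max_age_seconds
    )


# Hypothetical timestamps (seconds since the epoch) of each source's latest datapoint.
last_seen = {
    "api-gateway": 1_667_800_000.0,
    "checkout": 1_667_799_700.0,  # five minutes behind api-gateway
    "search": 1_667_799_950.0,
}

silent = stale_sources(last_seen, now=1_667_800_010.0)
```

With these numbers only `checkout` is flagged. A silent source does not always mean the service is down; it can just as well mean the collection pipeline itself is broken, which is exactly what this check is meant to surface.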

Observability is a practice that needs to be followed at every layer of the stack.

Observability is not a specific monitoring tool or tech stack; it’s a practice that needs to be followed at every layer of the stack. It’s about understanding your system from end to end and not just relying on one monitoring tool. The responsibility falls on everyone involved in building and running applications: developers, operators, testers and DevOps engineers.

Observability requires collaboration across teams in order to create an environment where observations are captured at every layer of the stack, through logs or metrics data.

In conclusion, observability is not only about collecting data but also about making it actionable in a way that supports the business. This is why we’ve seen more tools emerging and evolving over time. The key takeaway here is that you need to be aware of what kind of data you want to collect and how you want to query it before rolling out a new solution in production.
