A Quick Introduction to the Top Metrics & Tools for Kubernetes Observability

Hardik Shah
6 min read · Sep 15, 2022


Breaking systems into smaller components or services and shifting those services to containers is one of the crucial steps in embracing the cloud-native approach. But when you deal with many containers, a container orchestration platform becomes essential. Kubernetes, or K8s, is one such platform.

Kubernetes is an open-source container management platform. It simplifies the management of your containerized cloud-native ecosystem. However, with many components that need to be monitored continuously, it can also become complex in no time.

Furthermore, Kubernetes enables users to automate the deployment, management, and scaling of containers. It makes it possible to work with hundreds or thousands of containers while keeping services reliable and resilient.

Good Read: What is Observability? A Comprehensive Guide

What are Kubernetes Metrics?

Kubernetes metrics help ensure that all pods in a deployment are running well. They tell you, for example, how many instances of a pod are currently running versus how many were expected. If the number is too low, the cluster may be running out of resources.

Here are a few Kubernetes metrics that you should keep an eye on (a sketch for checking them follows the list):

  • Missing and failed pods
  • Pod restarts
  • Available & unavailable pods
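As a rough illustration, here is a minimal sketch using the official Kubernetes Python client (assuming you have kubectl access to the cluster) that flags deployments with missing pods and pods that are failing or restarting:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

apps = client.AppsV1Api()
core = client.CoreV1Api()

# Desired vs. available pods per deployment.
for d in apps.list_deployment_for_all_namespaces().items:
    desired = d.spec.replicas or 0
    available = d.status.available_replicas or 0
    if available < desired:
        print(f"{d.metadata.namespace}/{d.metadata.name}: "
              f"{available}/{desired} pods available")

# Failed pods and pod restarts.
for p in core.list_pod_for_all_namespaces().items:
    restarts = sum(cs.restart_count for cs in (p.status.container_statuses or []))
    if p.status.phase == "Failed" or restarts > 0:
        print(f"{p.metadata.namespace}/{p.metadata.name}: "
              f"phase={p.status.phase}, restarts={restarts}")
```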

Why do we need to monitor Kubernetes?

Kubernetes monitoring is pivotal because it gives users insight into the state of their workloads. In addition, the insights obtained from monitoring metrics help you discover issues early. For example, rogue pod connections and data loss are among the major Kubernetes threats you should monitor for.

Now, let us walk you through the world of Kubernetes metrics.

Top Metrics to monitor in Kubernetes

1. Cluster metrics

When it comes to monitoring, it is essential to have complete visibility into the state of the Kubernetes cluster. Put simply, your monitoring solution should provide relevant information about cluster performance.

If you want to gain complete visibility, keep an eye on the following (a sketch that collects these figures follows the list):

  • Network input/output (I/O) pressure
  • Number of running containers, nodes, and pods
  • Node resource usage metrics: CPU, memory, network bandwidth, and disk usage.
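Here is a minimal sketch, again using the Kubernetes Python client, that pulls a simple cluster overview: counts of nodes, pods, and containers, plus per-node capacity (network I/O and detailed usage figures would typically come from Prometheus or the Metrics API instead):

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

nodes = core.list_node().items
pods = core.list_pod_for_all_namespaces().items
containers = sum(len(p.spec.containers) for p in pods)

print(f"nodes={len(nodes)} pods={len(pods)} containers={containers}")

# Per-node capacity as reported by the kubelet (cpu, memory, max pods, ...).
for node in nodes:
    cap = node.status.capacity
    print(f"{node.metadata.name}: cpu={cap['cpu']} "
          f"memory={cap['memory']} pods={cap['pods']}")
```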

2. Pod metrics

Resource allocation is crucial when it comes to pod monitoring. It is especially tricky to keep pods and containers running in optimal conditions without disturbing the app’s performance. To keep the whole process on track, make sure that pods are neither under- nor over-provisioned.

Here are a few essential metrics to monitor (a sketch that reads them from the Metrics API follows the list):

1. Kubernetes metrics: These give you information about the types and numbers of resources in a pod. With resource-limit tracking, they help you avoid running out of system resources, and they give assurance that the pods running on Kubernetes remain in a stable state.

2. Container metrics cover resource utilization at the container level. CPU, memory, and network usage are common examples.
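For live per-container CPU and memory usage, a minimal sketch like the following reads the Kubernetes Metrics API (this assumes metrics-server is installed in the cluster; the "default" namespace is just an example):

```python
from kubernetes import client, config

config.load_kube_config()
metrics = client.CustomObjectsApi()

# Pod metrics exposed by metrics-server under the metrics.k8s.io API group.
pod_metrics = metrics.list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="default", plural="pods",
)
for item in pod_metrics["items"]:
    for c in item["containers"]:
        print(f"{item['metadata']['name']}/{c['name']}: "
              f"cpu={c['usage']['cpu']} memory={c['usage']['memory']}")
```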

3. Kubernetes Node Metrics


A Kubernetes node is a collection of IT resources that supports one or more containers. A node contains the services needed to run Pods (Kubernetes’s units of containers), communicate with other cluster components, configure networking, and run workloads. A node can host one or multiple Pods.

Below are the main metrics that you need to check:

1. Node resource usage: This includes CPU, memory, disk utilization, network bandwidth, etc. It helps you decide whether to increase or decrease the number and size of nodes in the cluster. To judge cluster performance, keep an eye on memory and disk usage at the node level; if pods exceed their limits, they will be terminated.

On the other hand, if a node runs low on available memory and disk space, it begins to reclaim resources.

2. Number of nodes: The number of available nodes reflects what the cluster can handle and, if you use a cloud provider, what you are paying for.

3. Total number of running pods per node: This shows whether the available nodes are sized appropriately and whether they could absorb the pod workload if one node fails.

4. Memory & CPU requests: Together with limits, these define the minimum and maximum resources a node can allocate to containers. Allocatable memory is the amount of memory on a node that is available for pods.


Other metrics included in this category are:

1. Disk-space usage

2. Node-network traffic (receive and transmit)

There are several node conditions that describe the status of running nodes (a sketch that checks them, along with allocatable resources, follows the list), for example:

  • MemoryPressure
  • Ready
  • DiskPressure
  • OutOfDisk
  • NetworkUnavailable
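A minimal sketch that reports each node’s allocatable resources and flags problematic node conditions might look like this (assuming the Kubernetes Python client and cluster access):

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

for node in core.list_node().items:
    alloc = node.status.allocatable
    print(f"{node.metadata.name}: allocatable cpu={alloc['cpu']} "
          f"memory={alloc['memory']} pods={alloc['pods']}")
    for cond in node.status.conditions:
        # "Ready" should be True; pressure/unavailable conditions should be False.
        bad = (cond.type == "Ready") != (cond.status == "True")
        if bad:
            print(f"  condition {cond.type}={cond.status}: {cond.message}")
```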

4. Application Metrics

These metrics measure the performance and availability of the apps running inside Kubernetes pods. They are best captured with the RED metrics, i.e., Request Rate, Error Rate, and Duration (queried in the sketch after this list), and they are particularly useful for building real-time monitoring dashboards.

  • Request Rate = number of requests the service handles per second.
  • Error Rate = number of failed requests per second.
  • Duration = amount of time each request takes.
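As a sketch, the RED metrics can be expressed as PromQL queries against the Prometheus HTTP API. The metric names (http_requests_total, http_request_duration_seconds_bucket) and the Prometheus URL below are assumptions; substitute whatever your services actually expose:

```python
import requests

PROM_URL = "http://prometheus.example.com:9090"  # hypothetical address

queries = {
    "request_rate": 'sum(rate(http_requests_total[5m]))',
    "error_rate": 'sum(rate(http_requests_total{status=~"5.."}[5m]))',
    "p95_duration": 'histogram_quantile(0.95, '
                    'sum(rate(http_request_duration_seconds_bucket[5m])) by (le))',
}

for name, query in queries.items():
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query})
    for result in resp.json()["data"]["result"]:
        print(name, result["metric"], result["value"])
```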

Now, let’s look at some recommended tools for Kubernetes monitoring.

Jaeger

An open-source distributed tracing system, Jaeger was originally developed by Uber. According to Jaeger’s documentation, it is designed for monitoring and troubleshooting microservices-based distributed systems. It mainly focuses on:

  • Root cause analysis
  • Service dependency analysis
  • Performance/latency optimization
  • Distributed context propagation
  • Distributed transaction monitoring

It provides client libraries for most programming languages, such as Go, Node.js, Python, and Java.
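Jaeger’s original client libraries have since been deprecated in favor of the OpenTelemetry SDKs, so a minimal tracing sketch today would typically export spans to Jaeger’s OTLP endpoint. The service name, endpoint, and span/attribute names below are purely illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send spans to a Jaeger collector listening for OTLP on port 4317 (assumed address).
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "1234")  # hypothetical attribute
```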


Prometheus

An open-source monitoring system, Prometheus is built around a functional query language, PromQL (Prometheus Query Language). PromQL lets users select and aggregate the data they care about, and the results are displayed as tables or graphs. It is best suited for operational monitoring.
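For Prometheus to have something to query, applications expose metrics over HTTP for it to scrape. A minimal sketch with the official prometheus_client Python library (the metric and label names here are illustrative) looks like this; PromQL queries like the RED examples shown earlier can then aggregate these series:

```python
import random
import time

from prometheus_client import start_http_server, Counter, Histogram

REQUESTS = Counter("app_requests_total", "Total requests", ["status"])
LATENCY = Histogram("app_request_duration_seconds", "Request duration")

start_http_server(8000)  # metrics served at http://localhost:8000/metrics

while True:
    with LATENCY.time():           # record how long the "work" takes
        time.sleep(random.uniform(0.01, 0.2))  # simulated work
    REQUESTS.labels(status="200").inc()
```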


Grafana

An open-source data visualization and analytics tool, Grafana can monitor time-series data. It enables users to query multiple data stores, visualize the results, send alerts, and acknowledge them. It has native Prometheus support and also works with a wide range of other databases, such as Elasticsearch, AWS CloudWatch, Graphite, and InfluxDB. It also ships with several built-in, reusable dashboards for bringing data together and sharing it.


A Grafana dashboard gives its users deeper insight into the health and performance of the Kubernetes cluster and its apps by pulling data from these data sources into a single view.

Grafana comes with a number of official dashboards, which you can check here. Also, don’t miss the useful plugins that help enhance data visualization.
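Grafana can also be configured programmatically. A minimal sketch that registers Prometheus as a data source through Grafana’s HTTP API (the Grafana URL, Prometheus URL, and API token are placeholders) could look like this:

```python
import requests

GRAFANA_URL = "http://grafana.example.com:3000"   # hypothetical address
headers = {"Authorization": "Bearer <api-token>"}  # service-account/API token

payload = {
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus.example.com:9090",  # hypothetical address
    "access": "proxy",
    "isDefault": True,
}
resp = requests.post(f"{GRAFANA_URL}/api/datasources", json=payload, headers=headers)
print(resp.status_code, resp.json())
```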

Kubernetes comes with its own observability challenges. Therefore, it is important to consider the metrics and monitoring tools described above when creating a monitoring strategy for Kubernetes-based production workloads.

If you want to share your thoughts with us, you can comment below. We’d like to hear from you.
