Log Management in Kubernetes

Centralized logging

When we troubleshoot a problem or trace a security or performance issue, logs are the best source of information. In a distributed environment, centralized logging accumulates the data in one place and brings benefits such as proactive management of the network, shorter bug analysis time, improved security, a lower risk of losing data, and aggregated performance statistics.

Business context

Centralized logging is especially relevant in the Kubernetes ecosystem. When building any development environment around Kubernetes, it is highly recommended to have centralized monitoring of all the pods and nodes in the cluster. Kubernetes is a container management system, in other words an orchestration system that orchestrates all the pods in the cluster. A cluster may have many master and worker nodes running different workloads. If something goes wrong in the cluster as a whole, or on a particular worker node or pod, finding the root cause is a challenge for the developer, because the logs are scattered across different containers and nodes and there is no central way to access them.

With these problems in mind, we want a solution that lets developers obtain the logs more simply, through a single entry point, and with more visual information for better analysis. Achieving this can improve developer productivity and shorten the turnaround time for fixing issues, which in turn helps keep the system highly available.

How can I achieve centralized logging?

In today's distributed computing, a new set of solutions has been designed for high-volume, high-throughput log and event collection. Most of these solutions are event streaming and processing systems, and logging is just one use case they can solve. Each has its own features and differences, but their architectures are almost the same. They generally consist of logging clients and/or agents on each host. The agents (log shippers) forward logs to a cluster of collectors, which in turn forward the messages to a scalable storage tier (e.g. Elasticsearch). The idea is that the collection tier is horizontally scalable, so it can grow with the number of logging hosts and messages. Similarly, the storage tier is intended to scale horizontally as volume increases.
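The agent, collector and storage tiers described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not how any particular product works: the class names Agent, Collector and Storage, the batch size and the host names are all hypothetical, and a dictionary stands in for a real storage tier such as Elasticsearch.

```python
from collections import defaultdict


class Storage:
    """Stand-in for the storage tier (hypothetical; a real one could be Elasticsearch)."""

    def __init__(self):
        self.by_host = defaultdict(list)

    def write(self, events):
        for event in events:
            self.by_host[event["host"]].append(event["message"])


class Collector:
    """Buffers events from many agents and flushes them to storage in batches."""

    def __init__(self, storage, batch_size=2):
        self.storage = storage
        self.batch_size = batch_size
        self.buffer = []

    def receive(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.storage.write(self.buffer)
            self.buffer = []


class Agent:
    """Runs on one host, tags each log line with the host name and ships it."""

    def __init__(self, host, collector):
        self.host = host
        self.collector = collector

    def ship(self, message):
        self.collector.receive({"host": self.host, "message": message})


storage = Storage()
collector = Collector(storage)
Agent("node-1", collector).ship("pod started")
Agent("node-2", collector).ship("disk pressure")
Agent("node-1", collector).ship("pod crashed")
collector.flush()  # push whatever is still buffered
```

After this runs, storage.by_host holds the messages grouped per host, which is the single entry point a developer would query. In a real deployment each tier would be a separate process, and scaling horizontally would mean adding more collector or storage instances.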

Logging Solutions

In today's cloud native environments there are different ways of accessing, processing and publishing logs to provide better insight into the health and state of the system. Scribe, Flume, Logstash, Kafka, Splunk and Fluentd are some of the log shippers available on the market today. In Kubernetes deployments, Logstash, Fluentd and Beats are the shippers most frequently used for log management. They give the following stacks:

ELK (Elasticsearch and Kibana with Logstash)

EFK (Elasticsearch and Kibana with Fluentd)

EBK (Elasticsearch and Kibana with Beats)

Among these three stacks, most deployments use Fluentd or Beats as the log shipper alongside Elasticsearch and Kibana, because these two shippers are lightweight compared to Logstash. Logstash consumes more resources (its default heap size is 1 GB). Though its performance has improved in recent versions, it is still slower than its alternatives. In a typical scenario Logstash takes about 120 MB of memory compared to about 40 MB for Fluentd.

Other differences between Logstash and Fluentd are covered in this reference: https://logz.io/blog/fluentd-logstash/

When it comes to choosing between Fluentd and Beats, here are some points to consider.

Fluentd can be used for shipping logs, but it has some downsides. Its plugins are maintained by different individuals across the globe, so there is a potential risk of version incompatibilities. At its core, Fluentd lacks the concept of filters for massaging the data. It also does not provide any ready-to-go dashboards for visualizing the data. Finally, Beats are lightweight compared to Fluentd.

Beats, on the other hand, are lightweight, extensible, ready-to-use agents for shipping logs, and they have better visibility into containers. Because they are developed by the dedicated Elastic community, version compatibility issues are far less likely. That is not the only reason to choose Beats: they come in different flavors for different requirements, and each flavor has its own set of modules to handle different use cases. As developers we only need to pick the right Beat and configure the required modules. Once that is done, Beats provide default dashboards that are ready to use in Kibana.

Different Beats offered by Elastic:

Filebeat – Used to monitor files. It can deal with multiple files in one directory and has modules for files in well-known formats such as Nginx and Apache httpd.

Metricbeat – Monitors the resources of the host, such as memory used, CPU used and disk space.

Heartbeat – Used to check endpoints for availability. It can measure the time it took to connect and report whether the remote system is up.

Packetbeat – Takes a deep dive into the packets going over the wire. It can sniff a number of protocols, such as HTTP, DNS and AMQP.
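To make the Heartbeat idea concrete, here is a rough Python sketch of an availability check: try a TCP connection, record whether it succeeded and how long it took. It is an illustration only and not Heartbeat's own code. The function name check_endpoint, the field names in the result and the injectable connect parameter are assumptions made for this example.

```python
import socket
import time


def check_endpoint(host, port, timeout=2.0, connect=socket.create_connection):
    """Try a TCP connect and report whether it worked and how long it took.

    The connect parameter is a hypothetical hook so the check can be
    exercised without a real network; by default it opens a real socket.
    """
    started = time.monotonic()
    try:
        conn = connect((host, port), timeout)
        conn.close()
        up = True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        up = False
    duration_ms = (time.monotonic() - started) * 1000.0
    return {"host": host, "port": port, "up": up, "duration_ms": duration_ms}
```

Calling check_endpoint("localhost", 9200), for instance, would report whether something is listening on that port of the local machine; the host and port here are just placeholders. A monitoring agent would run such a check on a schedule and ship each result as an event.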

The most frequently used members of the Beats family are Filebeat and Metricbeat.

Filebeat – A small agent with no dependencies. It takes very few resources and has lots of knobs to tweak. It monitors files and can deal with multiple files in one directory. In recent versions it can also send data to Kafka and Redis to support heavy loads.

Apart from the above, Filebeat provides modules such as Icinga, Kafka, Logstash, MySQL, Nginx, Osquery, PostgreSQL, Redis and Traefik for capturing the respective logs.
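Monitoring several files in one directory comes down to remembering how far each file has been read and picking up only what was appended since. The Python sketch below shows that idea under simple assumptions; it is not Filebeat's implementation, and the function name read_new_lines, the offsets dictionary and the event fields are hypothetical. It ignores log rotation and truncation, which a real shipper has to handle.

```python
import os


def read_new_lines(directory, offsets):
    """Return lines appended since the last call, for every file in directory.

    offsets maps file path -> byte position already read; it is updated in place.
    """
    events = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            handle.seek(offsets.get(path, 0))
            for raw in handle:
                if not raw.endswith(b"\n"):
                    break  # partial line: wait until the writer finishes it
                offsets[path] = offsets.get(path, 0) + len(raw)
                message = raw.decode("utf-8", "replace").rstrip("\r\n")
                events.append({"source": path, "message": message})
    return events
```

A caller would keep one offsets dictionary alive and call read_new_lines on a timer, forwarding the returned events to the collector or storage tier. Persisting the offsets between restarts is what prevents lines from being shipped twice.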

Metricbeat – Also a small agent with no dependencies. It collects metric information and has modules such as Docker, Kubernetes, System and Kafka.

Apart from the above, Metricbeat provides other metric modules such as Aerospike, Apache, Ceph, Couchbase, Dropwizard, Elasticsearch, Etcd, Golang, Graphite, HAProxy, HTTP, Jolokia, Kibana, Logstash, Memcached, MongoDB, MySQL, Nginx, PHP_FPM, PostgreSQL, Prometheus, RabbitMQ, Redis, uwsgi, vSphere, Windows and ZooKeeper.
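As a rough picture of what a system metrics module gathers, the Python sketch below collects a few host metrics into a single event using only the standard library. It is an assumption-laden illustration, not Metricbeat's code: the function name collect_system_metrics and the field names are hypothetical, and memory usage is left out because the standard library has no portable call for it.

```python
import os
import shutil
import time


def collect_system_metrics(path="/"):
    """Gather a few host metrics into one event dictionary."""
    disk = shutil.disk_usage(path)
    used_pct = round(100.0 * disk.used / disk.total, 1) if disk.total else 0.0
    event = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_pct": used_pct,
    }
    # Load average only exists on Unix-like systems and may be unobtainable.
    if hasattr(os, "getloadavg"):
        try:
            load_1m, load_5m, load_15m = os.getloadavg()
            event.update(load_1m=load_1m, load_5m=load_5m, load_15m=load_15m)
        except OSError:
            pass
    return event
```

An agent would call collect_system_metrics on a fixed interval and ship each event to the storage tier, where a dashboard can chart the values over time.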
