Monday, March 10, 2014

Data Collection @Heka Way

Tool for high performance data gathering, analysis, monitoring, and reporting.
Heka’s main component is hekad, a lightweight daemon program that can run on nearly any host machine which does the following: Gathers data through reading and parsing log files, monitoring server health, and/or accepting client network connections using any of a wide variety of protocols (syslog, statsd, http, heka, etc.).
Heka is written in Go, Pretty fast.
Inspired by log stash
apps/logs/sytems ==> Heka==> ES/Logfile
Input: Read in

Decoder: Decode info into a message instance

Filters: Parse, process,

Output: send out data, write out to logs, forward to heka.

Plugin: Go or Lua
Checking the logs by tail -f /var/log/* is very time consuming task specially when you have to troubleshoot a tons of logs. Better make a graph from that data and do analysis on top of that.
Heka is one of the latest tool that can help in logs collection and analysis. But you need some other tool as well to get some meaningful data. Heka alone is not enough.It is just a data collection tool.
You can use: Elastic Search, Kibana, Heka to build a complete logs analysis tool.
Install Heka on:
OS: Ubuntu 12.04
Install the compatible binary:
root@vmhost18:~# wget -c

root@vmhost18:~# dpkg -i heka_0.5.0_amd64.deb
Create a hekad.toml configuration file in /etc/
root@vmhost18:~# vim /etc/hekad.toml

 logfile = "/var/log/auth.log"
 decoder = "syslog_decoder"
 type = "PayloadRegexDecoder"
 match_regex = '^(?P\w+\s+\d+ \d+:\d+:\d+) (?P\S+) (?P\w+)\[?(?P\d+)?\]?:'
 timestamp_layout= 'Jan _2 15:04:05'
 Type = "SyslogLog"
 Host = "%Host%"
 Program = "%Program%"
 PID = "%PID%"
 message_matcher = "TRUE"
 message_matcher = "Type == 'SyslogLog'"
Now point the hekad to the configuration file.
root@vmhost18:~# hekad -config=/etc/hekad.toml | more
2014/03/10 21:05:57 Loading: [LogfileInput]
2014/03/10 21:05:57 Loading: [syslog_decoder]
2014/03/10 21:05:57 Loading: [LogOutput]
2014/03/10 21:05:57 Loading: [ElasticSearchOutput]
2014/03/10 21:05:57 Loading: [ProtobufDecoder]
2014/03/10 21:05:57 Starting hekad...
2014/03/10 21:05:57 Output started:  LogOutput
2014/03/10 21:05:57 Output started:  ElasticSearchOutput
2014/03/10 21:05:57 MessageRouter started.
2014/03/10 21:05:57 Input started: LogfileInput
2014/03/10 21:05:57 Input 'LogfileInput': Line matches, continuing from byte pos: 95266
Now download and Install elasticserarch
For ubuntu 12.04 download the .deb and install the same way as we did with Heka.
root@vmhost18:~# wget -c
You need to install java runtime first.
root@tc:~# apt-get install java7-runtime-headless
Reading package lists... Done Building dependency tree Reading state information... Done Note, selecting 'openjdk-7-jre-headless' instead of 'java7-runtime-headless'
root@vmhost18:~# dpkg -i elasticsearch-1.0.1.deb
Selecting previously unselected package elasticsearch. (Reading database ... 70512 files and directories currently installed.) Unpacking elasticsearch (from elasticsearch-1.0.1.deb) ... Setting up elasticsearch (1.0.1) ... Adding system user elasticsearch' (UID 110) ... Adding new userelasticsearch' (UID 110) with groupelasticsearch' ... Not creating home directory/usr/share/elasticsearch'.

NOT starting elasticsearch by default on bootup, please execute

sudo update-rc.d elasticsearch defaults 95 10

In order to start elasticsearch, execute

sudo /etc/init.d/elasticsearch start
Processing triggers for ureadahead ...
root@vmhost18:~# /etc/init.d/elasticsearch start
 * Starting Elasticsearch Server

Now install Kibana as well:

root@vmhost18:~# sudo apt-get install nginx
After this operation, 2,350 kB of additional disk space will be used. Do you want to continue [Y/n]?
root@vmhost18:~# sudo service nginx start
Starting nginx: nginx.
root@vmhost18:~# wget -c
$ tar xvzf kibana-3.0.0milestone5.tar.gz
$root@vmhost18:~#  sudo mv kibana-3.0.0milestone5 /usr/share/nginx/www/kibana

You can:
Use the multiple output plug-ins of Heka to write the files out to disk for longer retention (text compresses really nicely, and takes up a lot less space than the Elasticsearch data). Split the components up, and build clusters for each piece: an Elasticsearch cluster, a RabbitMQ cluster, an rsyslog host for long file retention, a Heka cluster — with nodes doing different types of message processing;
Other popular Open source routing systems: - Graylog2 (Supports only read/write from a single index), but other release will be supporting multiple index - LogStash
Both of these have builtin elastic search implementations.
Logstash and elastic search are disk and memory intensive as all other java applications are. You don't want to go past 32 GB of RAM dedicated to ElasticSearch and reserve atleast 8 GB to the OS for file-system caching.
Start with 3 servers in your ElasticSearch cluster. This gives you the flexibility to shutdown a server and maintain full use of your cluster.

 # Ensure ElasticSearch can open files and lock memory!
elasticsearch   soft    nofile          65536
elasticsearch   hard    nofile          65536
elasticsearch   -       memlock         unlimited
You should also configure ElasticSearch's minimum and maximum pool of memory be set to the same value. This takes care of all the memory allocation at startup, so you don't have threads waiting to get more memory from the kernel

No comments:

Post a Comment