Saturday, March 15, 2014


Load - 1

10:24  up 6 days, 14:52, 4 users, load averages: 1.74 2.07 2.09
The three numbers are the 1-, 5-, and 15-minute load averages.
System load: the average number of processes in a runnable or uninterruptible state.
Runnable: waiting for CPU
Uninterruptible: waiting for I/O

1 CPU with load 1 means full utilization.
1 CPU with load 2 means twice the load the system can handle.
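As a quick sketch, the "load vs. CPU count" comparison above can be scripted; the sample values are from the uptime output above, and on a live Linux box you would read them from /proc/loadavg and nproc instead:

```shell
# Sketch: is the 1-minute load average above what the CPU count can absorb?
# Sample values; live: load1=$(cut -d' ' -f1 /proc/loadavg); cpus=$(nproc)
load1=1.74
cpus=1
# awk handles the floating-point comparison
state=$(awk -v l="$load1" -v c="$cpus" 'BEGIN { if (l+0 > c+0) print "overloaded"; else print "ok" }')
echo "load=$load1 cpus=$cpus -> $state"
```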

Check what kind of load the system is under:

A system that is out of memory can show up as I/O load, because the system starts swapping and using swap space.


This is helpful in identifying which system resource you are running out of; once you have zeroed in on that, you can try to find out which processes are consuming that resource.

top - 13:35:06 up 227 days, 19:01,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8167620k total,  5987132k used,  2180488k free,   187984k buffers
Swap:   498684k total,        8k used,   498676k free,  5428964k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 24332 1956 1036 S    0  0.0   0:08.08 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.42 kthreadd
    3 root      20   0     0    0    0 S    0  0.0  16:48.59 ksoftirqd/0
    6 root      RT   0     0    0    0 S    0  0.0   0:33.68 migration/0
    7 root      RT   0     0    0    0 S    0  0.0   0:47.78 watchdog/0
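A side note on the Mem/Swap lines above: buffers and cached pages are reclaimable, so the memory actually available is roughly free + buffers + cached, not just the raw "free" figure. A sketch with the numbers from this output:

```shell
# Values (in kB) taken from the Mem/Swap lines above
free_kb=2180488
buffers_kb=187984
cached_kb=5428964
# buffers and cached count as available: the kernel reclaims them on demand
avail_kb=$((free_kb + buffers_kb + cached_kb))
echo "${avail_kb}k effectively available"
```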

PID: Process ID, a unique number assigned to every process on the system.

top -b -n 1 | tee output

 wa: I/O wait
This number represents the percentage of CPU time spent waiting for I/O. It is a particularly valuable metric when tracking down the cause of a sluggish system: if this value is low, you can fairly safely rule out disk or network I/O as the cause.

id: CPU idle time 
This is one of the metrics that you want to be high. It represents the percentage of CPU time that is spent idle. If you have a sluggish system but this number is high, you know the cause isn’t high CPU load.
 st: steal time
If you are running virtual machines, this metric tells you the percentage of CPU time that was stolen from the VM by the hypervisor for other tasks.
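These percentages can be scraped from batch-mode top; a sketch using the Cpu(s) line from the output above (the sed patterns assume this classic top output format):

```shell
# Sample Cpu(s) line; live: cpu_line=$(top -b -n 1 | grep '^Cpu(s)')
cpu_line='Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st'
# Pull out the idle, I/O wait, and steal percentages
idle=$(echo "$cpu_line"   | sed 's/.*[ ,]\([0-9.]*\)%id.*/\1/')
iowait=$(echo "$cpu_line" | sed 's/.*[ ,]\([0-9.]*\)%wa.*/\1/')
steal=$(echo "$cpu_line"  | sed 's/.*[ ,]\([0-9.]*\)%st.*/\1/')
echo "idle=$idle iowait=$iowait steal=$steal"
```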


Query each authoritative name server for a domain directly, to confirm they all return the same answer (mydomain.com is a placeholder):

for i in $(whois mydomain.com | grep -i '^Name Server' | awk '{print $NF}'); do dig @$i www.mydomain.com; done

Monday, March 10, 2014


* Structured data
* Fast write
* Distributed - can run on more than two nodes
* Column-oriented database, unlike MySQL (a row-oriented database)
* Good for very high volume of data

DynamoDB (Amazon AWS) and HBase are also column-based data stores.

CAP theorem trade-offs:
- Consistency
- Availability
- Partition Tolerance

Install on Ubuntu 12.04

Download and untar.

Data Collection @Heka Way

Heka is a tool for high-performance data gathering, analysis, monitoring, and reporting.
Heka's main component is hekad, a lightweight daemon that can run on nearly any host machine. It gathers data by reading and parsing log files, monitoring server health, and/or accepting client network connections using any of a wide variety of protocols (syslog, statsd, http, heka, etc.).
Heka is written in Go, so it is pretty fast.
It is inspired by Logstash.
apps/logs/systems ==> Heka ==> ES/log file
Input: Read in

Decoder: Decode info into a message instance

Filters: parse and process messages

Output: send out data, write out to logs, forward to heka.

Plugins: can be written in Go or Lua
Checking logs with tail -f /var/log/* is a very time-consuming task, especially when you have to troubleshoot tons of logs. It is better to build graphs from that data and do analysis on top of them.
Heka is one of the latest tools that can help with log collection and analysis, but you need some other tools as well to get meaningful data: Heka alone is just a data collection tool.
You can use Elasticsearch, Kibana, and Heka together to build a complete log analysis stack.
Install Heka on:
OS: Ubuntu 12.04
Install the compatible binary:
root@vmhost18:~# wget -c

root@vmhost18:~# dpkg -i heka_0.5.0_amd64.deb
Create a hekad.toml configuration file in /etc/
root@vmhost18:~# vim /etc/hekad.toml

[LogfileInput]
logfile = "/var/log/auth.log"
decoder = "syslog_decoder"

[syslog_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<Timestamp>\w+\s+\d+ \d+:\d+:\d+) (?P<Host>\S+) (?P<Program>\w+)\[?(?P<PID>\d+)?\]?:'
timestamp_layout = 'Jan _2 15:04:05'

[syslog_decoder.message_fields]
Type = "SyslogLog"
Host = "%Host%"
Program = "%Program%"
PID = "%PID%"

[LogOutput]
message_matcher = "TRUE"

[ElasticSearchOutput]
message_matcher = "Type == 'SyslogLog'"
Now point the hekad to the configuration file.
root@vmhost18:~# hekad -config=/etc/hekad.toml | more
2014/03/10 21:05:57 Loading: [LogfileInput]
2014/03/10 21:05:57 Loading: [syslog_decoder]
2014/03/10 21:05:57 Loading: [LogOutput]
2014/03/10 21:05:57 Loading: [ElasticSearchOutput]
2014/03/10 21:05:57 Loading: [ProtobufDecoder]
2014/03/10 21:05:57 Starting hekad...
2014/03/10 21:05:57 Output started:  LogOutput
2014/03/10 21:05:57 Output started:  ElasticSearchOutput
2014/03/10 21:05:57 MessageRouter started.
2014/03/10 21:05:57 Input started: LogfileInput
2014/03/10 21:05:57 Input 'LogfileInput': Line matches, continuing from byte pos: 95266
Now download and install Elasticsearch.
For Ubuntu 12.04, download the .deb and install it the same way as we did with Heka.
root@vmhost18:~# wget -c
You need to install java runtime first.
root@tc:~# apt-get install java7-runtime-headless
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'openjdk-7-jre-headless' instead of 'java7-runtime-headless'
root@vmhost18:~# dpkg -i elasticsearch-1.0.1.deb
Selecting previously unselected package elasticsearch.
(Reading database ... 70512 files and directories currently installed.)
Unpacking elasticsearch (from elasticsearch-1.0.1.deb) ...
Setting up elasticsearch (1.0.1) ...
Adding system user `elasticsearch' (UID 110) ...
Adding new user `elasticsearch' (UID 110) with group `elasticsearch' ...
Not creating home directory `/usr/share/elasticsearch'.

NOT starting elasticsearch by default on bootup, please execute

sudo update-rc.d elasticsearch defaults 95 10

In order to start elasticsearch, execute

sudo /etc/init.d/elasticsearch start
Processing triggers for ureadahead ...
root@vmhost18:~# /etc/init.d/elasticsearch start
 * Starting Elasticsearch Server

Now install Kibana as well:

root@vmhost18:~# sudo apt-get install nginx
After this operation, 2,350 kB of additional disk space will be used. Do you want to continue [Y/n]?
root@vmhost18:~# sudo service nginx start
Starting nginx: nginx.
root@vmhost18:~# wget -c
root@vmhost18:~# tar xvzf kibana-3.0.0milestone5.tar.gz
root@vmhost18:~# sudo mv kibana-3.0.0milestone5 /usr/share/nginx/www/kibana

You can:
- Use the multiple output plugins of Heka to write the files out to disk for longer retention (text compresses really nicely, and takes up a lot less space than the Elasticsearch data).
- Split the components up and build clusters for each piece: an Elasticsearch cluster, a RabbitMQ cluster, an rsyslog host for long file retention, and a Heka cluster, with nodes doing different types of message processing.
Other popular open-source log routing systems:
- Graylog2 (currently supports read/write from only a single index; a later release will support multiple indices)
- Logstash
Both of these have built-in Elasticsearch implementations.
Logstash and Elasticsearch are disk- and memory-intensive, as all Java applications are. You don't want to go past 32 GB of RAM dedicated to Elasticsearch, and you should reserve at least 8 GB for the OS for file-system caching.
Start with 3 servers in your Elasticsearch cluster. This gives you the flexibility to shut down a server and still maintain full use of your cluster.

 # In /etc/security/limits.conf: ensure Elasticsearch can open enough files and lock memory!
elasticsearch   soft    nofile          65536
elasticsearch   hard    nofile          65536
elasticsearch   -       memlock         unlimited
You should also configure Elasticsearch's minimum and maximum memory pool to the same value. This takes care of all the memory allocation at startup, so you don't have threads waiting to get more memory from the kernel.
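On the Debian/Ubuntu packaging, one way to sketch this is via /etc/default/elasticsearch; the variable names below come from the 1.x .deb packaging, and the values are illustrative, not a recommendation:

```
# /etc/default/elasticsearch (illustrative values; pick roughly half of RAM, max 32 GB)
ES_HEAP_SIZE=8g               # sets both -Xms and -Xmx to the same value
MAX_LOCKED_MEMORY=unlimited   # pairs with bootstrap.mlockall: true in elasticsearch.yml
```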