Connecting a Hadoop cluster to log stash for collecting logs is quite simple since Hadoop uses log4j for logging. Following actions have to be taken in order to push Hadoop logs into log stash:

  1. Download logstash-gelf and drop it into your Hadoop directory share/hadoop/common/lib
  2. Add logstash-gelf to your log4j.properties in etc/hadoop/log4j.properties (see sample code)
  3. Adjust your hadoop-env.sh in etc/hadoop/log4j.properties (see sample code)
  4. Adjust two lines in hadoop-daemon.sh (it's actually a bugfix) in sbin (see sample code)

Deploy it to all your Hadoop nodes, that's it.

About hadoop-daemon.sh

All sbin-scripts contain a possibility to set the root logger using environment variables. Only hadoop-daemon.sh contains a fixed string for the root logger definition. You need to replace

export HADOOP_ROOT_LOGGER=INFO,DRFA
export HADOOP_SECURITY_LOGGER=INFO,DRFAS

by

export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-INFO,DRFA}
export HADOOP_SECURITY_LOGGER=${HADOOP_ROOT_SECURITY_LOGGER:-INFO,DRFAS}

 

(or take the code provided in the gist).

This how-to is based on hadoop 0.23.10 (Apache Distribution).

The code