Chukwa is an open-source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework, and inherits Hadoop’s scalability and robustness.

Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data. 

Chukwa has four primary components:

• Agents that run on each machine and emit data.

• Collectors that receive data from the agents and write it to stable storage.

• MapReduce jobs for parsing and archiving the data.

• HICC, the Hadoop Infrastructure Care Center, a web-portal-style interface for displaying data.
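
The way data moves through these four components can be sketched as a toy pipeline. This is only an illustration of the flow described above, not Chukwa's actual Java API: the names Chunk, Agent, Collector, parse_job and display are hypothetical, an in-memory list stands in for HDFS, and a simple count per host and stream stands in for a real MapReduce job.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record emitted by an agent (illustrative only)."""
    host: str
    stream: str
    data: str


class Agent:
    """Runs on one machine and emits data as chunks."""

    def __init__(self, host):
        self.host = host

    def emit(self, stream, lines):
        return [Chunk(self.host, stream, line) for line in lines]


class Collector:
    """Receives chunks from agents and appends them to stable storage."""

    def __init__(self):
        self.storage = []  # assumption: a list stands in for a file on HDFS

    def receive(self, chunks):
        self.storage.extend(chunks)


def parse_job(storage):
    """Stand-in for a MapReduce job: key each chunk, then count per key."""
    counts = defaultdict(int)
    for chunk in storage:
        # "map" step: key the record by (host, stream);
        # "reduce" step: sum the records seen for that key.
        counts[(chunk.host, chunk.stream)] += 1
    return dict(counts)


def display(summary):
    """Stand-in for HICC: turn parsed results into lines for display."""
    return [
        f"{host} {stream}: {n} records"
        for (host, stream), n in sorted(summary.items())
    ]


# Two agents send data to one collector; the job then parses what was stored.
collector = Collector()
for host in ("node1", "node2"):
    agent = Agent(host)
    collector.receive(agent.emit("syslog", ["boot", "login"]))

summary = parse_job(collector.storage)
report = display(summary)
print("\n".join(report))
```

The sketch keeps each stage separate on purpose: agents know nothing about storage, the collector knows nothing about parsing, and the display step only sees the parsed summary, which mirrors the division of labor in the component list.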

Apache Hadoop: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

