Delivered-To: apmail-hadoop-chukwa-commits-archive@minotaur.apache.org
Received: (qmail 61485 invoked from network); 10 Nov 2009 18:33:11 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
by minotaur.apache.org with SMTP; 10 Nov 2009 18:33:11 -0000
Received: (qmail 77302 invoked by uid 500); 10 Nov 2009 18:33:11 -0000
Delivered-To: apmail-hadoop-chukwa-commits-archive@hadoop.apache.org
Received: (qmail 77285 invoked by uid 500); 10 Nov 2009 18:33:11 -0000
Mailing-List: contact chukwa-commits-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: chukwa-dev@hadoop.apache.org
Delivered-To: mailing list chukwa-commits@hadoop.apache.org
Received: (qmail 77275 invoked by uid 99); 10 Nov 2009 18:33:11 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Nov 2009 18:33:11 +0000
X-ASF-Spam-Status: No, hits=-2.6 required=5.0
tests=AWL,BAYES_00
X-Spam-Check-By: apache.org
Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4)
by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Nov 2009 18:33:08 +0000
Received: by eris.apache.org (Postfix, from userid 65534)
id E998B238889D; Tue, 10 Nov 2009 18:32:47 +0000 (UTC)
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: svn commit: r834588 - in /hadoop/chukwa/trunk: CHANGES.txt
src/docs/src/documentation/content/xdocs/admin.xml
src/docs/src/documentation/content/xdocs/agent.xml
src/docs/src/documentation/content/xdocs/collector.xml
Date: Tue, 10 Nov 2009 18:32:47 -0000
To: chukwa-commits@hadoop.apache.org
From: asrabkin@apache.org
X-Mailer: svnmailer-1.0.8
Message-Id: <20091110183247.E998B238889D@eris.apache.org>
Author: asrabkin
Date: Tue Nov 10 18:32:47 2009
New Revision: 834588
URL: http://svn.apache.org/viewvc?rev=834588&view=rev
Log:
CHUKWA-413. Improve admin guide.
Modified:
hadoop/chukwa/trunk/CHANGES.txt
hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml
Modified: hadoop/chukwa/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/CHANGES.txt?rev=834588&r1=834587&r2=834588&view=diff
==============================================================================
--- hadoop/chukwa/trunk/CHANGES.txt (original)
+++ hadoop/chukwa/trunk/CHANGES.txt Tue Nov 10 18:32:47 2009
@@ -8,6 +8,8 @@
IMPROVEMENTS
+ CHUKWA-413. Improve admin guide. (asrabkin)
+
CHUKWA-345. Remove redundant 'application' field from Chunk API. (asrabkin)
CHUKWA-409. Make SocketTeeWriter work in single-stage pipeline. (Thushara Wijeratna via asrabkin)
Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml?rev=834588&r1=834587&r2=834588&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml Tue Nov 10 18:32:47 2009
@@ -25,38 +25,44 @@
Purpose
-
The purpose of this document is to help you install and configure Chukwa.
+
Chukwa is a system for large-scale reliable log collection and processing
+with Hadoop. The Chukwa design overview discusses the overall architecture of Chukwa.
+You should read that document before this one.
+The purpose of this document is to help you install and configure Chukwa.
Pre-requisites
-
-Supported Platforms
-
GNU/Linux is supported as a development and production platform. Chukwa has been demonstrated on Hadoop clusters with 2000 nodes.
ssh must be installed and sshd must be running to use the Chukwa scripts that manage remote Chukwa daemons
-
-
+
Chukwa should work on any POSIX platform, but GNU/Linux is the only
+ production platform that has been tested extensively. Chukwa has also been used
+ successfully on Mac OS X, which several members of the Chukwa team use for
+ development.
+The Chukwa cluster management scripts rely on ssh; these scripts, however,
+are not required if you have some alternate mechanism for starting and stopping
+daemons.
+
-Install Chukwa
-
Chukwa is installed on:
-
-
A hadoop cluster created specifically for Chukwa (referred to as the Chukwa cluster).
-
The source nodes that Chukwa monitors (referred to as the monitored source nodes).
+Installing Chukwa
+
A minimal Chukwa deployment has three components:
+
+
A Hadoop cluster on which Chukwa will store data (referred to as the Chukwa cluster).
+
A collector process, which writes collected data to HDFS, the Hadoop file system.
+
One or more agent processes, which send monitoring data to the collector.
+The nodes with active agent processes are referred to as the monitored source nodes.
-
-
-
Chukwa can also be installed on a single node, in which case the machine must have at least 16 GB of memory.
+
In addition, you may wish to run the Chukwa Demux jobs, which parse collected
+data, or HICC, the Chukwa visualization tool.
@@ -64,11 +70,55 @@
-General Install Procedure
-
1. Select one of the nodes in the Chukwa cluster:
+First Steps
+
+
+
Obtain a copy of Chukwa. You can find the latest release on the
+Chukwa release page.
+
Untar the release via tar xzf.
+
Make sure a copy of Chukwa is available on each node being monitored, and on
+each node that will run a collector.
+
+We refer to the directory containing Chukwa as CHUKWA_HOME. It may
+be helpful to set CHUKWA_HOME explicitly in your environment,
+but Chukwa does not require that you do so.
+
+
+
+
+
+
+General Configuration
+
+
Agents and collectors are configured differently, but part of the process
+is common to both.
+
+
Make sure that JAVA_HOME is set correctly and points to a Java 1.6 JRE.
+It's generally best to set this in conf/chukwa-env.sh.
+
+In conf/chukwa-env.sh, set CHUKWA_LOG_DIR and
+CHUKWA_PID_DIR to the directories where Chukwa should store its
+console logs and pid files. The pid directory must not be shared between
+different Chukwa instances: it should be local, not NFS-mounted.
+
+
Optionally, set CHUKWA_IDENT_STRING. This string is
+ used to name Chukwa's own console log files.
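As a sketch, the relevant lines in conf/chukwa-env.sh might look like the following; all paths and the identifier are placeholders, not shipped defaults:

```shell
# Hypothetical excerpt from conf/chukwa-env.sh; values are placeholders.
export JAVA_HOME=/usr/lib/jvm/java-6-sun       # must point at a Java 1.6 JRE
export CHUKWA_LOG_DIR=/var/log/chukwa          # console logs
export CHUKWA_PID_DIR=/var/run/chukwa          # pid files: local disk, not NFS
export CHUKWA_IDENT_STRING=my-cluster          # optional: names console log files
echo "$CHUKWA_PID_DIR"
```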
+
+
+
+
+
+
-Chukwa Binary
-
To get a Chukwa distribution, download a recent stable release of Chukwa from one of the Apache Download Mirrors (see
- Hadoop Chukwa Releases.
-
+Agents
+
Agents are the Chukwa processes that actually produce data. This section
+describes how to configure and run them. More details are available in the
+Agent configuration guide.
+
+
+Configuration
+
This section describes how to set up the agent process on the source nodes.
+
+
+
+
The one mandatory configuration step is to set up
+ $CHUKWA_HOME/conf/collectors. This file should contain a list
+of hosts that will run Chukwa collectors. Agents will pick a random collector
+from this list to try sending to, and will fail over to another listed collector
+on error. The file should look something like:
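A sketch of the expected format, one collector URL per line; the hostnames and port below are placeholders (a temp file stands in for conf/collectors):

```shell
# Hypothetical conf/collectors contents: one collector URL per line.
# Hostnames and port are placeholders.
COLLECTORS_FILE=$(mktemp)
cat > "$COLLECTORS_FILE" <<'EOF'
http://collector1.example.com:8080/
http://collector2.example.com:8080/
EOF
cat "$COLLECTORS_FILE"
```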
+
+
+
+
Edit the CHUKWA_HOME/conf/initial_adaptors configuration file. This is
+where you tell Chukwa what log files to monitor. See
+the adaptor configuration guide for
+a list of available adaptors.
+
+
There are a number of optional settings in
+$CHUKWA_HOME/conf/chukwa-agent-conf.xml:
+
+
The most important of these is the cluster/group name that identifies the
+monitored source nodes. This value is stored in each Chunk of collected data;
+you can therefore use it to distinguish data coming from different groups of
+machines.
+
+
+
+Another important option is chukwaAgent.checkpoint.dir.
+This is the directory Chukwa will use for its periodic checkpoints of running adaptors.
+It must not be a shared directory; use a local, not NFS-mounted, directory.
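A minimal sketch of this setting in conf/chukwa-agent-conf.xml; the directory value is a placeholder:

```xml
<!-- Hypothetical excerpt from conf/chukwa-agent-conf.xml;
     the checkpoint directory value is a placeholder. -->
<property>
  <name>chukwaAgent.checkpoint.dir</name>
  <value>/var/chukwa/checkpoints</value>  <!-- must be local, not NFS-mounted -->
</property>
```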
+
To run an agent process on a single node, use bin/agent.sh.
+
+
+
+Typically, agents run as daemons. The script bin/start-agents.sh
+will ssh to each machine listed in conf/agents and start an agent,
+running in the background. The script bin/stop-agents.sh
+does the reverse.
+
You can, of course, use any other daemon-management system you like.
+For instance, tools/init.d includes init scripts for running
+Chukwa agents.
+
To check if an agent is working properly, you can telnet to the control
+port (9093 by default) and hit "enter". You will get a status message if
+the agent is running normally.
-
The default.properties file contains default parameter settings. To override these default settings use the build.properties file.
-For example, copy the TODO-JAVA-HOME environment variable from the default.properties file to the build.properties file and change the setting.
-Hadoop Configuration Files
-
The Hadoop configuration files are located in the HADOOP_HOME/conf directory. To setup Chukwa to collect logs from Hadoop, you need to change some of the hadoop configuration files.
+Configuring Hadoop for monitoring
+
+One of the key goals for Chukwa is to collect logs from Hadoop clusters. This section
+describes how to configure Hadoop to send its logs to Chukwa. Note that
+these directions require Hadoop 0.20.0+. Earlier versions of Hadoop do not have
+the hooks that Chukwa requires in order to grab MapReduce job logs.
+
The Hadoop configuration files are located in HADOOP_HOME/conf.
+ To set up Chukwa to collect logs from Hadoop, you need to change some of the
+ Hadoop configuration files.
Copy CHUKWA_HOME/conf/hadoop-log4j.properties file to HADOOP_HOME/conf/log4j.properties
Copy CHUKWA_HOME/conf/hadoop-metrics.properties file to HADOOP_HOME/conf/hadoop-metrics.properties
Edit HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@ to your actual Chukwa log directory (e.g., CHUKWA_HOME/var/log)
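The copy-and-edit steps above can be sketched as follows; temp dirs stand in for CHUKWA_HOME and HADOOP_HOME (a real deployment would use the actual paths), and the single-line properties file is a stand-in for the shipped template:

```shell
# Sketch of the copy-and-edit steps, using temp dirs to stand in for
# CHUKWA_HOME and HADOOP_HOME.
CHUKWA_HOME=$(mktemp -d)
HADOOP_HOME=$(mktemp -d)
mkdir -p "$CHUKWA_HOME/conf" "$HADOOP_HOME/conf"
# Stand-in for the shipped template, which references @CHUKWA_LOG_DIR@.
echo 'chukwa.log.dir=@CHUKWA_LOG_DIR@' > "$CHUKWA_HOME/conf/hadoop-metrics.properties"
# Copy the metrics properties into the Hadoop conf directory.
cp "$CHUKWA_HOME/conf/hadoop-metrics.properties" "$HADOOP_HOME/conf/hadoop-metrics.properties"
# Substitute the real Chukwa log directory for the placeholder.
sed "s|@CHUKWA_LOG_DIR@|$CHUKWA_HOME/var/log|" \
  "$HADOOP_HOME/conf/hadoop-metrics.properties" \
  > "$HADOOP_HOME/conf/hadoop-metrics.properties.tmp" \
  && mv "$HADOOP_HOME/conf/hadoop-metrics.properties.tmp" \
        "$HADOOP_HOME/conf/hadoop-metrics.properties"
cat "$HADOOP_HOME/conf/hadoop-metrics.properties"
```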
Edit the CHUKWA_HOME/conf/chukwa-env.sh configuration file:
-
-
Set JAVA_HOME to your Java installation.
-
Set HADOOP_JAR to $CHUKWA_HOME/hadoopjars/hadoop-0.18.2.jar
-
Set CHUKWA_IDENT_STRING to the Chukwa cluster name.
-
+Configuration
+
First, edit $CHUKWA_HOME/conf/chukwa-env.sh. In addition to
+the general directions given above, you should set
+HADOOP_HOME. This should be the Hadoop deployment Chukwa will use to
+store collected data.
+You will get a version mismatch error if this is configured incorrectly.
+
+
+
Next, edit $CHUKWA_HOME/conf/chukwa-collector-conf.xml.
+The one mandatory configuration parameter is writer.hdfs.filesystem.
+This should be set to the HDFS root URL on which Chukwa will store data.
+Various optional configuration options are described in the collector configuration guide
+and in the collector configuration file itself.
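A minimal sketch of the one mandatory property in conf/chukwa-collector-conf.xml; the namenode host and port in the URL are placeholders:

```xml
<!-- Hypothetical excerpt from conf/chukwa-collector-conf.xml;
     the HDFS URL is a placeholder. -->
<property>
  <name>writer.hdfs.filesystem</name>
  <value>hdfs://namenode.example.com:9000/</value>
  <description>HDFS root URL where collected data is written</description>
</property>
```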
+
+
+
+
+Starting, stopping, and monitoring
+
To run a collector process on a single node, use bin/jettyCollector.sh.
+
+
+
+Typically, collectors run as daemons. The script bin/start-collectors.sh
+will ssh to each collector listed in conf/collectors and start a
+collector, running in the background. The script bin/stop-collectors.sh
+ does the reverse.
+
You can, of course, use any other daemon-management system you like.
+For instance, tools/init.d includes init scripts for running
+Chukwa collectors.
+
To check if a collector is working properly, you can simply access
+http://collectorhost:collectorport/chukwa?ping=true with a web browser.
+If the collector is running, you should see a status page with a handful of statistics.
+
+
+
-2. Set Up the Hadoop jar File
-
Do the following:
+Demux and HICC
+
+
+
- 3. Configure the Collector
-
Edit the CHUKWA_HOME/conf/chukwa-collector-conf.xml configuration file.
-
Set the writer.hdfs.filesystem property to the HDFS root URL.
+Start the Chukwa Processes
+
+
The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.
+
+
Start the Chukwa data processors script (execute this command only on the data processor node):
+
+
+
+
Create down sampling daily cron job:
+
+
+
- 4. Set Up the Database
+Set Up the Database
Set up and configure the MySQL database.
@@ -195,67 +406,30 @@
-
-
-Migrate Existing Data From Chukwa 0.1.1
-
Start the MySQL shell:
-
-
-
From the MySQL shell, enter these commands (replace <database_name> with an actual value):
There are a number of Adaptors built into Chukwa, and you can also develop
your own. Chukwa will use them if you add them to the Chukwa library search path
- (e.g., by putting them in a jarfile in /lib.)
+ (e.g., by putting them in a jarfile in $CHUKWA_HOME/lib.)
Chukwa Collectors are responsible for accepting incoming data from Agents,
- and storing the data. Most commonly, collectors simply write to HDFS.
+ and storing the data.
+ Most commonly, collectors simply write all received data to HDFS.
In this mode, the filesystem to write to is determined by the option
writer.hdfs.filesystem in chukwa-collector-conf.xml.
This is the only option that you really need to specify to get a working
collector.
+
By default, collectors listen on port 8080. This can be configured
+ in chukwa-collector-conf.xml.
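As a sketch, the listen port could be changed with a property along these lines; the property name chukwaCollector.http.port is an assumption here, so verify the exact key against the collector configuration file shipped with your version:

```xml
<!-- Assumed property name; verify against your version's
     conf/chukwa-collector-conf.xml. -->
<property>
  <name>chukwaCollector.http.port</name>
  <value>8080</value>
</property>
```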