Subject: svn commit: r834588 - in /hadoop/chukwa/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/admin.xml src/docs/src/documentation/content/xdocs/agent.xml src/docs/src/documentation/content/xdocs/collector.xml
Date: Tue, 10 Nov 2009 18:32:47 -0000
To: chukwa-commits@hadoop.apache.org
From: asrabkin@apache.org
Message-Id: <20091110183247.E998B238889D@eris.apache.org>

Author: asrabkin
Date: Tue Nov 10 18:32:47 2009
New Revision: 834588

URL: http://svn.apache.org/viewvc?rev=834588&view=rev
Log:
CHUKWA-413. Improve admin guide.

Modified:
    hadoop/chukwa/trunk/CHANGES.txt
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml

Modified: hadoop/chukwa/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/CHANGES.txt?rev=834588&r1=834587&r2=834588&view=diff
==============================================================================
--- hadoop/chukwa/trunk/CHANGES.txt (original)
+++ hadoop/chukwa/trunk/CHANGES.txt Tue Nov 10 18:32:47 2009
@@ -8,6 +8,8 @@
 
   IMPROVEMENTS
 
+    CHUKWA-413. Improve admin guide. (asrabkin)
+
     CHUKWA-345. Remove redundant 'application' field from Chunk API. (asrabkin)
 
     CHUKWA-409. Make SocketTeeWriter work in single-stage pipeline. (Thushara Wijeratna via asrabkin)

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml?rev=834588&r1=834587&r2=834588&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml Tue Nov 10 18:32:47 2009
@@ -25,38 +25,44 @@
Purpose -

The purpose of this document is to help you install and configure Chukwa.

+

Chukwa is a system for large-scale reliable log collection and processing +with Hadoop. The Chukwa design overview discusses the overall architecture of Chukwa. +You should read that document before this one. +The purpose of this document is to help you install and configure Chukwa.

Pre-requisites -
-Supported Platforms -

GNU/Linux is supported as a development and production platform. Chukwa has been demonstrated on Hadoop clusters with 2000 nodes.

-
-
-Required Software -

Required software for Linux include:

-
    -
  1. Java 1.6.10, preferably from Sun, installed (see http://java.sun.com/) -
  2. MySQL 5.1.30 (see Set Up the Database) -
  3. Hadoop cluster, installed (see http://hadoop.apache.org/) -
  4. ssh must be installed and sshd must be running to use the Chukwa scripts that manage remote Chukwa daemons -
-
+

Chukwa should work on any POSIX platform, but GNU/Linux is the only + production platform that has been tested extensively. Chukwa has also been used + successfully on Mac OS X, which several members of the Chukwa team use for + development.

+

+ The only absolute software requirements are Java 1.6 + or better and Hadoop 0.18+. + + + HICC, the Chukwa + visualization interface, requires MySQL 5.1.30+.

+

+The Chukwa cluster management scripts rely on ssh; these scripts, however, +are not required if you have some alternate mechanism for starting and stopping +daemons. +

-Install Chukwa -

Chukwa is installed on:

-
    -
  • A hadoop cluster created specifically for Chukwa (referred to as the Chukwa cluster).
  • -
  • The source nodes that Chukwa monitors (referred to as the monitored source nodes).
  • +Installing Chukwa +

    A minimal Chukwa deployment has three components:

    +
      +
    • A Hadoop cluster on which Chukwa will store data (referred to as the Chukwa cluster).
    • +
  • A collector process, which writes collected data to HDFS, the Hadoop file system.
    • +
  • One or more agent processes, which send monitoring data to the collector. The nodes with active agent processes are referred to as the monitored source nodes.
    -

    -

    -

    Chukwa can also be installed on a single node, in which case the machine must have at least 16 GB of memory.

    +

    In addition, you may wish to run the Chukwa Demux jobs, which parse collected +data, or HICC, the Chukwa visualization tool.

    @@ -64,11 +70,55 @@
    -General Install Procedure -

    1. Select one of the nodes in the Chukwa cluster:

    +First Steps + +
      +
    1. Obtain a copy of Chukwa. You can find the latest release on the +Chukwa release page.
    2. +
    3. Un-tar the release, via tar xzf.
    4. +
    5. Make sure a copy of Chukwa is available on each node being monitored, and on +each node that will run a collector.
    6. +
    7. +We refer to the directory containing Chukwa as CHUKWA_HOME. It may +be helpful to set CHUKWA_HOME explicitly in your environment, +but Chukwa does not require that you do so.
    8. +
    +
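For example, on a monitored node these first steps might look like the following (the release version and install path shown are illustrative, not prescribed by Chukwa):

    # download and unpack a Chukwa release (version is illustrative)
    tar xzf chukwa-0.3.0.tar.gz
    # optionally record the install location; Chukwa does not require this
    export CHUKWA_HOME=/opt/chukwa-0.3.0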
    + + + +
    +General Configuration + +

    Agents and collectors are configured differently, but part of the process +is common to both.

    +
      +
    • Make sure that JAVA_HOME is set correctly and points to a Java 1.6 JRE. +It's generally best to set this in conf/chukwa-env.sh.
    • +
    • +In conf/chukwa-env.sh, set CHUKWA_LOG_DIR and +CHUKWA_PID_DIR to the directories where Chukwa should store its +console logs and pid files. The pid directory must not be shared between +different Chukwa instances: it should be local, not NFS-mounted. +
    • +
    • Optionally, set CHUKWA_IDENT_STRING. This string is + used to name Chukwa's own console log files.
    • + +
    +
    +
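For example, the relevant lines of conf/chukwa-env.sh might look like this (the paths and the identifier are illustrative values):

    # conf/chukwa-env.sh (illustrative values)
    export JAVA_HOME=/usr/lib/jvm/java-6-sun      # must point at a Java 1.6 JRE
    export CHUKWA_LOG_DIR=/var/log/chukwa         # console logs
    export CHUKWA_PID_DIR=/var/run/chukwa         # must be local, not NFS-mounted
    export CHUKWA_IDENT_STRING=demo               # optional; names Chukwa's own log files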
+ +
-Chukwa Binary -

To get a Chukwa distribution, download a recent stable release of Chukwa from one of the Apache Download Mirrors (see - Hadoop Chukwa Releases. -

+Agents +

Agents are the Chukwa processes that actually produce data. This section +describes how to configure and run them. More details are available in the +Agent configuration guide.

+ +
+Configuration +

This section describes how to set up the agent process on the source nodes.

+ + + +

The one mandatory configuration step is to set up $CHUKWA_HOME/conf/collectors. This file should contain a list of hosts that will run Chukwa collectors. Agents will pick a random collector from this list to try sending to, and will fail over to another listed collector on error. The file should look something like:

+ + +http://<collector1HostName>:<collector1Port>/ +http://<collector2HostName>:<collector2Port>/ +http://<collector3HostName>:<collector3Port>/ + + +

Edit the CHUKWA_HOME/conf/initial_adaptors configuration file. This is +where you tell Chukwa what log files to monitor. See +the adaptor configuration guide for +a list of available adaptors.

+ +
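For example, a single line in conf/initial_adaptors that tails the system log might look like this (the adaptor class and file path follow the syslog example used elsewhere in this guide):

    add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped SysLog 0 /var/log/messages 0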

There are a number of optional settings in +$CHUKWA_HOME/conf/chukwa-agent-conf.xml:

+
    +
  • The most important of these is the cluster/group name that identifies the +monitored source nodes. This value is stored in each Chunk of collected data; +you can therefore use it to distinguish data coming from different groups of +machines. + + <property> + <name>chukwaAgent.tags</name> + <value>cluster="demo"</value> + <description>The cluster's name for this agent</description> + </property> + +
  • +
  • Another important option is chukwaAgent.checkpoint.dir. This is the directory Chukwa will use for its periodic checkpoints of running adaptors. It must not be a shared directory; use a local directory, not an NFS mount.
  • +
+ +
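For instance, a chukwaAgent.checkpoint.dir entry in chukwa-agent-conf.xml might look like the following sketch (the directory shown is only an example; any local, non-NFS path will do):

    <property>
      <name>chukwaAgent.checkpoint.dir</name>
      <value>/var/chukwa/checkpoints</value>
      <description>Local directory for adaptor checkpoints</description>
    </property>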
+ + + + +
+Starting, stopping, and monitoring +

To run an agent process on a single node, use bin/agent.sh. +

+ +

+Typically, agents run as daemons. The script bin/start-agents.sh +will ssh to each machine listed in conf/agents and start an agent, +running in the background. The script bin/stop-agents.sh +does the reverse.

+

You can, of course, use any other daemon-management system you like. +For instance, tools/init.d includes init scripts for running +Chukwa agents.

+

To check if an agent is working properly, you can telnet to the control +port (9093 by default) and hit "enter". You will get a status message if +the agent is running normally.

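Putting the above together, starting the agents and then checking the local one might look like this (the hostnames and the use of nc are illustrative; any telnet-style client works):

    # start an agent on every host listed in conf/agents
    bin/start-agents.sh
    # check the local agent via its control port (9093 by default)
    echo "" | nc localhost 9093
    # stop all agents again
    bin/stop-agents.sh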
-

The default.properties file contains default parameter settings. To override these default settings use the build.properties file. -For example, copy the TODO-JAVA-HOME environment variable from the default.properties file to the build.properties file and change the setting.

-Hadoop Configuration Files -

The Hadoop configuration files are located in the HADOOP_HOME/conf directory. To setup Chukwa to collect logs from Hadoop, you need to change some of the hadoop configuration files.

+Configuring Hadoop for monitoring +

+One of the key goals for Chukwa is to collect logs from Hadoop clusters. This section +describes how to configure Hadoop to send its logs to Chukwa. Note that +these directions require Hadoop 0.20.0+. Earlier versions of Hadoop do not have +the hooks that Chukwa requires in order to grab MapReduce job logs.

+

The Hadoop configuration files are located in HADOOP_HOME/conf. To set up Chukwa to collect logs from Hadoop, you need to change some of the Hadoop configuration files.

  1. Copy CHUKWA_HOME/conf/hadoop-log4j.properties file to HADOOP_HOME/conf/log4j.properties
  2. Copy CHUKWA_HOME/conf/hadoop-metrics.properties file to HADOOP_HOME/conf/hadoop-metrics.properties
  3. Edit HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@ to your actual Chukwa log directory (i.e., CHUKWA_HOME/var/log)
  4. -
  5. ln -s HADOOP_HOME/conf/hadoop-site.xml CHUKWA_HOME/conf/hadoop-site.xml
  6. -
- + +
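The steps above might be scripted roughly as follows (a sketch; it assumes HADOOP_HOME and CHUKWA_HOME are set in the environment):

    cp $CHUKWA_HOME/conf/hadoop-log4j.properties   $HADOOP_HOME/conf/log4j.properties
    cp $CHUKWA_HOME/conf/hadoop-metrics.properties $HADOOP_HOME/conf/hadoop-metrics.properties
    # point the metrics config at the real Chukwa log directory
    sed -i 's|@CHUKWA_LOG_DIR@|'"$CHUKWA_HOME"'/var/log|' $HADOOP_HOME/conf/hadoop-metrics.properties
    ln -s $HADOOP_HOME/conf/hadoop-site.xml $CHUKWA_HOME/conf/hadoop-site.xml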
-Chukwa Cluster Deployment -

This section describes how to set up the Chukwa cluster and related components.

+Collectors +

This section describes how to set up the Chukwa collectors. +For more details, see the collector configuration guide.

-1. Set the Environment Variables -

Edit the CHUKWA_HOME/conf/chukwa-env.sh configuration file:

-
    -
  • Set JAVA_HOME to your Java installation. -
  • Set HADOOP_JAR to $CHUKWA_HOME/hadoopjars/hadoop-0.18.2.jar -
  • Set CHUKWA_IDENT_STRING to the Chukwa cluster name. -
+Configuration +

First, edit $CHUKWA_HOME/conf/chukwa-env.sh. In addition to the general directions given above, you should set HADOOP_HOME. This should be the Hadoop deployment Chukwa will use to store collected data. You will get a version mismatch error if this is configured incorrectly.

+ +

Next, edit $CHUKWA_HOME/conf/chukwa-collector-conf.xml. +The one mandatory configuration parameter is writer.hdfs.filesystem. +This should be set to the HDFS root URL on which Chukwa will store data. +Various optional configuration options are described in the collector configuration guide +and in the collector configuration file itself. +

+
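For example, the writer.hdfs.filesystem entry in chukwa-collector-conf.xml might look like this (the namenode host and port are illustrative):

    <property>
      <name>writer.hdfs.filesystem</name>
      <value>hdfs://namenode.example.com:9000/</value>
      <description>HDFS root URL for collected data</description>
    </property>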
+ +
+Starting, stopping, and monitoring +

To run a collector process on a single node, use bin/jettyCollector.sh. +

+ +

+Typically, collectors run as daemons. The script bin/start-collectors.sh +will ssh to each collector listed in conf/collectors and start a +collector, running in the background. The script bin/stop-collectors.sh + does the reverse.

+

You can, of course, use any other daemon-management system you like. +For instance, tools/init.d includes init scripts for running +Chukwa collectors.

+

To check if a collector is working properly, you can simply access +http://collectorhost:collectorport/chukwa?ping=true with a web browser. +If the collector is running, you should see a status page with a handful of statistics.

+ +
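As with agents, a quick way to exercise these commands is the following (the curl call is just one way to fetch the ping URL; a browser works equally well):

    # start a collector on every host listed in conf/collectors
    bin/start-collectors.sh
    # ask the local collector for its status page (default port 8080)
    curl 'http://localhost:8080/chukwa?ping=true'
    # stop them again
    bin/stop-collectors.sh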
+
-2. Set Up the Hadoop jar File -

Do the following:

+Demux and HICC + + +
- 3. Configure the Collector -

Edit the CHUKWA_HOME/conf/chukwa-collector-conf.xml configuration file.

-

Set the writer.hdfs.filesystem property to the HDFS root URL.

+Start the Chukwa Processes + +

The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.

+
    +
  • Start the Chukwa data processors script (execute this command only on the data processor node): +
+CHUKWA_HOME/tools/init.d/chukwa-data-processors start +
    +
  • Create a daily down-sampling cron job:
+CHUKWA_HOME/bin/downSampling.sh --config <path to chukwa conf> -n add
+
- 4. Set Up the Database +Set Up the Database

Set up and configure the MySQL database.

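For example, creating and loading the Chukwa schema from the MySQL shell might look like this (the database name is illustrative; database_create_table.sql ships in the Chukwa conf directory, as the migration notes below show):

    mysql -u root -p
    mysql> create database chukwa_demo;
    mysql> use chukwa_demo
    mysql> source /path/to/chukwa/conf/database_create_table.sql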
@@ -195,67 +406,30 @@
- -
-Migrate Existing Data From Chukwa 0.1.1 -

Start the MySQL shell:

- -mysql -u root -p -Enter password: - - -

From the MySQL shell, enter these commands (replace <database_name> with an actual value):

- -use <database_name> -source /path/to/chukwa/conf/database_create_table.sql -source /path/to/chukwa/conf/database_upgrade.sql - - - -
- -
- -
-5. Start the Chukwa Processes - -

The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.

-
    -
  • Start the Chukwa collector script (execute this command only on those nodes that have the Chukwa Collector installed): -
-CHUKWA_HOME/tools/init.d/chukwa-collector start
    -
  • Start the Chukwa data processors script (execute this command only on the data processor node): -
-CHUKWA_HOME/tools/init.d/chukwa-data-processors start -
    -
  • Create down sampling daily cron job: -
-CHUKWA_HOME/bin/downSampling.sh --config <path to chukwa conf> -n add
+
-7. Set Up HICC +Set Up HICC

The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. To set up HICC, do the following:

  • Download apache-tomcat 6.0.18+ from Apache Tomcat and decompress the tarball to CHUKWA_HOME/opt.
  • Copy CHUKWA_HOME/hicc.war to apache-tomcat-6.0.18/webapps.
  • Start up HICC by running:
-CHUKWA_HOME/bin/hicc.sh start +$CHUKWA_HOME/bin/hicc.sh start
  • Point your favorite browser to: http://<server>:8080/hicc
@@ -263,124 +437,6 @@
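A condensed version of the HICC setup, assuming the Tomcat tarball has already been downloaded into CHUKWA_HOME/opt:

    cd $CHUKWA_HOME/opt
    tar xzf apache-tomcat-6.0.18.tar.gz
    cp $CHUKWA_HOME/hicc.war apache-tomcat-6.0.18/webapps/
    $CHUKWA_HOME/bin/hicc.sh start
    # then browse to http://<server>:8080/hicc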
-
-Monitored Source Node Deployment -

This section describes how to set up the source nodes.

- -
-1. Set the Environment Variables -

Edit the CHUKWA_HOME/conf/chukwa-current/chukwa-env.sh configuration file:

-
    -
  • Set JAVA_HOME to the root of your Java installation. -
  • Set other environment variables as necessary. -
- - -export JAVA_HOME=/path/to/java -export HADOOP_HOME=/path/to/hadoop -export chuwaRecordsRepository="/chukwa/repos/" -export JDBC_DRIVER=com.mysql.jdbc.Driver -export JDBC_URL_PREFIX=jdbc:mysql:// - -
- - -
-2. Configure the Agent - -

Edit the CHUKWA_HOME/conf/chukwa-current/chukwa-agent-conf.xml configuration file.

-

Enter the cluster/group name that identifies the monitored source nodes:

- - - <property> - <name>chukwaAgent.tags</name> - <value>cluster="demo"</value> - <description>The cluster's name for this agent</description> - </property> - - -

Edit the CHUKWA_HOME/conf/chukwa-current/agents configuration file.

-

Create a list of hosts that are running the Chukwa agent:

- - -localhost -localhost -localhost - - -

Edit the CHUKWA_HOME/conf/collectors configuration file.

-

The Chukwa agent needs to know about the Chukwa collectors. Create a list of hosts that are running the Chukwa collector:

- -
    -
  • This ...
  • -
- - -<collector1HostName> -<collector2HostName> -<collector3HostName> - - -
    -
  • Or this ...
  • -
- -http://<collector1HostName>:<collector1Port>/ -http://<collector2HostName>:<collector2Port>/ -http://<collector3HostName>:<collector3Port>/ - -
- - - -
-3. Configure Adaptors -

Edit the CHUKWA_HOME/conf/initial_adaptors configuration file.

- -

Define the default adaptors:

- -add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped SysLog 0 /var/log/messages 0 - -

Make sure Chukwa has a Read access to /var/log/messages.

-
- - -
-4. Start the Chukwa Processes - -

Start the Chukwa agent and system metrics processes on the monitored source nodes.

- -

The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.

- -

Run both of these commands on all monitored source nodes:

- -
    -
  • Start the Chukwa agent script: -
-CHUKWA_HOME /tools/init.d/chukwa-agent start
    -
  • Start the Chukwa system metrics script: -
-CHUKWA_HOME /tools/init.d/chukwa-system-metrics start -
- - -
-5. Validate the Chukwa Processes - -

The Chukwa status scripts are located in the CHUKWA_HOME/tools/init.d directory.

- -

Verify that that agent and system metrics processes are running on all source nodes:

- -
    -
  • To obtain the status for the Chukwa agent, run: -
-CHUKWA_HOME/tools/init.d/chukwa-agent status
    -
  • To obtain the status for the system metrics, run: -
-CHUKWA_HOME/tools/init.d/chukwa-system-metrics status -
- -
@@ -388,6 +444,8 @@
UNIX Processes For Chukwa Agents + +

The Chukwa agent process name is identified by:

  • org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml?rev=834588&r1=834587&r2=834588&view=diff ============================================================================== --- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml (original) +++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml Tue Nov 10 18:32:47 2009 @@ -39,7 +39,7 @@

    There are a number of Adaptors built into Chukwa, and you can also develop your own. Chukwa will use them if you add them to the Chukwa library search path - (e.g., by putting them in a jarfile in /lib.)

    + (e.g., by putting them in a jarfile in $CHUKWA_HOME/lib.)

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml?rev=834588&r1=834587&r2=834588&view=diff ============================================================================== --- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml (original) +++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml Tue Nov 10 18:32:47 2009 @@ -25,12 +25,15 @@
Basic Operation

Chukwa Collectors are responsible for accepting incoming data from Agents, - and storing the data. + Most commonly, collectors simply write all received data to HDFS. In this mode, the filesystem to write to is determined by the option writer.hdfs.filesystem in chukwa-collector-conf.xml. This is the only option that you really need to specify to get a working collector.

+

By default, collectors listen on port 8080. This can be configured in chukwa-collector-conf.xml.

Configuration Knobs