chukwa-commits mailing list archives

From asrab...@apache.org
Subject svn commit: r794987 - in /hadoop/chukwa/trunk: ./ src/docs/ src/docs/src/documentation/content/xdocs/ src/docs/src/documentation/content/xdocs/v0.1.2/
Date Fri, 17 Jul 2009 06:49:43 GMT
Author: asrabkin
Date: Fri Jul 17 06:49:43 2009
New Revision: 794987

URL: http://svn.apache.org/viewvc?rev=794987&view=rev
Log:
CHUKWA-350. Improve documentation.

Added:
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/admin.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/index.xml
Modified:
    hadoop/chukwa/trunk/CHANGES.txt
    hadoop/chukwa/trunk/src/docs/overview.html
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml

Modified: hadoop/chukwa/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/CHANGES.txt?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/CHANGES.txt (original)
+++ hadoop/chukwa/trunk/CHANGES.txt Fri Jul 17 06:49:43 2009
@@ -42,6 +42,8 @@
 
   IMPROVEMENTS
 
+    CHUKWA-350. Improve docs, add programmer guide. (asrabkin)
+
     CHUKWA-282. Demux should output detailed per-operation ClientTrace records (Jiaqi Tan
via asrabkin)
 
     CHUKWA-341. Heap space in HICC Jetty is configurable. (Jiaqi Tan via asrabkin)

Modified: hadoop/chukwa/trunk/src/docs/overview.html
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/overview.html?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/overview.html (original)
+++ hadoop/chukwa/trunk/src/docs/overview.html Fri Jul 17 06:49:43 2009
@@ -21,6 +21,9 @@
 </head>
 <body>
 
+Hadoop MapReduce and HDFS are designed to support efficient batch processing of large datasets.
 Many organizations accumulate huge volumes of log files and system metrics data, and it's
tempting to use MapReduce to do the processing. However, this data has certain unfortunate
characteristics. It's updated incrementally, and spread across many machines. This makes it
difficult to use MapReduce on this data.  Chukwa is a Hadoop subproject aiming to bridge
this gap, and to facilitate MapReduce processing of monitoring data.  
+
+
 Chukwa is an open source data collection system for monitoring and analyzing large distributed
systems.  
 Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework
and inherits Hadoop's
 scalability and robustness.  Chukwa also includes a flexible and powerful toolkit for displaying
monitoring

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml Fri Jul 17 06:49:43
2009
@@ -334,7 +334,7 @@
 
 
 <section>
-<title>3. Configure the Adaptor</title>
+<title>3. Configure Adaptors</title>
 <p>Edit the CHUKWA_HOME/conf/initial_adaptors configuration file.</p>
 
 <p>Define the default adaptors:</p>

Added: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml?rev=794987&view=auto
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml (added)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml Fri Jul 17 06:49:43
2009
@@ -0,0 +1,136 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Chukwa Agent Setup Guide</title>
+  </header>
+  <body>
+
+<section>
+<title>Overview</title>
+<p>In a normal Chukwa installation, an <em>Agent</em> process runs on every
machine being monitored. This process is responsible for all the data collection on that host.
 Data collection might mean periodically running a Unix command, or tailing a file, or listening
for incoming UDP packets.</p>
+
+<p>Each particular data source corresponds to a so-called <em>Adaptor</em>.
Adaptors are dynamically loadable modules that run inside the Agent process. There is generally
one Adaptor for each data source: for each file being watched or for each Unix command being
executed. Each adaptor has a unique name. If you do not specify a name, one will be autogenerated
by hashing the Adaptor type and parameters.</p>
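The autogenerated-name behavior can be pictured with a short sketch (hypothetical Python, not the agent's actual Java implementation; the exact hash and name format are assumptions):

```python
import hashlib

def autogen_adaptor_name(adaptor_type: str, params: str) -> str:
    # Hypothetical: build a stable name by hashing the Adaptor type
    # together with its parameters. Chukwa's real naming scheme differs
    # in detail, but the idea is the same: identical add commands map
    # to the same adaptor name.
    digest = hashlib.md5((adaptor_type + " " + params).encode()).hexdigest()
    return "adaptor_" + digest[:8]

# The same type and parameters always produce the same name:
a = autogen_adaptor_name("filetailer.FileTailingAdaptor", "BarData /foo/bar 0")
b = autogen_adaptor_name("filetailer.FileTailingAdaptor", "BarData /foo/bar 0")
```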
+
+<p>There are a number of Adaptors built into Chukwa, and you can also develop your
own. Chukwa will use them if you add them to the Chukwa library search path (e.g., by putting
them in a jarfile in <code>/lib</code>).</p>
+</section>
+
+<section><title>Data Model</title>
+<p>Chukwa Adaptors emit data in <em>Chunks</em>. A Chunk is a sequence
of bytes, with some metadata. Several of these metadata fields are set automatically by the Agent or Adaptors.
Two of them require user intervention: <code>cluster name</code> and <code>datatype</code>.
 Cluster name is specified in <code>conf/chukwa-env.sh</code>, and is global to
each Agent process.  Datatype describes the expected format of the data collected by an Adaptor
instance, and it is specified when that instance is started. </p>
+
+<p>The following table lists the Chunk metadata fields. 
+</p>
+
+<table>
+<tr><td>Field</td><td>Meaning</td><td>Source</td></tr>
+<tr><td>Source</td><td>Hostname where Chunk was generated</td><td>Automatic</td></tr>
+<tr><td>Cluster</td><td>Cluster the host is associated with</td><td>Specified
by user in agent config</td></tr>
+<tr><td>Datatype</td><td>Format of output</td><td>Specified
by user when Adaptor started</td></tr>
+<tr><td>Sequence ID</td><td>Offset of Chunk in stream</td><td>Automatic,
initial offset specified when Adaptor started</td></tr>
+<tr><td>Name</td><td>Name of data source</td><td>Automatic,
chosen by Adaptor</td></tr>
+</table>
+
+<p>Conceptually, each Adaptor emits a semi-infinite stream of bytes, numbered starting
from zero. The sequence ID specifies how many bytes each Adaptor has sent, including the current
chunk.  So if an adaptor emits a chunk containing the first 100 bytes from a file, the sequenceID
of that Chunk will be 100. And the second hundred bytes will have sequence ID 200.  This may
seem a little peculiar, but it's actually the same way that TCP sequence numbers work.
+</p>
+
+<p>Adaptors need to take sequence ID as a parameter so that they can resume correctly
after a crash, and not send redundant data. When starting adaptors, it's usually safe to specify
0 as an ID, but it's sometimes useful to specify something else. For instance, it lets you
do things like only tail the second half of a file. 
+</p>
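The sequence-ID arithmetic above can be sketched in a few lines (an illustrative model, not Chukwa code):

```python
class SketchAdaptor:
    """Minimal model of the TCP-style sequence IDs described above."""

    def __init__(self, initial_offset: int = 0):
        # Resuming after a crash means starting with a nonzero offset.
        self.offset = initial_offset

    def emit(self, data: bytes) -> dict:
        # A Chunk's sequence ID counts every byte sent so far,
        # *including* the bytes in this chunk.
        self.offset += len(data)
        return {"data": data, "seq_id": self.offset}

adaptor = SketchAdaptor()
first = adaptor.emit(b"x" * 100)   # first 100 bytes of the stream
second = adaptor.emit(b"y" * 100)  # second 100 bytes
```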
+</section>
+
+
+<section>
+<title>Agent Control</title>
+
+<p>Once an Agent process is running, there are a number of commands that you can use
to inspect and control it.  By default, Agents listen for incoming commands on port 9093.
Commands are case-insensitive.</p>
+
+<table>
+<tr><td>Command</td><td>Purpose</td><td>Options</td></tr>
+
+<tr><td><code>add</code>   </td><td> Start an adaptor.</td>
 <td>See below</td></tr>
+<tr><td><code>close</code> </td><td> Close socket connection
to agent.</td><td>None</td></tr>
+<tr><td><code>help</code>  </td><td> Display a list of
available commands</td><td>None</td></tr>
+<tr><td><code>list</code>  </td><td> List currently running
adaptors</td><td>None</td></tr>
+<tr><td><code>reloadcollectors</code>  </td><td> Re-read
list of collectors</td><td>None</td></tr>
+<tr><td><code>stop</code>  </td><td> Stop adaptor, abruptly</td><td>Adaptor
name</td></tr>
+<tr><td><code>shutdown</code>  </td><td> Stop adaptor,
gracefully</td><td>Adaptor name</td></tr>
+<tr><td><code>stopagent</code>  </td><td> Stop agent
process</td><td>None</td></tr>
+</table>
+
+
+<p>The add command is by far the most complex; it takes several mandatory and optional
parameters. The general form is as follows:</p>
+<source>
+add [name =] &#60;adaptor_class_name&#62; &#60;datatype&#62; &#60;adaptor
specific params&#62; &#60;initial offset&#62;
+</source>
+
+<p>
+There are four mandatory fields: the word <code>add</code>, the class name for
the Adaptor, the datatype of the Adaptor's output, and the sequence number for the first byte.
 There are two optional fields: the adaptor instance name, and the adaptor parameters.
+</p>
+
+<p>The adaptor name, if specified, should go after the add command, and be followed
with an equals sign. It should be a string of printable characters, without whitespace or
'='.  
+</p>
+
+<p>Adaptor parameters aren't required by the add command, but adaptor implementations
may have both mandatory and optional parameters. See below.</p>
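The field layout described above can be made concrete with a small parser sketch (hypothetical Python; the agent's real parser is Java, and this sketch assumes the optional name is separated from the '=' by whitespace):

```python
def parse_add(command: str) -> dict:
    """Split an agent 'add' command into its fields (illustrative only)."""
    tokens = command.split()
    if tokens[0].lower() != "add":
        raise ValueError("not an add command")
    tokens = tokens[1:]
    name = None
    # Optional instance name: 'add myname = <class> ...'
    if len(tokens) >= 2 and tokens[1] == "=":
        name, tokens = tokens[0], tokens[2:]
    adaptor_class, datatype = tokens[0], tokens[1]
    offset = int(tokens[-1])   # mandatory trailing sequence number
    params = tokens[2:-1]      # optional adaptor-specific parameters
    return {"name": name, "class": adaptor_class,
            "datatype": datatype, "params": params, "offset": offset}

parsed = parse_add("add filetailer.FileTailingAdaptor BarData /foo/bar 0")
```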
+</section>
+
+<section> 
+<title>Adaptors</title>
+<p>This section lists the standard adaptors, and the arguments they take.</p>
+
+<ul>
+<li><strong>FileAdaptor</strong>: Pushes a whole file, as one Chunk, then
exits. Takes one mandatory parameter: the file to push.
+
+<source>add FileAdaptor FooData /tmp/foo 0</source>
+This pushes file <code>/tmp/foo</code> as one chunk, with datatype <code>FooData</code>.
+</li>
+<li><strong>filetailer.FileTailingAdaptor</strong>
+ Repeatedly tails a file, treating the file as a sequence of bytes, ignoring the content.
Chunk boundaries are arbitrary. This is useful for streaming binary data. Takes one mandatory
parameter: a path to the file to tail.
+<source>add filetailer.FileTailingAdaptor BarData /foo/bar 0</source>
+This pushes <code>/foo/bar</code> as a sequence of Chunks of type <code>BarData</code>.
+
+</li>
+<li><strong>filetailer.CharFileTailingAdaptorUTF8</strong>
+The same, except that chunks are guaranteed to end only at carriage returns. This is useful
for most ASCII log file formats.
+</li>
+
+<li><strong>filetailer.CharFileTailingAdaptorUTF8NewLineEscaped</strong>
+ The same, except that chunks are guaranteed to end only at non-escaped carriage returns.
This is useful for pushing Chukwa-formatted log files, where exception stack traces stay in
a single chunk.
+</li>
+
+<li><strong>DirTailingAdaptor</strong> Takes a directory path and a second
adaptor name as mandatory parameters; repeatedly scans that directory and all subdirectories,
and starts the indicated adaptor running on each file.
+
+<source>add DirTailingAdaptor logs /var/log/ filetailer.CharFileTailingAdaptorUTF8
0</source>
+
+</li>
+<li><strong>ExecAdaptor</strong> Takes a frequency (in milliseconds) as
an optional parameter, and then the program name as a mandatory parameter. Runs that program repeatedly
at a rate specified by the frequency.
+
+<source>add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0</source>
+ This adaptor will run <code>df</code> every minute, labeling its output as Df.
+</li>
+
+<li><strong>edu.berkeley.chukwa_xtrace.XtrAdaptor</strong> (available in
contrib) Takes an <a href="http://www.x-trace.net/wiki/doku.php">Xtrace</a> ReportSource
classname [without package] as mandatory argument, and no optional parameters.  Listens for
incoming reports in the same way as that ReportSource would.
+
+<source>add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0</source>
+ This adaptor will create and start a <code>UdpReportSource</code>, labeling
its output datatype as Xtrace.
+</li>
+</ul>
+
+</section>
+</body>
+</document>
\ No newline at end of file

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml Fri Jul 17 06:49:43
2009
@@ -27,8 +27,11 @@
         The Chukwa Documentation provides the information you need to get started using Chukwa.
       </p>
       <p>
-        Begin with the <a href="admin.html"> Chukwa Administration Guide</a>
which shows you how to setup and deploy Chukwa. 
+        If you're trying to set up a Chukwa cluster from scratch, you should read the <a
href="admin.html"> Chukwa Administration Guide</a>, which shows you how to set up and
deploy Chukwa. 
       </p>
+     <p> If you want to configure the Chukwa agent process to control what data is collected,
you should read the <a href="agent.html">Agent Guide</a>.
+     </p>
+     <p>And if you want to use collected data, read the <a href="programming.html">Programming
Guide</a>.</p>
       <p>
 		If you have more questions, you can ask on the <a href="ext:lists">Chukwa Core Mailing
Lists</a>.
       </p>

Added: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml?rev=794987&view=auto
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml (added)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml Fri Jul 17
06:49:43 2009
@@ -0,0 +1,57 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Chukwa Programming Guide</title>
+  </header>
+  <body>
+
+<p>This document discusses the Chukwa archive file formats, and the layout of the Chukwa
storage directories.</p>
+
+<section>
+<title>Sink File Format</title>
+<p>As data is collected, Chukwa dumps it into <em>sink files</em> in HDFS.
By default, these are located in <code>/chukwa/logs</code>.  If the file name
ends in .chukwa, that means the file is still being written to. Every few minutes, the collector
will close the file and rename it to '*.done'.  This marks the file as available for
processing.</p>
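The naming convention can be sketched as a filter (hypothetical helper; the directory listing below is made up):

```python
def ready_sink_files(filenames):
    """Keep only closed ('.done') sink files. Files still ending in
    '.chukwa' are being written by a collector and are not yet safe
    to process."""
    return [f for f in filenames if f.endswith(".done")]

listing = ["20090717_0.chukwa", "20090716_0.done", "20090716_1.done"]
ready = ready_sink_files(listing)
```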
+
+<p>Each sink file is a Hadoop sequence file, containing a succession of key-value pairs,
and periodic sync markers to facilitate MapReduce access. The key type is <code>ChukwaArchiveKey</code>;
the value type is <code>ChunkImpl</code>. See the Chukwa Javadoc for details about
these classes.</p>
+
+<p>Data in the sink may include duplicate and omitted chunks.</p>
+</section>
+
+<section>
+<title>Demux and Archiving</title>
+<p>It's possible to write MapReduce jobs that directly examine the data sink, but it's
not especially convenient. Data is not organized in a useful way, so jobs will likely discard
most of their input. Data quality is imperfect, since duplicates and omissions may exist.
 And MapReduce and HDFS are optimized to deal with a modest number of large files, not many
small ones.</p> 
+
+<p> Chukwa therefore supplies several MapReduce jobs for organizing collected data
and putting it into a more useful form; these jobs are typically run regularly from cron.
 Knowing how to use Chukwa-collected data requires understanding how these jobs lay out storage.
For now, this document only discusses one such job: the Simple Archiver. </p>
+</section>
+
+<section><title>Simple Archiver</title>
+<p>The simple archiver is designed to consolidate a large number of data sink files
into a small number of archive files, with the contents grouped in a useful way.  Archive
files, like raw sink files, are in Hadoop sequence file format. Unlike the data sink, however,
duplicates have been removed.  (Future versions of the Simple Archiver will indicate the presence
of gaps.)</p>
+
+<p>The simple archiver moves every <code>.done</code> file out of the sink,
and then runs a MapReduce job to group the data. Output Chunks will be placed into files with
names of the form <code>/chukwa/archive/clustername/Datatype_date.arc</code>.
 Date corresponds to when the data was collected; Datatype is the datatype of each Chunk.

+</p>
+
+<p>If archived data corresponds to an existing filename, a new file will be created
with a disambiguating suffix.</p>
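The naming scheme plus the disambiguating suffix can be sketched like this (hypothetical; the exact date format and suffix style are assumptions, not specified by this document):

```python
def archive_path(cluster: str, datatype: str, date: str, existing=()) -> str:
    """Build /chukwa/archive/<cluster>/<Datatype>_<date>.arc, appending a
    numeric suffix if that name is already taken (suffix style assumed)."""
    base = f"/chukwa/archive/{cluster}/{datatype}_{date}"
    path = base + ".arc"
    n = 1
    while path in existing:
        path = f"{base}.{n}.arc"
        n += 1
    return path

fresh = archive_path("demo", "SysLog", "20090717")
clash = archive_path("demo", "SysLog", "20090717", existing={fresh})
```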
+
+<!-- The Simple Archiver is a Java class, stored in <code>chukwa-core-*.jar</code>
-->
+
+</section>
+
+</body>
+</document>
\ No newline at end of file

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml Fri Jul 17 06:49:43
2009
@@ -42,6 +42,8 @@
  <docs label="Overview"> 
     <index      label="Overview"       href="index.html" />
     <admin      label="Admin Guide"    href="admin.html" />
+    <agent      label="Agent Configuration Guide" href="agent.html" />
+    <programming      label="Programming Guide" href="programming.html" />
     <api        label="API Docs"       href="ext:api/index"/>
     <wiki       label="Wiki"           href="ext:wiki" />
     <faq        label="FAQ"            href="ext:faq" />

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml?rev=794987&r1=794986&r2=794987&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml Fri Jul 17 06:49:43
2009
@@ -32,6 +32,6 @@
   -->
   <tab label="Project" href="http://hadoop.apache.org/chukwa" type="visible" /> 
   <tab label="Wiki" href="http://wiki.apache.org/hadoop/Chukwa/" type="visible" />

-  <tab label="Chukwa 0.1.2 Documentation" dir="" type="visible" /> 
+  <tab label="Chukwa 0.3 Documentation" dir="" type="visible" /> 
 
 </tabs>

Added: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/admin.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/admin.xml?rev=794987&view=auto
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/admin.xml (added)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/admin.xml Fri Jul
17 06:49:43 2009
@@ -0,0 +1,482 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Chukwa Administration Guide</title>
+  </header>
+  <body>
+
+<section>
+<title> Purpose </title>
+<p>The purpose of this document is to help you install and configure Chukwa.</p>
+</section>
+
+<section>
+<title> Pre-requisites</title>
+<section>
+<title>Supported Platforms</title>
+<p>GNU/Linux is supported as a development and production platform. Chukwa has been
demonstrated on Hadoop clusters with 2000 nodes.</p>
+</section>
+<section>
+<title>Required Software</title>
+<p>Required software for Linux includes:</p>
+<ol>
+<li> Java 1.6.10, preferably from Sun, installed (see <a href="http://java.sun.com/">http://java.sun.com/</a>)
+</li> <li> MySQL 5.1.30 (see <a href="#4.+Set+Up+the+Database">Set Up the
Database)</a>
+</li> <li> Hadoop cluster, installed (see <a href="http://hadoop.apache.org/"
>http://hadoop.apache.org/</a>)
+</li> <li> ssh must be installed and sshd must be running to use the Chukwa scripts
that manage remote Chukwa daemons 
+</li></ol> 
+</section>
+</section>
+
+
+<section>
+<title>Install Chukwa</title>
+<p>Chukwa is installed on: </p>
+<ul>
+<li> A Hadoop cluster created specifically for Chukwa (referred to as the Chukwa cluster).</li>

+<li> The source nodes that Chukwa monitors (referred to as the monitored source nodes).</li>
+</ul> 
+<p>Chukwa can also be installed on a single node, in which case the machine must have
at least 16 GB of memory. </p>
+
+<figure  align="left" alt="Chukwa Components" src="images/components.gif" />
+
+<section>
+<title>General  Install Procedure </title>
+<p>1. Select one of the nodes in the Chukwa cluster: </p>
+<ul>
+<li> Create a directory for the Chukwa installation (Chukwa will set the  environment
variable <strong>CHUKWA_HOME</strong> to point to this directory during the
install).
+</li> <li> Move to the new directory.
+</li> <li> Download and un-tar the Chukwa binary.
+</li> <li> Configure the components for the Chukwa cluster (see <a href="#Chukwa+Cluster+Deployment">Chukwa
Cluster Deployment</a>).
+</li> <li> Configure the Hadoop configuration files (see <a href="#Hadoop+Configuration+Files">Hadoop
Configuration Files</a>).
+</li> <li> Zip the directory and deploy to all nodes in the Chukwa cluster.
+</li></ul> 
+<p>2. Select one of the source nodes to be monitored: </p>
+<ul>
+<li> Create a directory for the Chukwa installation (Chukwa will set the environment
variable <strong>CHUKWA_HOME</strong> to point to this directory during the install).
+</li> <li> Move to the new directory.
+</li> <li> Download and un-tar the Chukwa binary.
+</li> <li> Configure the components for the source nodes (see <a href="#Monitored+Source+Node+Deployment">Monitored
Source Node Deployment</a>).
+</li> <li> Configure the Hadoop configuration files (see <a href="#Hadoop+Configuration+Files">Hadoop
Configuration Files</a>).
+</li> <li> Zip the directory and deploy to all source nodes to be monitored.
+</li></ul> 
+</section>
+
+<section>
+<title>Chukwa Binary</title>
+<p>To get a Chukwa distribution, download a recent stable release of Chukwa from one
of the Apache Download Mirrors (see 
+ <a href="http://hadoop.apache.org/chukwa/releases.html">Hadoop Chukwa Releases</a>).
+</p>
+</section>
+
+<section>
+<title>Chukwa Configuration Files </title>
+<p>The Chukwa configuration files are located in the CHUKWA_HOME/conf directory. The
configuration files that you modify are named <strong>*.template</strong>.
+To set up your Chukwa installation (configure various components), copy, rename, and modify
the *.template files as necessary. 
+For example, copy the chukwa-collector-conf.xml.template file to a file named chukwa-collector-conf.xml
and then modify the file to include the cluster/group name for the source nodes.
+</p>
+<p>The <strong>default.properties</strong> file contains default parameter
settings. To override these default settings use the <strong>build.properties </strong>
file. 
+For example, copy the TODO-JAVA-HOME environment variable from the default.properties file
to the build.properties file and change the setting.</p>
+</section>
+
+<section>
+<title>Hadoop Configuration Files</title>
+<p>The Hadoop configuration files are located in the HADOOP_HOME/conf directory. To
set up Chukwa, you need to change some of the Hadoop configuration files.</p>
+<ol>
+	<li>Copy CHUKWA_HOME/conf/hadoop-log4j.properties file to HADOOP_HOME/conf/log4j.properties</li>
+	<li>Copy CHUKWA_HOME/conf/hadoop-metrics.properties file to HADOOP_HOME/conf/hadoop-metrics.properties</li>
+	<li>Edit HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@
to your actual Chukwa log directory (i.e., CHUKWA_HOME/var/log)</li>
+	<li>ln -s HADOOP_HOME/conf/hadoop-site.xml CHUKWA_HOME/conf/hadoop-site.xml</li>
+</ol>
+	
+</section>
+
+</section>
+
+
+<section>
+<title>Chukwa Cluster Deployment </title>
+<p>This section describes how to set up the Chukwa cluster and related components.</p>
+
+<section>
+<title>1. Set the Environment Variables</title>
+<p>Edit the CHUKWA_HOME/conf/chukwa-env.sh configuration file: </p> 
+<ul>
+<li> Set JAVA_HOME to your Java installation.
+</li> <li> Set HADOOP_JAR to $CHUKWA_HOME/hadoopjars/hadoop-0.18.2.jar 
+</li> <li> Set CHUKWA_IDENT_STRING to the Chukwa cluster name. 
+</li></ul> 
+</section>
+
+<section>
+<title>2. Set Up the Hadoop jar File </title>
+<p>Do the following:</p>
+<source>
+cp $HADOOP_HOME/hadoop-&#42;-core.jar $CHUKWA&#95;HOME/hadoopjars
+</source>
+</section>
+
+
+<section>
+<title> 3. Configure the Collector  </title>
+<p>Edit the CHUKWA_HOME/conf/chukwa-collector-conf.xml configuration file.</p>
+<p>Set the writer.hdfs.filesystem property to the HDFS root URL. </p>
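For example, the property entry might look like the following (the namenode host and port below are placeholders; substitute your own values):

```xml
<property>
  <name>writer.hdfs.filesystem</name>
  <value>hdfs://namenode.example.com:9000/</value>
  <description>HDFS root URL to which the collector writes</description>
</property>
```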
+</section>
+
+<section>
+<title> 4. Set Up the Database </title>
+<p>Set up and configure the MySQL database.</p>
+
+<section>
+<title>Install MySQL</title>
+
+<p>Download MySQL 5.1 from the <a href="http://dev.mysql.com/downloads/mysql/5.1.html#downloads">MySQL
site</a>. </p>
+<source>
+tar fxvz mysql-&#42;.tar.gz -C $CHUKWA&#95;HOME/opt
+cd $CHUKWA&#95;HOME/opt/mysql-&#42;
+</source>
+
+<p>
+Configure and then copy the my.cnf file to the CHUKWA_HOME/opt/mysql-* directory:
+</p>
+<source>
+./scripts/mysql_install_db
+./bin/mysqld_safe&#38;
+./bin/mysqladmin -u root create &#60;clustername&#62;
+./bin/mysql -u root &#60;clustername&#62; &#60; $CHUKWA&#95;HOME/conf/database_create_table
+</source>
+
+<p>Edit the CHUKWA_HOME/conf/jdbc.conf configuration file. </p>
+<p>Set the clustername to the MySQL root URL:</p>
+<source>
+&#60;clustername&#62;&#61;jdbc:mysql://localhost:3306/&#60;clustername&#62;?user&#61;root
+</source>
+
+<p>Download the MySQL Connector/J 5.1 from the  <a href="http://dev.mysql.com/downloads/connector/j/5.1.html">MySQL
site</a>, 
+and place the jar file in $CHUKWA_HOME/lib.</p>
+</section>
+
+<section>
+<title>Set Up MySQL for Replication</title>
+<p>Start the MySQL shell:</p>
+<source>
+mysql -u root -p
+Enter password:
+</source>
+<p>From the MySQL shell, enter these commands (replace &#60;username&#62; and
&#60;password&#62; with actual values):</p>
+<source>
+GRANT REPLICATION SLAVE ON &#42;.&#42; TO &#39;&#60;username&#62;&#39;&#64;&#39;&#37;&#39;
IDENTIFIED BY &#39;&#60;password&#62;&#39;;
+FLUSH PRIVILEGES; 
+</source>
+</section>
+
+
+<section>
+<title>Migrate Existing Data From Chukwa 0.1.1</title>
+<p>Start the MySQL shell:</p>
+<source>
+mysql -u root -p
+Enter password:
+</source>
+
+<p>From the MySQL shell, enter these commands (replace &#60;database_name&#62;
with an actual value):</p>
+<source>
+use &#60;database_name&#62;
+source /path/to/chukwa/conf/database_create_table.sql
+source /path/to/chukwa/conf/database_upgrade.sql
+</source>
+
+
+</section>
+
+</section>
+
+<section>
+<title>5. Start the Chukwa Processes </title>
+
+<p>The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
+<ul>
+<li> Start the Chukwa collector  script (execute this command only on those nodes that
have the Chukwa Collector installed):
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-collector start </source> <ul>
+<li> Start the Chukwa data processors script (execute this command only on the data
processor node):
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-data-processors start </source>
+<ul>
+<li> Create the daily down-sampling cron job:
+</li></ul> 
+<source>CHUKWA&#95;HOME/bin/downSampling.sh --config &#60;path to chukwa conf&#62;
-n add </source>
+</section>
+
+<section>
+<title>6. Validate the Chukwa Processes </title>
+
+<p>The Chukwa status scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
+<ul>
+<li> To obtain the status for the Chukwa collector, run:</li>
+</ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-collector status </source> <ul>
+<li> To verify that the data processors are functioning correctly: </li>
+</ul> 
+<source>Visit the Chukwa Hadoop cluster&#39;s Job Tracker UI for job status. 
+Refer to the Chukwa Cluster Configuration page for the Job Tracker URL. </source>
+</section>
+
+<section>
+<title>7. Set Up HICC </title>
+<p>The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. To
set up HICC, do the following:</p>
+<ul>
+<li>Download apache-tomcat 6.0.18+ from <a href="http://tomcat.apache.org/download-60.cgi">Apache
Tomcat</a> and decompress the tarball to CHUKWA_HOME/opt. </li> 
+<li>Copy CHUKWA_HOME/hicc.war to apache-tomcat-6.0.18/webapps. </li> 
+<li>Start up HICC by running: </li> 
+</ul>
+<source>CHUKWA_HOME/bin/hicc.sh start</source>
+<ul>
+<li>Point your favorite browser to: http://&#60;server&#62;:8080/hicc  </li>

+</ul>
+</section>
+
+</section>
+
+<section>
+<title>Monitored Source Node Deployment </title>
+<p>This section describes how to set up the source nodes. </p>
+
+<section>
+<title>1. Set the Environment Variables </title>
+<p>Edit the CHUKWA_HOME/conf/chukwa-current/chukwa-env.sh configuration file: </p>
+<ul>
+<li> Set JAVA_HOME to the root of your Java installation.
+</li><li> Set other environment variables as necessary.
+</li></ul> 
+
+<source>
+export JAVA&#95;HOME&#61;/path/to/java
+export HADOOP&#95;HOME&#61;/path/to/hadoop
+export chuwaRecordsRepository&#61;&#34;/chukwa/repos/&#34;
+export JDBC&#95;DRIVER&#61;com.mysql.jdbc.Driver
+export JDBC&#95;URL&#95;PREFIX&#61;jdbc:mysql://
+</source>
+</section>
+
+
+<section>
+<title>2. Configure the Agent</title>
+
+<p>Edit the CHUKWA_HOME/conf/chukwa-current/chukwa-agent-conf.xml configuration file.
</p>
+<p>Enter the cluster/group name that identifies the monitored source nodes:</p>
+
+<source>
+ &#60;property&#62;
+    &#60;name&#62;chukwaAgent.tags&#60;/name&#62;
+    &#60;value&#62;cluster&#61;&#34;demo&#34;&#60;/value&#62;
+    &#60;description&#62;The cluster&#39;s name for this agent&#60;/description&#62;
+  &#60;/property&#62;
+</source>
+
+<p>Edit the CHUKWA_HOME/conf/chukwa-current/agents configuration file. </p>
+<p>Create a list of hosts that are running the Chukwa agent:</p>
+
+<source>
+localhost
+localhost
+localhost
+</source>
+
+<p>Edit the CHUKWA_HOME/conf/collectors configuration file. </p>
+<p>The Chukwa agent needs to know about the Chukwa collectors. Create a list of hosts
that are running the Chukwa collector:</p>
+
+<ul>
+	<li>Either list host names only:</li>
+</ul>
+
+<source>
+&#60;collector1HostName&#62;
+&#60;collector2HostName&#62;
+&#60;collector3HostName&#62;
+</source>
+
+<ul>
+	<li>Or list full collector URLs, including the port:</li>
+</ul>
+<source>
+http://&#60;collector1HostName&#62;:&#60;collector1Port&#62;/
+http://&#60;collector2HostName&#62;:&#60;collector2Port&#62;/
+http://&#60;collector3HostName&#62;:&#60;collector3Port&#62;/
+</source>
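Either entry style ends up as a full collector URL. A rough sketch of that expansion, assuming 8080 as the default collector port (an assumption; check chukwaCollector.http.port in your collector configuration):

```shell
# Sketch: expanding both accepted collectors-file entry styles to full URLs.
expand_collectors() {
  while read -r host; do
    case "$host" in
      http*) echo "$host" ;;               # already a full URL, keep as-is
      *)     echo "http://$host:8080/" ;;  # bare hostname: assume the default port
    esac
  done
}
expand_collectors <<'EOF'
collector1.example.com
http://collector2.example.com:9999/
EOF
# prints:
#   http://collector1.example.com:8080/
#   http://collector2.example.com:9999/
```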
+</section>
+
+
+
+<section>
+<title>3. Configure the Adaptor</title>
+<p>Edit the CHUKWA_HOME/conf/initial_adaptors configuration file.</p>
+
+<p>Define the default adaptors:</p>
+<source>
+add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
SysLog 0 /var/log/messages 0
+</source>
+<p>Make sure Chukwa has read access to /var/log/messages. </p>
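For reference, the adaptor line is whitespace-delimited. This sketch labels the fields as we read them; the field names are our own description, not official Chukwa terminology:

```shell
# Sketch: splitting the initial_adaptors line above into its fields.
line="add org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped SysLog 0 /var/log/messages 0"
set -- $line            # split on whitespace into $1..$6
echo "command:  $1"     # add
echo "class:    $2"     # adaptor implementation class
echo "datatype: $3"     # SysLog
echo "params:   $4 $5"  # adaptor parameters: here, a start offset and the file to tail
echo "offset:   $6"     # bytes already sent, usually 0 on first start
```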
+</section>
+
+
+<section>
+<title>4. Start the Chukwa Processes </title>
+
+<p>Start the Chukwa agent and system metrics processes on the monitored source nodes.</p>
+
+<p>The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
+
+<p>Run both of these commands on all monitored source nodes: </p>
+
+<ul>
+<li> Start the Chukwa agent script:
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-agent start</source> <ul>
+<li> Start the Chukwa system metrics script:
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-system-metrics start</source>
+</section>
+
+
+<section>
+<title>5. Validate the Chukwa Processes </title>
+
+<p>The Chukwa status scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
+
+<p>Verify that the agent and system metrics processes are running on all source nodes:
</p>
+
+<ul>
+<li> To obtain the status for the Chukwa agent, run:
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-agent status </source> <ul>
+<li> To obtain the status for the system metrics, run:
+</li></ul> 
+<source>CHUKWA&#95;HOME/tools/init.d/chukwa-system-metrics status </source>
+</section>
+
+</section>
+
+
+<section>
+<title>Troubleshooting Tips</title>
+
+<section>
+<title>UNIX Processes For Chukwa Agents</title>
+<p>The system metrics data loader process names are uniquely defined by:</p>
+<ul>
+<li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec sar -q -r -n ALL 55
+</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec iostat -x
-k 55 2
+</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec top -b -n
1 -c
+</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec df -l
+</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec CHUKWA_HOME/bin/../bin/netstat.sh
+</li></ul> 
+<p>The Chukwa agent process name is identified by:</p>
+<ul>
+<li> org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
+</li></ul> 
+<p>Use this command line to search for the process name:</p>
+<ul>
+<li> ps ax | grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
+</li></ul> 
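A small refinement of the command above: wrapping the first letter of the pattern in a character class keeps grep's own command line out of the results. Demonstrated here on canned ps output so the sketch is self-contained:

```shell
# Sketch: grep for the agent without matching the grep process itself.
# The [C] character class means grep's own command line never matches the pattern.
sample_ps='12345 ?  Sl  0:42 java org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
23456 ?  S   0:00 grep [C]hukwaAgent'
echo "$sample_ps" | grep '[C]hukwaAgent'   # prints only the java line
# On a live node:  ps ax | grep '[C]hukwaAgent'
```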
+</section>
+
+<section>
+<title>UNIX Processes For Chukwa Collectors</title>
+<p>The Chukwa collector process name is identified by:</p>
+<ul>
+<li> <strong>org.apache.hadoop.chukwa.datacollection.collector.CollectorStub</strong>
+</li></ul> 
+</section>
+
+<section>
+<title>UNIX Processes For Chukwa Data Processes</title>
+<p>Chukwa Data Processors are identified by:</p>
+<ul>
+<li> org.apache.hadoop.chukwa.extraction.demux.Demux
+</li> <li>org.apache.hadoop.chukwa.extraction.database.DatabaseLoader
+</li> <li>org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder
+</li></ul> 
+<p>These processes run on a schedule, so they are not always visible in
the process list.</p>
+</section>
+
+
+<section>
+<title>Checks for MySQL Replication </title>
+<p>On the slave server, at the MySQL prompt, run:</p>
+<source>
+show slave status\G
+</source>
+<p>Make sure <strong>Slave_IO_Running</strong> and <strong>Slave_SQL_Running</strong>
are both "Yes".</p>
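The check above can be scripted. This sketch greps canned SHOW SLAVE STATUS output; in practice you would pipe `mysql -e 'SHOW SLAVE STATUS\G'` into the same filters:

```shell
# Sketch: count replication threads NOT reporting Yes; 0 means healthy.
status='             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes'
bad=$(echo "$status" | grep -E 'Slave_(IO|SQL)_Running' | grep -cv 'Yes')
echo "$bad"   # prints: 0
```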
+<p>Things to check if MySQL replication fails:</p>
+<ul>
+<li> Make sure the replication grant has been enabled on the master MySQL server.
+</li> <li> Check disk space availability.  
+</li> <li> Check the error columns in the slave status output.
+</li></ul> 
+<p>To reset MySQL replication, run these commands on MySQL:</p>
+<source>
+STOP SLAVE;
+CHANGE MASTER TO
+  MASTER&#95;HOST&#61;&#39;hostname&#39;,
+  MASTER&#95;USER&#61;&#39;username&#39;,
+  MASTER&#95;PASSWORD&#61;&#39;password&#39;,
+  MASTER&#95;PORT&#61;3306,
+  MASTER&#95;LOG&#95;FILE&#61;&#39;master2-bin.001&#39;,
+  MASTER&#95;LOG&#95;POS&#61;4,
+  MASTER&#95;CONNECT&#95;RETRY&#61;10;
+START SLAVE;
+</source>
+</section>
+
+
+<section>
+<title> Checks For Disk Full </title>
+<p>If anything is wrong, run /etc/init.d/chukwa-agent stop and CHUKWA_HOME/tools/init.d/chukwa-system-metrics
stop to shut down Chukwa.  
+Look at the agent.log and collector.log files to determine the problem. </p> 
+<p>The most common problem is log files growing without bound. Set up a cron job
to remove old log files:  </p>
+<source>
+ 0 12 &#42; &#42; &#42; CHUKWA&#95;HOME/tools/expiration.sh 10 CHUKWA&#95;HOME/var/log
nowait
+</source>     
+<p>This sets up log file expiration for CHUKWA_HOME/var/log, removing log files older
than 10 days.</p>
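If expiration.sh is unavailable, a plain find gives a similar 10-day expiration. This sketch simulates it in a temporary directory (GNU touch -d is assumed for the simulation):

```shell
# Sketch of the same 10-day log expiration with plain find, in a temp dir.
logdir=$(mktemp -d)
touch -d '20 days ago' "$logdir/agent.log.1"     # simulate an old rotated log
touch "$logdir/agent.log"                        # the current log must survive
find "$logdir" -name '*.log*' -mtime +10 -delete # remove logs older than 10 days
ls "$logdir"                                     # prints: agent.log
```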
+</section>
+
+
+<section>
+<title>Emergency Shutdown Procedure</title>
+<p>If the system is not functioning properly and you cannot find an answer in the Administration
Guide, send the Java process signal 3 with the kill command. 
+The current state of the Java process will be written to the log files. You can analyze these
files to determine the cause of the problem.</p>
+<source>
+kill -3 &#60;pid&#62;
+</source>
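For context, signal 3 is SIGQUIT, which a JVM interprets as a request for a full thread dump rather than termination:

```shell
# Signal 3 is SIGQUIT; a JVM receiving it writes a thread dump to its log
# instead of exiting.  Confirm the number-to-name mapping:
kill -l 3    # prints: QUIT
# On a live node (the pgrep pattern is an assumption about the process name):
# kill -3 "$(pgrep -f org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent)"
```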
+
+</section>
+</section>
+
+</body>
+</document>

Added: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/index.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/index.xml?rev=794987&view=auto
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/index.xml (added)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/v0.1.2/index.xml Fri Jul
17 06:49:43 2009
@@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Overview </title>
+  </header>
+  
+  <body>
+      <p>
+        The Chukwa Documentation provides the information you need to get started using Chukwa.
+      </p>
+      <p>
+        Begin with the <a href="admin.html"> Chukwa Administration Guide</a>
which shows you how to setup and deploy Chukwa. 
+      </p>
+      <p>
+		If you have more questions, you can ask on the <a href="ext:lists">Chukwa Core Mailing
Lists</a>.
+      </p>
+  </body>
+</document>


