chukwa-commits mailing list archives

From ey...@apache.org
Subject svn commit: r1210068 [1/2] - in /incubator/chukwa/trunk: ./ src/docs/src/documentation/content/xdocs/ src/site/ src/site/apt/ src/site/resources/images/
Date Sun, 04 Dec 2011 08:01:34 GMT
Author: eyang
Date: Sun Dec  4 08:01:33 2011
New Revision: 1210068

URL: http://svn.apache.org/viewvc?rev=1210068&view=rev
Log:
CHUKWA-612. Convert Chukwa document from forrest format to apt format. (Eric Yang)


Added:
    incubator/chukwa/trunk/src/site/apt/admin.apt
      - copied, changed from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
    incubator/chukwa/trunk/src/site/apt/agent.apt
      - copied, changed from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
    incubator/chukwa/trunk/src/site/apt/async_ack.apt
      - copied, changed from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/async_ack.xml
    incubator/chukwa/trunk/src/site/apt/collector.apt
      - copied, changed from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml
    incubator/chukwa/trunk/src/site/apt/design.apt
      - copied, changed from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/design.xml
    incubator/chukwa/trunk/src/site/apt/index.apt
    incubator/chukwa/trunk/src/site/resources/images/apache-incubator-logo.png   (with props)
    incubator/chukwa/trunk/src/site/resources/images/chukwa_architecture.png   (with props)
    incubator/chukwa/trunk/src/site/site.xml
Removed:
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/async_ack.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/dataflow.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/design.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/index.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/programming.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/quickstart.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/site.xml
    incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/tabs.xml
Modified:
    incubator/chukwa/trunk/CHANGES.txt
    incubator/chukwa/trunk/pom.xml
    incubator/chukwa/trunk/src/site/apt/Quick_Start_Guide.apt

Modified: incubator/chukwa/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/CHANGES.txt?rev=1210068&r1=1210067&r2=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/CHANGES.txt (original)
+++ incubator/chukwa/trunk/CHANGES.txt Sun Dec  4 08:01:33 2011
@@ -36,6 +36,8 @@ Trunk (unreleased changes)
 
   IMPROVEMENTS
 
+    CHUKWA-612. Convert Chukwa document from forrest format to apt format. (Eric Yang)
+
     CHUKWA-608. Added Quick start guide. (Ahmed Fathalla via Eric Yang)
 
     CHUKWA-605. Update directory structure to be aligned with Hadoop. (Eric Yang)

Modified: incubator/chukwa/trunk/pom.xml
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/pom.xml?rev=1210068&r1=1210067&r2=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/pom.xml (original)
+++ incubator/chukwa/trunk/pom.xml Sun Dec  4 08:01:33 2011
@@ -309,7 +309,7 @@
             <id>eyang</id>
             <name>Eric Yang</name>
             <email>eyang@apache.org</email>
-            <timezone>(GMT-08:00)</timezone>
+            <timezone>-8</timezone>
             <roles>
                 <role></role>
             </roles>
@@ -318,7 +318,7 @@
             <id>asrabkin</id>
             <name>Ariel Rabkin</name>
             <email>asrabkin@apache.org</email>
-            <timezone>(GMT-05:00)</timezone>
+            <timezone>-5</timezone>
             <roles>
                 <role></role>
             </roles>
@@ -327,7 +327,7 @@
             <id>billgraham</id>
             <name>Bill Graham</name>
             <email>billgraham@apache.org</email>
-            <timezone>(GMT-08:00)</timezone>
+            <timezone>-8</timezone>
             <roles>
                 <role></role>
             </roles>
@@ -336,7 +336,7 @@
             <id>jboulon</id>
             <name>Jerome Boulon</name>
             <email>jboulon@apache.org</email>
-            <timezone>(GMT-08:00)</timezone>
+            <timezone>-8</timezone>
             <roles>
                 <role></role>
             </roles>
@@ -356,6 +356,18 @@
             </resource>
         </resources>
         <plugins>
+          <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-site-plugin</artifactId>
+            <version>3.0</version>
+            <dependencies>
+              <dependency><!-- add support for ssh/scp -->
+                <groupId>org.apache.maven.wagon</groupId>
+                <artifactId>wagon-ssh</artifactId>
+                <version>1.0</version>
+              </dependency>
+            </dependencies>
+          </plugin>
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-resources-plugin</artifactId>
@@ -668,5 +680,49 @@
         </dependencies>
     </dependencyManagement>
 
+    <reporting>
+      <plugins>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-project-info-reports-plugin</artifactId>
+          <version>2.4</version>
+          <configuration>
+            <dependencyLocationsEnabled>false</dependencyLocationsEnabled>
+          </configuration>
+        </plugin>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-javadoc-plugin</artifactId>
+          <version>2.8</version>
+          <reportSets>
+            <reportSet>
+              <id>javadoc</id>
+              <configuration>
+                <aggregate>true</aggregate>
+                <doctitle>${project.name} API ${project.version}</doctitle>
+              </configuration>
+              <reports>
+                <report>javadoc</report>
+              </reports>
+            </reportSet>
+            <reportSet>
+              <id>aggregate</id>
+              <reports>
+                <report>aggregate</report>
+              </reports>
+            </reportSet>
+          </reportSets>
+        </plugin>
+      </plugins>
+    </reporting>
+
+    <distributionManagement>
+      <site>
+        <id>apache-website</id>
+        <name>Apache Website</name>
+        <url>scp://people.apache.org/www/incubator.apache.org/chukwa/docs/r${project.version}</url>
+      </site>
+    </distributionManagement>
+
 </project>
 

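Note on the pom.xml change above: the developer <timezone> entries move from display strings such as (GMT-08:00) to bare hour offsets such as -8. The mapping can be sketched in Python as below. This is only an illustration: the function name gmt_label_to_offset is hypothetical and is not part of Chukwa or Maven, and the sketch assumes whole-hour offsets.

```python
import re

# Hypothetical helper (an assumption for illustration, not Chukwa or Maven code):
# converts an old-style "(GMT-08:00)" label into the bare hour offset "-8".
_GMT_LABEL = re.compile(r"^\(GMT([+-])(\d{2}):(\d{2})\)$")


def gmt_label_to_offset(label: str) -> str:
    match = _GMT_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized timezone label: {label!r}")
    sign, hours, minutes = match.groups()
    if minutes != "00":
        # Fractional offsets are out of scope for this sketch.
        raise ValueError("only whole-hour offsets are handled here")
    value = int(hours)
    if value == 0:
        return "0"
    # Keep the minus sign; drop an explicit plus sign.
    return f"{'-' if sign == '-' else ''}{value}"
```

Applied to the values in this diff, "(GMT-08:00)" maps to "-8" and "(GMT-05:00)" maps to "-5".
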
Modified: incubator/chukwa/trunk/src/site/apt/Quick_Start_Guide.apt
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/src/site/apt/Quick_Start_Guide.apt?rev=1210068&r1=1210067&r2=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/src/site/apt/Quick_Start_Guide.apt (original)
+++ incubator/chukwa/trunk/src/site/apt/Quick_Start_Guide.apt Sun Dec  4 08:01:33 2011
@@ -28,7 +28,9 @@ Installing Chukwa
 
   * HICC, the Chukwa visualization tool.
 
-[http://people.apache.org/~eyang/docs/chukwa-0.5-arch.png] Chukwa 0.5.0 Architecture 
+[]
+
+[./images/chukwa_architecture.png] Chukwa 0.5.0 Architecture 
 
 First Steps
 

Copied: incubator/chukwa/trunk/src/site/apt/admin.apt (from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml)
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/src/site/apt/admin.apt?p2=incubator/chukwa/trunk/src/site/apt/admin.apt&p1=incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml&r1=1208953&r2=1210068&rev=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/admin.xml (original)
+++ incubator/chukwa/trunk/src/site/apt/admin.apt Sun Dec  4 08:01:33 2011
@@ -1,477 +1,384 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-
-<document>
-  <header>
-    <title>Chukwa Administration Guide</title>
-  </header>
-  <body>
-
-<section>
-<title> Purpose </title>
-<p> Chukwa is a system for large-scale reliable log collection and processing
-with Hadoop. The <a href="design.html">Chukwa design overview</a> discusses the overall architecture of Chukwa.
-You should read that document before this one.
-The purpose of this document is to help you install and configure Chukwa.</p>
-</section>
-
-<section>
-<title> Pre-requisites</title>
-<p>Chukwa should work on any POSIX platform, but  GNU/Linux is the only
- production platform that has been tested extensively. Chukwa has also been used
- successfully on Mac OS X, which several members of the Chukwa team use for 
- development. </p>
- <p>
- The only absolute software requirements are <a href="http://java.sun.com">Java 1.6
- </a> or better and <a href="http://hadoop.apache.org/" >Hadoop 0.18+</a>.
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+~~
+
+Chukwa Administration Guide
+
+  This chapter is the detailed guide to Chukwa configuration.
+
+  Please read this chapter carefully and ensure that all requirements have 
+been satisfied. Failure to do so will cause you (and us) grief debugging 
+strange errors and/or data loss.
+
+  Chukwa uses the same configuration system as Hadoop. To configure a deployment, 
+edit a file of environment variables in etc/chukwa/chukwa-env.sh -- this 
+configuration is used mostly by the launcher shell scripts getting the 
+cluster off the ground -- and then add configuration to an XML file to do 
+things like override Chukwa defaults, tell Chukwa what Filesystem to use, 
+or the location of the HBase configuration.
+
+  When running in distributed mode, after you make an edit to a Chukwa 
+configuration, make sure you copy the content of the conf directory to all 
+nodes of the cluster. Chukwa will not do this for you. Use rsync.
+
+Pre-requisites
+
+  Chukwa should work on any POSIX platform, but GNU/Linux is the only
+production platform that has been tested extensively. Chukwa has also been used
+successfully on Mac OS X, which several members of the Chukwa team use for 
+development.
+
+  The only absolute software requirements are {{{http://java.sun.com}Java 1.6}}
+or better and {{{http://hadoop.apache.org/}Hadoop 0.20.205.1+}}.
   
+  HICC, the Chukwa visualization interface, {{{#Set+Up+the+Database}requires HBase 0.90.4+}}.
 
- HICC, the Chukwa
- visualization interface, <a href="#Set+Up+the+Database">requires MySQL 5.1.30+.</a></p>
- <p>
-The Chukwa cluster management scripts rely on <code>ssh</code>; these scripts, however,
+  The Chukwa cluster management scripts rely on <ssh>; these scripts, however,
 are not required if you have some alternate mechanism for starting and stopping
 daemons.
- </p>
-</section>
 
+Installing Chukwa
+
+  A minimal Chukwa deployment has five components:
+
+  * A Hadoop and HBase cluster on which Chukwa will process data (referred to as the Chukwa cluster).
+
+  * A collector process, that writes collected data to HBase.
+
+  * One or more agent processes, that send monitoring data to the collector. 
+    The nodes with active agent processes are referred to as the monitored 
+    source nodes.
+
+  * A data analytics script that summarizes Hadoop cluster health.
+
+  * HICC, the Chukwa visualization tool.
+
+[]
+
+[./images/chukwa_architecture.png] Chukwa Components
+
+* First Steps
+
+  * Obtain a copy of Chukwa. You can find the latest release on the 
+    {{{http://hadoop.apache.org/chukwa/releases.html} Chukwa release page}}.
+
+  * Un-tar the release, via <tar xzf>.
 
-<section>
-<title>Installing Chukwa</title>
-<p>A minimal Chukwa deployment has three components: </p>
-<ul>
-<li> A Hadoop cluster on which Chukwa will store data (referred to as the Chukwa cluster).</li> 
-<li> A collector process, that writes collected data to HDFS, the Hadoop file system.</li>
-<li> One or more agent processes, that send monitoring data to the collector. 
-The nodes with active agent processes are referred to as the monitored source nodes.</li>
-</ul> 
-<p>In addition, you may wish to run the Chukwa Demux jobs, which parse collected
-data, or HICC, the Chukwa visualization tool.</p>
-<p></p>
-<p></p>
-<p></p>
-
-<figure  align="left" alt="Chukwa Components" src="images/components.gif" />
-
-<section>
-<title>First Steps </title>
-
-<ol>
-<li>Obtain a copy of Chukwa. You can find the latest release on the 
-<a href="http://hadoop.apache.org/chukwa/releases.html">Chukwa release page</a>.</li>
-<li>Un-tar the release, via <code>tar xzf</code>.</li>
-<li>Make sure a copy of Chukwa is available on each node being monitored, and on
-each node that will run a collector.</li>
-<li>
-We refer to the directory containing Chukwa as <code>CHUKWA_HOME</code>. It may
-be helpful to set <code>CHUKWA_HOME</code> explicitly in your environment,
-but Chukwa does not require that you do so.</li>
-</ol>
-</section>
-
-<!-- 
-<section>
-<title>Chukwa Configuration Files </title>
-<p>The Chukwa configuration files are located in the CHUKWA_HOME/conf directory.</p>
-<ul>
-<li> <code>chukwa-env.sh</code> contains environment variables.
-</li></ul>
-</section>
- -->
-
-<section>
-<title>General Configuration</title>
-
-<p>Agents and collectors are configured differently, but part of the process
-is common to both. </p>
-<ul>
-<li>Make sure that <code>JAVA_HOME</code> is set correctly and points to a Java 1.6 JRE. 
-It's generally best to set this in <code>conf/chukwa-env.sh</code>.</li>
-<li>
-In <code>conf/chukwa-env.sh</code>, set <code>CHUKWA_LOG_DIR</code> and
-<code>CHUKWA_PID_DIR</code> to the directories where Chukwa should store its
+  * Make sure a copy of Chukwa is available on each node being monitored, and on
+each node that will run a collector.
+
+  * We refer to the directory containing Chukwa as <CHUKWA_HOME>. It may
+be helpful to set <CHUKWA_HOME> explicitly in your environment,
+but Chukwa does not require that you do so.
+
+* General Configuration
+
+  Agents and collectors are configured differently, but part of the process
+is common to both.
+
+  * Make sure that <JAVA_HOME> is set correctly and points to a Java 1.6 JRE. 
+It's generally best to set this in <etc/chukwa/chukwa-env.sh>.
+
+  * In <etc/chukwa/chukwa-env.sh>, set <CHUKWA_LOG_DIR> and
+<CHUKWA_PID_DIR> to the directories where Chukwa should store its
 console logs and pid files.  The pid directory must not be shared between
 different Chukwa instances: it should be local, not NFS-mounted.
-</li>
- <li> Optionally, set CHUKWA_IDENT_STRING. This string is
- used to name Chukwa's own console log files.</li>
-<!--
-<li>Set <b>either</b> <code>HADOOP_HOME</code> or <code>HADOOP_JAR</code></li>
--->
-</ul>
-</section>
-</section>
-
-<!-- 
-</li> <li> Download and un-tar the Chukwa binary.
-</li> <li> Configure the components for the Chukwa cluster (see <a href="#Chukwa+Cluster+Deployment">Chukwa Cluster Deployment</a>).
-</li> <li> Configure the Hadoop configuration files (see <a href="#Hadoop+Configuration+Files">Hadoop Configuration Files</a>).
-</li> <li> Zip the directory and deploy to all nodes in the Chukwa cluster.
-</li></ul> 
-<p></p>
-<p></p>
-<p>2. Select one of the source nodes to be monitored: </p>
-<ul>
-<li> Create a directory for the Chukwa installation (Chukwa will set the environment variable <strong>CHUKWA_HOME</strong> to point to this directory during the install).
-</li> <li> Move to the new directory.
-</li> <li> Download and un-tar the Chukwa binary.
-</li> <li> Configure the components for the source nodes (see <a href="#Monitored+Source+Node+Deployment">Monitored Source Node Deployment</a>).
-</li> <li> Configure the Hadoop configuration files (see <a href="#Hadoop+Configuration+Files">Hadoop Configuration Files</a>).
-</li> <li> Zip the directory and deploy to all source nodes to be monitored.
-</li></ul> 
-</section>
- -->
-
-<section>
-<title>Agents </title>
-<p>Agents are the Chukwa processes that actually produce data. This section
+
+  * Optionally, set CHUKWA_IDENT_STRING. This string is
+ used to name Chukwa's own console log files.
+
+Agents
+
+  Agents are the Chukwa processes that actually produce data. This section
 describes how to configure and run them. More details are available in the
-<a href="agent.html">Agent configuration guide</a>.</p>
+{{{agent.html}Agent configuration guide}}.
 
-<section>
-<title>Configuration</title>
-<p>This section describes how to set up the agent process on the source nodes. </p>
-
-<!-- 
-<p>Edit <code>$CHUKWA_HOME/conf/agents</code> configuration file. </p>
-<p>Create a list of hosts that are running the Chukwa agent:</p>
-
-<source>
-localhost
-localhost
-localhost
-</source>
- -->
- 
-<p>The one mandatory configuration step is to set up 
-<code> $CHUKWA_HOME/conf/collectors</code>. This file should contain a list
+* Configuration
+
+  This section describes how to set up the agent process on the source nodes.
+
+  The one mandatory configuration step is to set up 
+<$CHUKWA_HOME/etc/chukwa/collectors>. This file should contain a list
 of hosts that will run Chukwa collectors. Agents will pick a random collector
 from this list to try sending to, and will fail-over to another listed collector
-on error.  The file should look something like:</p>
+on error.  The file should look something like:
 
-<source>
-http://&#60;collector1HostName&#62;:&#60;collector1Port&#62;/
-http://&#60;collector2HostName&#62;:&#60;collector2Port&#62;/
-http://&#60;collector3HostName&#62;:&#60;collector3Port&#62;/
-</source>
-
-<p>Edit the CHUKWA_HOME/conf/initial_adaptors configuration file. This is 
-where you tell Chukwa what log files to monitor. See
-<a href="agent.html#Adaptors">the adaptor configuration guide</a> for
-a list of available adaptors.</p>
-
-<p>There are a number of optional settings in 
-<code>$CHUKWA_HOME/conf/chukwa-agent-conf.xml</code>:</p>
-<ul>
-<li>The most important of these is the cluster/group name that identifies the
+---
+http://<collector1HostName>:<collector1Port>/
+http://<collector2HostName>:<collector2Port>/
+http://<collector3HostName>:<collector3Port>/
+---
+
+  Edit the <CHUKWA_HOME/etc/chukwa/initial_adaptors> configuration file. 
+This is where you tell Chukwa what log files to monitor. See
+{{{agent.html#Adaptors}the adaptor configuration guide}} for
+a list of available adaptors.
+
+  There are a number of optional settings in 
+<$CHUKWA_HOME/etc/chukwa/chukwa-agent-conf.xml>:
+
+  * The most important of these is the cluster/group name that identifies the
 monitored source nodes. This value is stored in each Chunk of collected data;
 you can therefore use it to distinguish data coming from different groups of 
 machines.
-<source>
- &#60;property&#62;
-    &#60;name&#62;chukwaAgent.tags&#60;/name&#62;
-    &#60;value&#62;cluster&#61;&#34;demo&#34;&#60;/value&#62;
-    &#60;description&#62;The cluster&#39;s name for this agent&#60;/description&#62;
-  &#60;/property&#62;
-</source>
-</li>
-<li>
-Another important option is <code>chukwaAgent.checkpoint.dir</code>.
-This is the directory Chukwa will use for its periodic checkpoints of running adaptors.
-It <strong>must not</strong> be a shared directory; use a local, not NFS-mount, directory.
-</li>
-
-<li>
-Setting the option <code>chukwaAgent.control.remote</code> will disallow remote connections
-to the agent control socket.
-</li>
-</ul>
-
-
-</section>
-
 
+---
+ <property>
+    <name>chukwaAgent.tags</name>
+    <value>cluster="demo"</value>
+    <description>The cluster's name for this agent</description>
+ </property>
+---
+
+  * Another important option is <chukwaAgent.checkpoint.dir>.
+This is the directory Chukwa will use for its periodic checkpoints of 
+running adaptors.  It <<must not>> be a shared directory; use a local 
+directory, not an NFS-mounted one.
+
+  * Setting the option <chukwaAgent.control.remote> will disallow remote 
+connections to the agent control socket.
+
+* Starting, Stopping, And Monitoring
+
+  To run an agent process on a single node, use <bin/chukwa agent>.
+
+  Typically, agents run as daemons. The script <bin/start-agents.sh> 
+will ssh to each machine listed in <etc/chukwa/agents> and start an agent,
+running in the background. The script <bin/stop-agents.sh> 
+does the reverse.
+
+  You can, of course, use any other daemon-management system you like. 
+For instance, <tools/init.d> includes init scripts for running
+Chukwa agents.
 
-
-<section>
-<title>Starting, stopping, and monitoring</title>
-<p>To run an agent process on a single node, use <code>bin/chukwa agent</code>.
-</p>
-
-<p>
-Typically, agents run as daemons. The script <code>bin/start-agents.sh</code> 
-will ssh to each machine listed in <code>conf/agents</code> and start an agent,
-running in the background. The script <code>bin/stop-agents.sh</code> 
-does the reverse.</p>
-<p>You can, of course, use any other daemon-management system you like. 
-For instance, <code>tools/init.d</code> includes init scripts for running
-Chukwa agents.</p>
-<p>To check if an agent is working properly, you can telnet to the control
+  To check if an agent is working properly, you can telnet to the control
 port (9093 by default) and hit "enter". You will get a status message if
 the agent is running normally.
-</p>
-</section>
 
-<section>
-<title>Configuring Hadoop for monitoring</title>
-<p>
-One of the key goals for Chukwa is to collect logs from Hadoop clusters. This section
-describes how to configure Hadoop to send its logs to Chukwa. Note that 
-these directions require Hadoop 0.20.0+.  Earlier versions of Hadoop do not have
-the hooks that Chukwa requires in order to grab MapReduce job logs.</p>
-<p>The Hadoop configuration files are located in <code>HADOOP_HOME/conf</code>.
- To setup Chukwa to collect logs from Hadoop, you need to change some of the 
- Hadoop configuration files.</p>
-<ol>
-	<li>Copy CHUKWA_HOME/conf/hadoop-log4j.properties file to HADOOP_HOME/conf/log4j.properties</li>
-	<li>Copy CHUKWA_HOME/conf/hadoop-metrics.properties file to HADOOP_HOME/conf/hadoop-metrics.properties</li>
-	<li>Edit HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@ to your actual CHUKWA log directory (i.e., CHUKWA_HOME/var/log)</li>	
-<!-- <li>ln -s HADOOP_HOME/conf/hadoop-site.xml CHUKWA_HOME/conf/hadoop-site.xml</li>
- -->	
- </ol>
-</section>
-
-</section>
-
-
-<section>
-<title>Collectors </title>
-<p>This section describes how to set up the Chukwa collectors.
-For more details, see <a href="collector.html">the collector configuration guide</a>.</p>
-
-<section>
-<title>Configuration</title>
-<p>First, edit <code>$CHUKWA_HOME/conf/chukwa-env.sh</code> In addition to 
-the general directions given above, you should set <code>
-HADOOP_HOME</code>. This should be the Hadoop deployment Chukwa will use to
-store collected data.
-You will get a version mismatch error if this is configured incorrectly.
-</p>
+Configuring Hadoop For Monitoring
+
+  One of the key goals for Chukwa is to collect logs from Hadoop clusters. 
+This section describes how to configure Hadoop to send its logs to Chukwa. 
+Note that these directions require Hadoop 0.20.205.0+.  Earlier versions of 
+Hadoop do not have the hooks that Chukwa requires in order to grab 
+MapReduce job logs.
+
+  The Hadoop configuration files are located in <HADOOP_HOME/etc/hadoop>.
+To set up Chukwa to collect logs from Hadoop, you need to change some of the 
+Hadoop configuration files.
+
+  * Copy CHUKWA_HOME/etc/chukwa/hadoop-log4j.properties file to HADOOP_CONF_DIR/log4j.properties
+
+  * Copy CHUKWA_HOME/etc/chukwa/hadoop-metrics2.properties file to HADOOP_CONF_DIR/hadoop-metrics2.properties
+
+  * Edit HADOOP_HOME/etc/hadoop/hadoop-metrics2.properties file and change ${CHUKWA_LOG_DIR} to your actual CHUKWA log directory (i.e., CHUKWA_HOME/var/log)
+
+Setup HBase Table
+
+  Chukwa is moving towards a model of using HBase to store metrics data to 
+allow real-time charting. This section describes how to configure HBase and 
+HICC to work together.
+
+  * Presently, we support HBase 0.90.4+. If you have HBase 0.89 jars anywhere, 
+they will cause linkage errors.  Check for and remove them.
+
+  * Setting up tables:
+
+---
+/path/to/hbase-0.90.4/bin/hbase shell < etc/chukwa/hbase.schema
+---
+
+Collectors
+
+  This section describes how to set up the Chukwa collectors.
+For more details, see {{{./collector.html}the collector configuration guide}}.
+
+* Configuration
+
+  First, edit <$CHUKWA_HOME/etc/chukwa/chukwa-env.sh>. In addition to 
+the general directions given above, you should set <HADOOP_CONF_DIR> and
+<HBASE_CONF_DIR>.  These should point to the Hadoop and HBase deployments 
+Chukwa will use to store collected data.  You will get a version mismatch 
+error if this is configured incorrectly.
+
+  Next, edit <$CHUKWA_HOME/etc/chukwa/chukwa-collector-conf.xml>.
+
+** Use HBase For Data Storage
+
+  * Configuring the collector: set HBaseWriter as your writer, or add it 
+    to the pipeline if you are using PipelineStageWriter:
+
+---
+  <property>
+    <name>chukwaCollector.writerClass</name>
+    <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
+  </property>
+
+  <property>
+    <name>chukwaCollector.pipeline</name>
+    <value>org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter</value>
+  </property>
+---
 
-<p>Next, edit <code>$CHUKWA_HOME/conf/chukwa-collector-conf.xml</code>.
-The one mandatory configuration parameter is <code>writer.hdfs.filesystem</code>.
+** Use HDFS For Data Storage
+
+  The one mandatory configuration parameter is <writer.hdfs.filesystem>.
 This should be set to the HDFS root URL on which Chukwa will store data.
-Various optional configuration options are described in <a href="collector.html">the collector configuration guide</a>
+Various optional configuration options are described in 
+{{{./collector.html}the collector configuration guide}}
 and in the collector configuration file itself.
-</p>
-</section>
 
-<section>
-<title>Starting, stopping, and monitoring</title>
-<p>To run a collector process on a single node, use <code>bin/chukwa collector</code>.
-</p>
-
-<p>
-Typically, collectors run as daemons. The script <code>bin/start-collectors.sh</code> 
-will ssh to each collector listed in <code>conf/collectors</code> and start a
-collector, running in the background. The script <code>bin/stop-collectors.sh
-</code> does the reverse.</p>
-<p>You can, of course, use any other daemon-management system you like. 
-For instance, <code>tools/init.d</code> includes init scripts for running
-Chukwa collectors.</p>
-<p>To check if a collector is working properly, you can simply access
-<code>http://collectorhost:collectorport/chukwa?ping=true</code> with a web browser.
-If the collector is running, you should see a status page with a handful of statistics.</p>
-
-</section>
-
-</section>
-
-<section>
-<title>Demux and HICC</title>
-
-
-<!-- 
-<section>
-<title>Migrate Existing Data From Chukwa 0.1.1</title>
-<p>Start the MySQL shell:</p>
-<source>
-mysql -u root -p
-Enter password:
-</source>
-
-<p>From the MySQL shell, enter these commands (replace &#60;database_name&#62; with an actual value):</p>
-<source>
-use &#60;database_name&#62;
-source /path/to/chukwa/conf/database_create_table.sql
-source /path/to/chukwa/conf/database_upgrade.sql
-</source>
-</section> -->
-
-
-<section>
-<title>Start the Chukwa Processes </title>
-
-<p>The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
-<ul>
-<li> Start the Chukwa data processors script (execute this command only on the data processor node):
-</li></ul> 
-<source>CHUKWA&#95;HOME/tools/init.d/chukwa-data-processors start </source>
-<ul>
-<li> Create down sampling daily cron job:
-</li></ul> 
-<source>CHUKWA&#95;HOME/bin/downSampling.sh --config &#60;path to chukwa conf&#62; -n add </source>
-</section>
-
-<!-- 
-<section>
-<title>Validate the Chukwa Processes </title>
-
-<p>The Chukwa status scripts are located in the CHUKWA_HOME/tools/init.d directory.</p>
-
- <ul>
-<li> To verify that the data processors are functioning correctly: </li>
-</ul> 
-<source>Visit the Chukwa hadoop cluster&#39;s Job Tracker UI for job status. 
-Refresh to the Chukwa Cluster Configuration page for the Job Tracker URL. </source>
-</section> -->
-
-<section>
-<title>Running HICC</title>
-<p>The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. HICC is started by invoking</p>
-<source>bin/chukwa hicc</source>
-<p>Once the webcontainer with HICC has been started, point your favorite browser to:</p>
-<source>http://&#60;server&#62;:8080/hicc</source>
-</section>
-
-</section>
-
-
-<section>
-<title>HICC on HBase</title>
-<p>
-Chukwa is moving towards a model of using HBase to store metrics data to allow real-time
-charting. This section describes how to configure HBase and HICC to work together.
-</p>
-<ul>
-<li>
-Presently, we only support HBase 0.20.6. If you have HBase 0.89 jars anywhere, they 
-will cause linkage errors.  Check for and remove them.
-</li>
-<li>
-Setting up tables:<source>/work/hbase-0.20.6/bin/hbase shell < conf/hbase.schema</source></li>
-<li>
-Pointing Chukwa at HBase: Copy your <source>hbase-site.xml</source> to the Chukwa config directory.
-</li>
-<li>
-Configuring the collector: set HBaseWriter as your writer, or add it to the pipeline if you 
-are using <source>PipelineStageWriter</source></li>
-</ul>
-</section>
-
-<section>
-<title>Troubleshooting Tips</title>
-
-<section>
-<title>UNIX Processes For Chukwa Agents</title>
-
-<!-- 
-<p>The system metrics data loader process names are uniquely defined by:</p>
-<ul>
-<li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec sar -q -r -n ALL 55
-</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec iostat -x -k 55 2
-</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec top -b -n 1 -c
-</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec df -l
-</li> <li> org.apache.hadoop.chukwa.inputtools.plugin.metrics.Exec CHUKWA_HOME/bin/../bin/netstat.sh
-</li></ul> 
--->
-<p>The Chukwa agent process name is identified by:</p>
-<ul>
-<li> org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
-</li></ul> 
-<p>Command line to use to search for the process name:</p>
-<ul>
-<li> ps ax | grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
-</li></ul> 
-</section>
-
-<section>
-<title>UNIX Processes For Chukwa Collectors</title>
-<p>Chukwa Collector name is identified by:</p>
-<ul>
-<li> <strong>org.apache.hadoop.chukwa.datacollection.collector.CollectorStub</strong>
-</li></ul> 
-</section>
-
-<section>
-<title>UNIX Processes For Chukwa Data Processes</title>
-<p>Chukwa Data Processors are identified by:</p>
-<ul>
-<li> org.apache.hadoop.chukwa.extraction.demux.Demux
-</li> <li>org.apache.hadoop.chukwa.extraction.database.DatabaseLoader
-</li> <li>org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder
-</li></ul> 
-<p>The processes are scheduled execution, therefore they are not always visible from the process list.</p>
-</section>
-
-
-<section>
-<title>Checks for MySQL Replication </title>
-<p>At slave server, MySQL prompt, run:</p>
-<source>
-show slave status\G
-</source>
-<p>Make sure both <strong>Slave_IO_Running</strong> and <strong>Slave_SQL_Running</strong> are both "Yes".</p>
-<p>Things to check if MySQL replication fails:</p>
-<ul>
-<li> Make sure grant permission has been enabled on master MySQL server.
-</li> <li> Check disk space availability.  
-</li> <li> Check Error status in slave status.
-</li></ul> 
-<p>To reset MySQL replication, run these commands on MySQL:</p>
-<source>
-STOP SLAVE;
-CHANGE MASTER TO
-  MASTER&#95;HOST&#61;&#39;hostname&#39;,
-  MASTER&#95;USER&#61;&#39;username&#39;,
-  MASTER&#95;PASSWORD&#61;&#39;password&#39;,
-  MASTER&#95;PORT&#61;3306,
-  MASTER&#95;LOG&#95;FILE&#61;&#39;master2-bin.001&#39;,
-  MASTER&#95;LOG&#95;POS&#61;4,
-  MASTER&#95;CONNECT&#95;RETRY&#61;10;
-START SLAVE;
-</source>
-</section>
-
-
-<section>
-<title> Checks For Disk Full </title>
-<p>If anything is wrong, use /etc/init.d/chukwa-agent and CHUKWA_HOME/tools/init.d/chukwa-system-metrics stop to shutdown Chukwa.  
-Look at agent.log and collector.log file to determine the problems. </p> 
-<p>The most common problem is the log files are growing unbounded. Set up a cron job to remove old log files:  </p>
-<source>
- 0 12 &#42; &#42; &#42; CHUKWA&#95;HOME/tools/expiration.sh 10 !CHUKWA&#95;HOME/var/log nowait
-</source>     
-<p>This will set up the log file expiration for CHUKWA_HOME/var/log for log files older than 10 days.</p>
-</section>
-
-
-<section>
-<title>Emergency Shutdown Procedure</title>
-<p>If the system is not functioning properly and you cannot find an answer in the Administration Guide, execute the kill command. 
-The current state of the java process will be written to the log files. You can analyze these files to determine the cause of the problem.</p>
-<source>
-kill -3 &#60;pid&#62;
-</source>
+* Starting, Stopping, And Monitoring
+
+  To run a collector process on a single node, use <bin/chukwa collector>.
+
+  Typically, collectors run as daemons. The script <bin/start-collectors.sh> 
+will ssh to each collector listed in <etc/chukwa/collectors> and start a
+collector, running in the background. The script <bin/stop-collectors.sh> 
+does the reverse.
+
+  You can, of course, use any other daemon-management system you like. 
+For instance, <tools/init.d> includes init scripts for running
+Chukwa collectors.
+
+  To check if a collector is working properly, you can simply access
+<http://collectorhost:collectorport/chukwa?ping=true> with a web browser.
+If the collector is running, you should see a status page with a handful of 
+statistics.
+
+ETL Processes (Optional)
+
+  For storing data to HDFS, the archive and demux mapreduce jobs can be 
+started by running:
+
+---
+CHUKWA_HOME/bin/chukwa archive
+---
+ 
+  Demux mapreduce jobs can be started by running:
+
+---
+CHUKWA_HOME/bin/chukwa demux
+---
+
+Setup Cluster Aggregation Script
+
+  For data analytics with Apache Pig, some additional environment setup is required. Apache Pig does not use the same environment variable names as Hadoop, so make sure the following are set up correctly:
+
+  [[1]] Download and set up Apache Pig 0.9.1.
+
+  [[2]] Define Apache Pig class path:
+
+---
+export PIG_CLASSPATH=$HADOOP_CONF_DIR:$HBASE_CONF_DIR
+---
+
+  [[3]] Create a jar file from HBASE_CONF_DIR by running:
+
+---
+jar cf $CHUKWA_HOME/hbase-env.jar $HBASE_CONF_DIR
+---
+
+  [[4]] Set up a cron job or Hudson job to run the analytics script periodically:
+
+---
+pig -Dpig.additional.jars=${HBASE_HOME}/hbase-0.90.4.jar:${HBASE_HOME}/lib/zookeeper-3.3.2.jar:${PIG_PATH}/pig.jar:${CHUKWA_HOME}/hbase-env.jar ${CHUKWA_HOME}/script/pig/ClusterSummary.pig
+---
+
+HICC
+
+* Configuration
+
+  Edit <etc/chukwa/auth.conf> and add authorized users to the list.
+
+* Starting, Stopping, And Monitoring
+
+  The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface.
+HICC is started by invoking
+
+---
+bin/chukwa hicc
+---
+
+  Once the webcontainer with HICC has been started, point your favorite 
+browser to:
+
+---
+http://<server>:4080/hicc
+---
+
+Troubleshooting Tips
+
+* UNIX Processes For Chukwa Agents
+
+  The Chukwa agent process name is identified by:
+
+---
+org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
+---
+
+  Command line to use to search for the process name:
+
+---
+ps ax | grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
+---
+
+* UNIX Processes For Chukwa Collectors
+
+  The Chukwa collector process name is identified by:
+
+---
+org.apache.hadoop.chukwa.datacollection.collector.CollectorStub
+---
+
+* UNIX Processes For Chukwa Data Processes
+
+  Chukwa Data Processors are identified by:
+
+---
+org.apache.hadoop.chukwa.extraction.demux.Demux
+org.apache.hadoop.chukwa.extraction.database.DatabaseLoader
+org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder
+---
+
+  These processes run on a schedule, so they are not always 
+visible in the process list.
+
+* Checks For Disk Full 
+
+  If anything is wrong, use /etc/init.d/chukwa-agent and 
+CHUKWA_HOME/tools/init.d/chukwa-system-metrics stop to shut down Chukwa.  
+Look at the agent.log and collector.log files to determine the problem. 
+
+  The most common problem is that log files grow without bound. Set up a 
+cron job to remove old log files:
+
+---
+ 0 12 * * * CHUKWA_HOME/tools/expiration.sh 10 $CHUKWA_HOME/var/log nowait
+---
+
+  This expires log files in CHUKWA_HOME/var/log that are older than 
+10 days.
+
+* Emergency Shutdown Procedure
 
-</section>
-</section>
+  If the system is not functioning properly and you cannot find an answer in 
+the Administration Guide, execute the kill command.  The current state of 
+the java process will be written to the log files. You can analyze 
+these files to determine the cause of the problem.
 
-</body>
-</document>
+---
+kill -3 <pid>
+---
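
The process checks listed under Troubleshooting Tips can be scripted. The following is a minimal sketch, not part of Chukwa: it scans `ps ax` output for the main-class names given above. The role labels and the function names are hypothetical, chosen only for illustration.

```python
import subprocess

# Main-class names copied from the troubleshooting section; the role
# labels used as keys are illustrative and are not Chukwa terminology.
CHUKWA_MAIN_CLASSES = {
    "agent": "org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent",
    "collector": "org.apache.hadoop.chukwa.datacollection.collector.CollectorStub",
    "demux": "org.apache.hadoop.chukwa.extraction.demux.Demux",
    "database loader": "org.apache.hadoop.chukwa.extraction.database.DatabaseLoader",
    "archive builder": "org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder",
}


def find_chukwa_processes(ps_output):
    """Map each role to the ps lines whose command line names its main class."""
    found = {role: [] for role in CHUKWA_MAIN_CLASSES}
    for line in ps_output.splitlines():
        tokens = line.split()
        for role, main_class in CHUKWA_MAIN_CLASSES.items():
            # Compare whole whitespace-separated tokens, so a class name
            # that merely shares a prefix with a longer name is not matched.
            if main_class in tokens:
                found[role].append(line.strip())
    return found


def current_ps_output():
    """Return the output of `ps ax` on the local machine."""
    return subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
```

Calling `find_chukwa_processes(current_ps_output())` on a node returns, per role, the matching process lines. An empty list for demux, the database loader, or the archive builder is not necessarily a fault, since those jobs only run on a schedule.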

Copied: incubator/chukwa/trunk/src/site/apt/agent.apt (from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml)
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/src/site/apt/agent.apt?p2=incubator/chukwa/trunk/src/site/apt/agent.apt&p1=incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml&r1=1208953&r2=1210068&rev=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml (original)
+++ incubator/chukwa/trunk/src/site/apt/agent.apt Sun Dec  4 08:01:33 2011
@@ -1,195 +1,206 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-
-<document>
-  <header>
-    <title>Chukwa Agent Setup Guide</title>
-  </header>
-  <body>
-
-<section>
-<title>Overview</title>
-<p>In a normal Chukwa installation, an <em>Agent</em> process runs on every 
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+~~
+
+Overview
+
+  In a normal Chukwa installation, an <Agent> process runs on every 
 machine being monitored. This process is responsible for all the data collection
 on that host.  Data collection might mean periodically running a Unix command,
-or tailing a file, or listening for incoming UDP packets.</p>
+or tailing a file, or listening for incoming UDP packets.
 
-<p>Each particular data source corresponds to a so-called <em>Adaptor</em>. 
+  Each particular data source corresponds to a so-called <Adaptor>. 
 Adaptors are dynamically loadable modules that run inside the Agent process. 
-There is generally one Adaptor for each data source: for each file being watched 
-or for each Unix command being executed. Each adaptor has a unique name. If you 
-do not specify a name, one will be auto-generated by hashing the Adaptor type and
-parameters.</p>
-
-<p>There are a number of Adaptors built into Chukwa, and you can also develop
-your own. Chukwa will use them if you add them to the Chukwa library search path
- (e.g., by putting them in a jarfile in <code>$CHUKWA_HOME/lib</code>.)</p>
-</section>
-
-
-
-<section>
-<title>Agent Control</title>
-
-<p>Once an Agent process is running, there are a number of commands that you can
- use to inspect and control it.  By default, Agents listen for incoming commands
-  on port 9093. Commands are case-insensitive</p>
-
-<table>
-<tr><td>Command</td><td>Purpose</td><td>Options</td></tr>
-
-<tr><td><code>add</code>   </td><td> Start an adaptor.</td>  <td>See below</td></tr>
-<tr><td><code>close</code> </td><td> Close socket connection to agent.</td><td>None</td></tr>
-<tr><td><code>help</code>  </td><td> Display a list of available commands</td><td>None</td></tr>
-<tr><td><code>list</code>  </td><td> List currently running adaptors</td><td>None</td></tr>
-<tr><td><code>reloadcollectors</code>  </td><td> Re-read list of collectors</td><td>None</td></tr>
-<tr><td><code>stop</code>  </td><td> Stop adaptor, abruptly</td><td>Adaptor name</td></tr>
-<tr><td><code>stopall</code>  </td><td> Stop all adaptors, abruptly</td><td>Adaptor name</td></tr>
-<tr><td><code>shutdown</code>  </td><td> Stop adaptor, gracefully</td><td>Adaptor name</td></tr>
-<tr><td><code>stopagent</code>  </td><td> Stop agent process</td><td>None</td></tr>
-</table>
-
-
-<p>The add command is by far the most complex; it takes several mandatory and 
-optional parameters. The general form is as follows:</p>
-<source>
-add [name =] &#60;adaptor_class_name&#62; &#60;datatype&#62; &#60;adaptor 
-specific params&#62; &#60;initial offset&#62;. 
-</source>
+There is generally one Adaptor for each data source: for each file being 
+watched or for each Unix command being executed. Each adaptor has a unique name.
+If you do not specify a name, one will be auto-generated by hashing the 
+Adaptor type and parameters.
+
+  There are a number of Adaptors built into Chukwa, and you can also develop
+your own. Chukwa will use them if you add them to the Chukwa library search 
+path (e.g., by putting them in a jarfile in <$CHUKWA_HOME/lib>.)
+
+Agent Control
+
+  Once an Agent process is running, there are a number of commands that you can
+use to inspect and control it.  By default, Agents listen for incoming commands
+on port 9093. Commands are case-insensitive.
+
+*--------------------*--------------------------------------*--------------:
+| Command            | Purpose                              | Options      |
+*--------------------*--------------------------------------*--------------:
+| <add>              | Start an adaptor.                    | See below    |
+*--------------------*--------------------------------------*--------------:
+| <close>            | Close socket connection to agent.    | None         |
+*--------------------*--------------------------------------*--------------:
+| <help>             | Display a list of available commands | None         |
+*--------------------*--------------------------------------*--------------:
+| <list>             | List currently running adaptors      | None         |
+*--------------------*--------------------------------------*--------------:
+| <reloadcollectors> | Re-read list of collectors           | None         |
+*--------------------*--------------------------------------*--------------:
+| <stop>             | Stop adaptor, abruptly               | Adaptor name |
+*--------------------*--------------------------------------*--------------:
+| <stopall>          | Stop all adaptors, abruptly          | Adaptor name |
+*--------------------*--------------------------------------*--------------:
+| <shutdown>         | Stop adaptor, gracefully             | Adaptor name |
+*--------------------*--------------------------------------*--------------:
+| <stopagent>        | Stop agent process                   | None         |
+*--------------------*--------------------------------------*--------------:
+
+
+  The add command is by far the most complex; it takes several mandatory and 
+optional parameters. The general form is as follows:
+
+---
+add [name =] <adaptor_class_name> <datatype> <adaptor specific params> <initial offset>
+---
 
-<p>
-There are four mandatory fields: The word <code>add</code>, the class name for 
+  There are four mandatory fields: The word <add>, the class name for
 the Adaptor, the datatype of the Adaptor's output, and the sequence number for 
 the first byte.  There are two optional fields; the adaptor instance name, and 
 the adaptor parameters.
-</p>
 
-<p>The adaptor name, if specified, should go after the add command, and be 
+  The adaptor name, if specified, should go after the add command, and be 
 followed with an equals sign. It should be a string of printable characters, 
 without whitespace or '='.  Chukwa Adaptor names all start with "adaptor_".
 If you specify an adaptor name which does not start with that prefix, it will
 be added automatically.  
-</p>
 
-<p>Adaptor parameters aren't required by the Chukwa agent, but each class of adaptor 
-may itself specify both mandatory and optional parameters. See below.</p>
-</section>
-
-<section>
-<title>Command-line options</title>
-<p>Normally, agents are configured via the file <code>conf/chukwa-agent-conf.xml.</code>
+  Adaptor parameters aren't required by the Chukwa agent, but each class of 
+adaptor may itself specify both mandatory and optional parameters. See below.
+
+Command-line options
+
+  Normally, agents are configured via the file <conf/chukwa-agent-conf.xml>.
 However, there are a few command-line options that are sometimes useful in
 troubleshooting. If you specify "local" as an option, then the agent will print
 chunks to standard out, rather than to a collector. If you specify a URI, then
 that will be used as collector, overriding the collectors specified in
-<code>conf/collectors</code>.  These options are intended for testing and debugging,
-not for production use.</p>
+<conf/collectors>.  These options are intended for testing and debugging,
+not for production use.
 
-<source>
+---
 bin/chukwa agent local
-</source>
-</section>
+---
 
-<section> 
-<title>Adaptors</title>
-<p>This section lists the standard adaptors, and the arguments they take.</p>
+Adaptors
 
-<ul>
-<li><strong>FileAdaptor</strong>: Pushes a whole file, as one Chunk, then exits.
+  This section lists the standard adaptors, and the arguments they take.
+
+  * <<FileAdaptor>> Pushes a whole file, as one Chunk, then exits.
  Takes one mandatory parameter; the file to push.
 
-<source>add FileTailer FooData /tmp/foo 0</source>
-This pushes file <code>/tmp/foo</code> as one chunk, with datatype <code>FooData</code>.
-</li>
-<li><strong>filetailer.LWFTAdaptor</strong>
-Repeatedly tails a file, treating the file as a sequence of bytes, ignoring the
-  content. Chunk boundaries are arbitrary. This is useful for streaming binary 
-  data. Takes one mandatory parameter; a path to the file to tail. If log file
-  is rotated while there is unread data, this adaptor will not attempt to recover it.
-  <source>add filetailer.LWFTAdaptor BarData /foo/bar 0</source>
-This pushes <code>/foo/bar</code> in a sequence of Chunks of type <code>BarData</code>
-</li>
-
-<li><strong>filetailer.FileTailingAdaptor</strong>
- Repeatedly tails a file, again ignoring content and with unspecified Chunk
- boundaries. Takes one mandatory parameter; a path to the file to tail. Keeps a 
+---
+add FileAdaptor FooData /tmp/foo 0
+---
+  This pushes file </tmp/foo> as one chunk, with datatype <FooData>.
+
+  * <<filetailer.LWFTAdaptor>> Repeatedly tails a file, treating the file as 
+  a sequence of bytes, ignoring the content. Chunk boundaries are arbitrary. 
+  This is useful for streaming binary data. Takes one mandatory parameter; 
+  a path to the file to tail. If log file is rotated while there is unread 
+  data, this adaptor will not attempt to recover it.
+
+---
+add filetailer.LWFTAdaptor BarData /foo/bar 0
+---
+  This pushes </foo/bar> in a sequence of Chunks of type <BarData>.
+
+  * <<filetailer.FileTailingAdaptor>> Repeatedly tails a file, again 
+  ignoring content and with unspecified Chunk boundaries. Takes one 
+  mandatory parameter; a path to the file to tail. Keeps a 
   file handle open in order to detect log file rotation.
-<source>add filetailer.FileTailingAdaptor BarData /foo/bar 0</source>
-This pushes <code>/foo/bar</code> in a sequence of Chunks of type <code>BarData</code>
-</li>
-
-
-<li><strong>filetailer.RCheckFTAdaptor</strong>
- An experimental modification of the above, which avoids the need to keep a file handle
- open.  Same parameters and usage as the above.
-</li>
-
-<li><strong>filetailer.CharFileTailingAdaptorUTF8</strong>
-The same as the base FileTailingAdaptor, except that chunks are guaranteed to end only at carriage returns.
- This is useful for most ASCII log file formats.
-</li>
-
-<li><strong>filetailer.CharFileTailingAdaptorUTF8NewLineEscaped</strong>
- The same, except that chunks are guaranteed to end only at non-escaped carriage
- returns. This is useful for pushing Chukwa-formatted log files, where exception
- stack traces stay in a single chunk.
-</li>
-
-<li><strong>DirTailingAdaptor</strong> Takes a directory path and an
- adaptor name as mandatory parameters; repeatedly scans that directory and all
- subdirectories, and starts the indicated adaptor running on each file. Since
- the DirTailingAdaptor does not, itself, emit data, the datatype parameter is 
- applied to the newly-spawned adaptors.  Note  that if you try this on a large 
- directory with an adaptor that keeps file handles open,
-  it is possible to exceed your system's limit on open files.
-  A file pattern can be specified as an optional second parameter.
 
-<source>add DirTailingAdaptor logs /var/log/ *.log filetailer.CharFileTailingAdaptorUTF8 0</source>
+---
+add filetailer.FileTailingAdaptor BarData /foo/bar 0
+---
+  This pushes </foo/bar> in a sequence of Chunks of type <BarData>.
+
+  * <<filetailer.RCheckFTAdaptor>>
+    An experimental modification of the above, which avoids the need to 
+    keep a file handle open.  Same parameters and usage as the above.
+
+  * <<filetailer.CharFileTailingAdaptorUTF8>>
+    The same as the base FileTailingAdaptor, except that chunks are 
+    guaranteed to end only at carriage returns.
+    This is useful for most ASCII log file formats.
+
+  * <<filetailer.CharFileTailingAdaptorUTF8NewLineEscaped>>
+     The same, except that chunks are guaranteed to end only at 
+     non-escaped carriage returns. This is useful for pushing 
+     Chukwa-formatted log files, where exception
+     stack traces stay in a single chunk.
+
+  * <<DirTailingAdaptor>> Takes a directory path and an
+    adaptor name as mandatory parameters; repeatedly scans that directory 
+    and all subdirectories, and starts the indicated adaptor running on 
+    each file. Since the DirTailingAdaptor does not, itself, emit data, 
+    the datatype parameter is applied to the newly-spawned adaptors.  
+    Note  that if you try this on a large directory with an adaptor that 
+    keeps file handles open, it is possible to exceed your system's limit 
+    on open files.
+    A file pattern can be specified as an optional second parameter.
+
+---
+add DirTailingAdaptor logs /var/log/ *.log filetailer.CharFileTailingAdaptorUTF8 0
+---
 
-</li>
-<li><strong>ExecAdaptor</strong> Takes a frequency (in milliseconds) as optional 
+  * <<ExecAdaptor>> Takes a frequency (in milliseconds) as optional 
 parameter, and then program name as mandatory parameter. Runs that program 
 repeatedly at a rate specified by frequency.
 
-<source>add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0</source>
- This adaptor will run <code>df</code> every minute, labeling output as Df.
-</li>
+---
+add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0
+---
+  This adaptor will run <df> every minute, labeling output as Df.
 
-<li><strong>UDPAdaptor</strong> Takes a port number as mandatory parameter.
+  * <<UDPAdaptor>> Takes a port number as mandatory parameter.
 Binds to the indicated UDP port, and emits one Chunk for each received packet.
 
-<source>add UdpAdaptor Packets 1234 0</source>
- This adaptor will listen for incoming traffic on port 1234, labeling output as Packets.
-</li>
-
+---
+add UDPAdaptor Packets 1234 0
+---
+  This adaptor will listen for incoming traffic on port 1234, labeling output as Packets.
 
-<li><strong>edu.berkeley.chukwa_xtrace.XtrAdaptor</strong> (available in <code>contrib</code>)
- Takes an <a href="http://www.x-trace.net/wiki/doku.php">Xtrace</a> ReportSource
+  * <<edu.berkeley.chukwa_xtrace.XtrAdaptor>> (available in <contrib>)
+ Takes an {{{http://www.x-trace.net/wiki/doku.php}Xtrace}} ReportSource
  class name [without package] as mandatory argument, and no optional parameters.
  Listens for incoming reports in the same way as that ReportSource would.
 
-<source>add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0</source>
- This adaptor will create and start a <code>UdpReportSource</code>, labeling its
+---
+add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0
+---
+  This adaptor will create and start a <UdpReportSource>, labeling its
   output datatype as Xtrace.
-</li>
-</ul>
 
-</section>
-</body>
-</document>
\ No newline at end of file
+  * <<sigar.SystemMetrics>> This adaptor collects CPU, disk, and network
+    utilization, as well as the model and specifications of the machine,
+    and periodically emits the data as one Chunk.
+
+---
+add sigar.SystemMetrics SystemMetrics 60 0
+---
+  This adaptor will take snapshots of system state every minute,
+  labeling output as SystemMetrics.
+
+  * <<SocketAdaptor>> This adaptor binds to a port and listens for Log4J
+    SocketAppender traffic.  Each logging entry is converted to one
+    chunk.
+
+---
+add SocketAdaptor JobSummary 9098 0
+---
+  This adaptor will bind to port 9098, and label output as JobSummary.
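
The general form of the add command described under Agent Control can be assembled programmatically. The sketch below is illustrative only and is not part of Chukwa: the function names are hypothetical, and the spacing around the equals sign follows the `[name =]` form shown in the text, which is an assumption about what the agent accepts. The agent adds the "adaptor_" prefix itself; the helper only mirrors that rule so the caller knows the final name.

```python
ADAPTOR_PREFIX = "adaptor_"


def normalize_adaptor_name(name):
    """Validate an adaptor name and apply the "adaptor_" prefix rule."""
    if not name or not name.isprintable():
        raise ValueError("adaptor name must be non-empty and printable")
    if "=" in name or any(ch.isspace() for ch in name):
        raise ValueError("adaptor name must not contain whitespace or '='")
    if name.startswith(ADAPTOR_PREFIX):
        return name
    return ADAPTOR_PREFIX + name


def build_add_command(adaptor_class, datatype, params="", offset=0, name=None):
    """Build: add [name =] <adaptor_class> <datatype> <params> <offset>."""
    parts = ["add"]
    if name is not None:
        # Optional instance name, followed by an equals sign.
        parts += [normalize_adaptor_name(name), "="]
    parts += [adaptor_class, datatype]
    if params:
        # Adaptor-specific parameters are optional as far as the agent
        # is concerned; each adaptor class decides which it requires.
        parts.append(params)
    parts.append(str(offset))
    return " ".join(parts)
```

For example, `build_add_command("filetailer.LWFTAdaptor", "BarData", "/foo/bar")` produces the same line as the LWFTAdaptor example above. The resulting string would then be sent to the agent's command port (9093 by default).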

Copied: incubator/chukwa/trunk/src/site/apt/async_ack.apt (from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/async_ack.xml)
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/src/site/apt/async_ack.apt?p2=incubator/chukwa/trunk/src/site/apt/async_ack.apt&p1=incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/async_ack.xml&r1=1208953&r2=1210068&rev=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/async_ack.xml (original)
+++ incubator/chukwa/trunk/src/site/apt/async_ack.apt Sun Dec  4 08:01:33 2011
@@ -1,120 +1,89 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-
-<document>
-  <header>
-    <title>Asynchronous Acknowledgement</title>
-  </header>
-
-<body>
-
-<section>
-<title>Overview</title>
-<p>
-Chukwa supports two different reliability strategies.
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+~~
+
+Overview
+
+  Chukwa supports two different reliability strategies.
 The first, default strategy, is as follows: collectors write data to HDFS, and
-as soon as the HDFS write call returns success, report success to the agent, which
-advances its checkpoint state.
-</p><p>
-This is potentially a problem if HDFS (or some other storage tier) has non-durable or
-asynchronous writes. As a result, Chukwa offers a mechanism, asynchronous acknowledgement,
-for coping with this case. </p>
-<p>
-This mechanism can be enabled by setting option <code>httpConnector.asyncAcks</code>.
-This option applies to both agents and collectors. On the collector side, it tells
-the collector to return asynchronous acknowledgements. On the agent side, it tells
-agents to look for and process them correctly. Agents with the option set to false
-should work OK with collectors where it's set to true. The reverse is not
-generally true: agents will expect a collector to be able to answer questions
-about the state of the filesystem.
-</p>
-</section>
-
-<section>
-<title>Theory</title>
-<p>
-In this approach, rather than try to build a fault tolerant collector, Chukwa agents look 
-<strong>through</strong> the collectors to the underlying state of the filesystem. This 
-filesystem state is what is used to detect and recover from failure. Recovery is 
-handled entirely by the agent, without requiring anything at all from the failed collector.
-</p>
-
-<p>
-When an agent sends data to a collector, the collector responds with the name of 
-the HDFS file in which the data will be stored and the future location of the 
-data within the file. This is very easy to compute -- since each file is only 
-written by a single collector, the only requirement is to enqueue the data and 
-add up lengths. </p>
-
-<p>Every few minutes, each agent process polls a collector to find the length of 
-each file to which data is being written. The length of the file is then compared 
-with the offset at which each chunk was to be written. If the file length exceeds 
-this value, then the data has been committed and the agent process advances its 
-checkpoint accordingly. (Note that the length returned by the filesystem is the 
-amount of data that has been successfully replicated.) There is nothing essential 
-about the role of collectors in monitoring the written files. Collectors store 
-no per-agent state. The reason to poll collectors, rather than the filesystem 
-directly, is to reduce the load on the filesystem master and to shield agents 
-from the details of the storage system. </p>
-
-<p>
-The collector component that handles these requests is 
-<code>datacollection.collector.servlet.CommitCheckServlet</code>.
-This will be started if <code>httpConnector.asyncAcks</code> is true in the
+as soon as the HDFS write call returns success, report success to the agent, 
+which advances its checkpoint state.
+
+  This is potentially a problem if HDFS (or some other storage tier) has 
+non-durable or asynchronous writes. As a result, Chukwa offers a mechanism, 
+asynchronous acknowledgement, for coping with this case.
+
+  This mechanism can be enabled by setting option <httpConnector.asyncAcks>.
+This option applies to both agents and collectors. On the collector side, it 
+tells the collector to return asynchronous acknowledgements. On the agent side,
+it tells agents to look for and process them correctly. Agents with the option 
+set to false should work OK with collectors where it's set to true. The 
+reverse is not generally true: agents will expect a collector to be able to 
+answer questions about the state of the filesystem.
+
+Theory
+
+  In this approach, rather than try to build a fault-tolerant collector, 
+Chukwa agents look <<through>> the collectors to the underlying state of the 
+filesystem. This filesystem state is what is used to detect and recover from 
+failure. Recovery is handled entirely by the agent, without requiring anything 
+at all from the failed collector.
+
+  When an agent sends data to a collector, the collector responds with the name 
+of the HDFS file in which the data will be stored and the future location of 
+the data within the file. This is very easy to compute -- since each file is 
+only written by a single collector, the only requirement is to enqueue the 
+data and add up lengths.
+
+  Every few minutes, each agent process polls a collector to find the length of 
+each file to which data is being written. The length of the file is then 
+compared with the offset at which each chunk was to be written. If the file 
+length exceeds this value, then the data has been committed and the agent 
+process advances its checkpoint accordingly. (Note that the length returned by 
+the filesystem is the amount of data that has been successfully replicated.) 
+There is nothing essential about the role of collectors in monitoring the 
+written files. Collectors store no per-agent state. The reason to poll 
+collectors, rather than the filesystem directly, is to reduce the load on 
+the filesystem master and to shield agents from the details of the storage 
+system.
+
+  The collector component that handles these requests is 
+<datacollection.collector.servlet.CommitCheckServlet>.
+This will be started if <httpConnector.asyncAcks> is true in the
 collector configuration.
-</p>
 
-<p>On error, agents resume from their last checkpoint and pick a new collector. 
+  On error, agents resume from their last checkpoint and pick a new collector. 
 In the event of a failure, the total volume of data retransmitted is bounded by 
-the period between collector file rotations. </p>
+the period between collector file rotations.
 
-<!--
-This means that the fraction of duplicate data is the ratio of collector rotation 
-interval to the mean time between collector failures. Using the default five minute rotation interval, and assuming one crash per week on average, this means the fraction of duplicate data from this mechanism is 0.05\%, an acceptably low overhead. 
--->
-
-<p>The solution is end-to-end. Authoritative copies of data can only exist in two places:
- the nodes where data was originally produced, and the HDFS file system where it will 
- ultimately be stored. Collectors only hold soft state;  the only ``hard'' state 
- stored by Chukwa is the agent checkpoints. Below is a diagram of the 
- flow of messages in this protocol.</p>
-
-</section>
-
-<section>
-<title>Configuration</title>
-<p>
-In addition to <code>httpConnector.asyncAcks</code> (which enables asynchronous acknowledgement)
-a number of options affect this mode of operation.</p>
-<p>
-Option <code>chukwaCollector.asyncAcks.scanperiod</code> affects how often collectors will check
-the filesystem for commits. It defaults to twice the rotation interval.</p>
+  The solution is end-to-end. Authoritative copies of data can only exist in 
+two places: the nodes where data was originally produced, and the HDFS file 
+system where it will ultimately be stored. Collectors only hold soft state;  
+the only ``hard'' state stored by Chukwa is the agent checkpoints. Below is a 
+diagram of the flow of messages in this protocol.
 
-<p>
-Option <code>chukwaCollector.asyncAcks.scanpaths</code> determines where in HDFS
-collectors will look. It defaults to the data sink dir plus the archive dir.
-</p>
+Configuration
 
-<p>
-In the future, Zookeeper could be used instead to track rotations.
-</p>
-</section>
+  In addition to <httpConnector.asyncAcks> (which enables asynchronous 
+acknowledgement) a number of options affect this mode of operation.
+
+  Option <chukwaCollector.asyncAcks.scanperiod> affects how often collectors 
+will check the filesystem for commits. It defaults to twice the rotation 
+interval.
+
+  Option <chukwaCollector.asyncAcks.scanpaths> determines where in HDFS
+collectors will look. It defaults to the data sink dir plus the archive dir.
 
-</body>
-</document>
\ No newline at end of file
+  In the future, Zookeeper could be used instead to track rotations.
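
The commit check described in the Theory section of async_ack.apt above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Chukwa's agent code: the class `AsyncAckSketch`, the `PendingChunk` fields, and the `advance` method are hypothetical names, and the map of file lengths stands in for whatever a collector returns when polled. Following the text, a chunk counts as committed once the reported file length exceeds the offset the collector gave for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical agent-side bookkeeping for asynchronous acknowledgement. */
class AsyncAckSketch {

    /** What a collector told the agent about one chunk (illustrative fields). */
    static final class PendingChunk {
        final String sinkFile;  // name of the HDFS file the chunk will land in
        final long offset;      // offset in that file reported by the collector
        final long checkpoint;  // checkpoint the agent adopts once the chunk commits

        PendingChunk(String sinkFile, long offset, long checkpoint) {
            this.sinkFile = sinkFile;
            this.offset = offset;
            this.checkpoint = checkpoint;
        }
    }

    /**
     * Applies one poll result. Chunks are assumed to be queued in send order,
     * so the loop stops at the first chunk that is not yet committed and the
     * checkpoint never skips over unacknowledged data.
     */
    static long advance(long checkpoint, List<PendingChunk> pending,
                        Map<String, Long> fileLengths) {
        long result = checkpoint;
        while (!pending.isEmpty()) {
            PendingChunk chunk = pending.get(0);
            Long length = fileLengths.get(chunk.sinkFile);
            // Committed only when the file length exceeds the chunk's offset.
            if (length == null || length <= chunk.offset) {
                break;
            }
            result = chunk.checkpoint;
            pending.remove(0);
        }
        return result;
    }

    public static void main(String[] args) {
        List<PendingChunk> pending = new ArrayList<>();
        pending.add(new PendingChunk("sink-1.chukwa", 100L, 10L));
        pending.add(new PendingChunk("sink-1.chukwa", 250L, 20L));
        Map<String, Long> lengths = new HashMap<>();
        lengths.put("sink-1.chukwa", 200L);
        // Prints 10: the first chunk is committed, the second is still pending.
        System.out.println(advance(0L, pending, lengths));
    }
}
```

On error, an agent following this scheme would resend everything after the returned checkpoint to a new collector, which is why the retransmitted volume is bounded by the rotation period.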

Copied: incubator/chukwa/trunk/src/site/apt/collector.apt (from r1208953, incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml)
URL: http://svn.apache.org/viewvc/incubator/chukwa/trunk/src/site/apt/collector.apt?p2=incubator/chukwa/trunk/src/site/apt/collector.apt&p1=incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml&r1=1208953&r2=1210068&rev=1210068&view=diff
==============================================================================
--- incubator/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml (original)
+++ incubator/chukwa/trunk/src/site/apt/collector.apt Sun Dec  4 08:01:33 2011
@@ -1,118 +1,185 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-
-<document>
-  <header>
-    <title>Chukwa Collector Setup Guide</title>
-  </header>
-  <body>
-  	<section>
-  	  <title>Basic Operation</title>
-  		<p>Chukwa Collectors are responsible for accepting incoming data from Agents,
-  		and storing the data.
-  		 Most commonly, collectors simply write all received to HDFS.  
-  		In this mode, the filesystem to write to is determined by the option
-  		<code>writer.hdfs.filesystem</code> in  <code>chukwa-collector-conf.xml</code>.
-  		 This is the only option that you really need to specify to get a working 
-  		 collector.
-  		</p>
-  		<p> By default, collectors listen on port 8080. This can be configured
-  		in <code>chukwa-collector.conf.xml</code></p>
-  	</section>
-  	
-  	<section><title>Configuration Knobs</title>
-  	<p>There's a bunch more "standard" knobs worth knowing about. These
-  	are mostly documented in <code>chukwa-collector-conf.xml</code></p>
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+~~
+
+Basic Operation
+
+  Chukwa Collectors are responsible for accepting incoming data from Agents,
+and storing the data.  Most commonly, collectors simply write all received 
+data to HDFS.  In this mode, the filesystem to write to is determined by the option
+<writer.hdfs.filesystem> in <chukwa-collector-conf.xml>.
+
+  This is the only option that you really need to specify to get a working 
+collector.
+
+  By default, collectors listen on port 8080. This can be configured
+in <chukwa-collector-conf.xml>.
   	
-  	<p>
-  	It's also possible to do limited configuration on the command line. This is
-  	primarily intended for debugging.  You can say 'writer=pretend' to get the 
-  	collector to print incoming chunks on standard out, or portno=xyz to override
-  	the default port number.
-  	</p>
-  	 	<source>
-  	  bin/chukwa collector writer=pretend portno=8081
-  	</source>
-  	</section>
+Configuration Knobs
+
+  There are several more "standard" knobs worth knowing about. These
+are mostly documented in <chukwa-collector-conf.xml>.
   	
-  	<section><title>Advanced options</title>
-  	<p>
-  	  There are some advanced options, not necessarily documented in the
-  	collector conf file, that are helpful in using Chukwa in nonstandard ways.
-  	</p> <p>
-	    While normally Chukwa writes sequence files to HDFS, it's possible to
-	    specify an alternate Writer class. The option 
-	    <code>chukwaCollector.writerClass</code> specifies a Java class to instantiate
-	    and use as a writer. See the <code>ChukwaWriter</code> javadoc for details.
-	  </p>  <p>
-	  	One particularly useful Writer class is <code>PipelineStageWriter</code>, which
-	  	lets you string together a series of <code>PipelineableWriters</code>
-	  	for pre-processing or post-processing incoming data.
-	  	As an example, the SocketTeeWriter class allows other programs to get incoming chunks
-	  	fed to them over a socket by the collector.
-	  	</p>
-	  	
-	  	<p>Stages in the pipeline should be listed, comma-separated, in option 
-	  	<code>chukwaCollector.pipeline</code></p>
+  It's also possible to do limited configuration on the command line. This is
+primarily intended for debugging.  You can say 'writer=pretend' to get the 
+collector to print incoming chunks on standard out, or portno=xyz to override
+the default port number.
+
+---
+bin/chukwa collector writer=pretend portno=8081
+---
+
+Advanced options
+
+  There are some advanced options, not necessarily documented in the
+collector conf file, that are helpful in using Chukwa in nonstandard ways.
+While normally Chukwa writes sequence files to HDFS, it's possible to
+specify an alternate Writer class. The option 
+<chukwaCollector.writerClass> specifies a Java class to instantiate
+and use as a writer. See the <ChukwaWriter> javadoc for details.
+
+  One particularly useful Writer class is <PipelineStageWriter>, which
+lets you string together a series of <PipelineableWriters>
+for pre-processing or post-processing incoming data.
+As an example, the SocketTeeWriter class allows other programs to get 
+incoming chunks fed to them over a socket by the collector.
 	  	
-	  	<source>
-&#60;property&#62;
-  &#60;name&#62;chukwaCollector.writerClass&#60;/name&#62;
-  &#60;value&#62;org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter&#60;/value&#62;
-&#60;/property&#62;
-
-&#60;property&#62;
-  &#60;name&#62;chukwaCollector.pipeline&#60;/name&#62;
-  &#60;value&#62;org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter&#60;/value&#62;
-&#60;/property&#62;
-	  	</source>
+  Stages in the pipeline should be listed, comma-separated, in option 
+<chukwaCollector.pipeline>
 	  	
-	  	<section>
-	  	<title>SocketTeeWriter</title>
-	  	<p>
-	  		The <code>SocketTeeWriter</code> allows external processes to watch
-	  	the stream of chunks passing through the collector. This allows certain kinds
-	  	of real-time monitoring to be done on-top of Chukwa.</p>  
+---
+<property>
+  <name>chukwaCollector.writerClass</name>
+  <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
+</property>
+
+<property>
+  <name>chukwaCollector.pipeline</name>
+  <value>org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter,org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter</value>
+</property>
+---
+
+HBaseWriter
+
+  HBaseWriter is the default writer for storing data in HBase.  It runs Demux 
+parsers internally to convert unstructured data to semi-structured data, then 
+loads the key-value pairs into an HBase table.  HBaseWriter has the following 
+configuration:
+
+  * <hbase.demux.package> Demux parser class package.  HBaseWriter uses this 
+    package name to validate HBase against the annotated demux parser classes.
+
+---
+<property>
+  <name>hbase.demux.package</name>
+  <value>org.apache.hadoop.chukwa.extraction.demux.processor</value>
+</property>
+---
+
+  * <hbase.writer.verify.schema> Verify the HBase table schema against the 
+    demux parser schema, and log a warning if there is a mismatch between the 
+    HBase schema and the demux parsers.
+
+---
+<property>
+  <name>hbase.writer.verify.schema</name>
+  <value>false</value>
+</property>
+---
+
+  * <hbase.writer.halt.on.schema.mismatch> If this option is set to true 
+    and the HBase table schema does not match the demux parsers, the 
+    collector will shut itself down.
+
+---
+<property>
+  <name>hbase.writer.halt.on.schema.mismatch</name>
+  <value>false</value>
+</property>
+---
+
+LocalWriter
+
+  <LocalWriter> writes chunks of data to local disk, then uploads each file to 
+HDFS as a whole.  This writer is designed for high-throughput environments.
+
+  * <chukwaCollector.localOutputDir> Location to buffer data before moving
+    data to HDFS.
+
+---
+<property>
+  <name>chukwaCollector.localOutputDir</name>
+  <value>/tmp/chukwa/logs</value>
+</property>
+---
+
+SeqFileWriter
+
+  The <SeqFileWriter> streams chunks of data to HDFS, writing them to a
+temporary file with a <.chukwa> suffix.  When the file has been completely
+written, it is renamed with a <.done> suffix.  SeqFileWriter has the following
+configuration in <chukwa-collector-conf.xml>.
+
+  * <writer.hdfs.filesystem> Address of the name node, i.e. the filesystem 
+    to write to
+
+  * <chukwaCollector.outputDir> Location of the collector's data sink directory
+
+  * <chukwaCollector.rotateInterval> File rotation interval (ms)
+
+  * <chukwaCollector.isFixedTimeRotatorScheme> A flag to indicate that the 
+    collector should close its files at a fixed offset after every 
+    rotateInterval.  The default value is false, which uses the default 
+    scheme where collectors close files after regular rotateIntervals.
+    If set to true, also specify chukwaCollector.fixedTimeIntervalOffset.
+    For example, if isFixedTimeRotatorScheme is true, fixedTimeIntervalOffset
+    is set to 10000 and rotateInterval is set to 300000, then the collector 
+    will close its files at 10 seconds past each 5 minute mark.  If
+    isFixedTimeRotatorScheme is false, collectors will rotate approximately
+    once every 5 minutes.
+
+  * <chukwaCollector.fixedTimeIntervalOffset> Chukwa fixed time interval 
+    offset value (ms)
+
+SocketTeeWriter
+
+  The <SocketTeeWriter> allows external processes to watch
+the stream of chunks passing through the collector. This allows certain kinds
+of real-time monitoring to be done on-top of Chukwa.
 	  	
-	  	 <p>  
-	  	    SocketTeeWriter listens on a port (specified by conf option
-	  	 <code>chukwaCollector.tee.port</code>, defaulting to 9094.)  Applications
-	  	 that want Chunks should connect to that port, and issue a command of the form
-	  	 <code>RAW|WRITABLE &#60;filter&#62;\n</code>. Filters use the same syntax
-	  	 as the <a href="programming.html#Reading+data+from+the+sink+or+the+archive">
-	  	 Dump command</a>.  If the filter is accepted, the Writer will respond 
-	  	 <code>OK\n</code>.
-	  	 </p>
-	  	 <p>
-	  	 Subsequently, Chunks matching the filter will be serialized and sent back over the socket.
-	  	Specifying "WRITABLE" will cause the chunks to be written using Hadoop's 
-	  	Writable serialization framework. "RAW" will send the internal data of the
-	  	Chunk, without any metadata, prefixed by its length encoded as a 32-bit int,
-	  	big-endian.  "HEADER" is similar to "RAW", but with a one-line header in
-	  	front of the content. Header format is <code>hostname</code> 
-	  	<code>datatype</code> <code>stream name</code> <code>offset</code>, separated by spaces.
-	  	</p>
-	  	<p>
-	  	The filter will be de-activated when the socket is closed.
-	  	</p>
+  SocketTeeWriter listens on a port (specified by conf option
+<chukwaCollector.tee.port>, defaulting to 9094).  Applications
+that want Chunks should connect to that port, and issue a command of the form
+<RAW|WRITABLE <filter>\n>. Filters use the same syntax
+as the {{{./programming.html#Reading+data+from+the+sink+or+the+archive}Dump command}}.  
+If the filter is accepted, the Writer will respond <OK\n>.
 
-	  	<source>
+  Subsequently, Chunks matching the filter will be serialized and sent back 
+over the socket.  Specifying "WRITABLE" will cause the chunks to be written 
+using Hadoop's Writable serialization framework. "RAW" will send the internal 
+data of the Chunk, without any metadata, prefixed by its length encoded as 
+a 32-bit int, big-endian.  "HEADER" is similar to "RAW", but with a one-line 
+header in front of the content. The header fields, separated by spaces, are:
+
+---
+<hostname> <datatype> <stream name> <offset>
+---
+
+  The filter will be de-activated when the socket is closed.
+
+---
 Socket s2 = new Socket("host", SocketTeeWriter.DEFAULT_PORT);
 s2.getOutputStream().write("RAW datatype=XTrace\n".getBytes());
 dis = new DataInputStream(s2.getInputStream());
@@ -123,17 +190,79 @@ while(true) {
    dis.readFully(data);
    DoSomethingUsing(data);
 }
-	  	</source>
-	  	</section>
+---
 	  	
-	  	<section>
-	  	<title>
-	  	Acknowledgement mode</title>
-	  	<p>
-	  	See  <a href="async_ack.html">The asynchronous acknowledgement</a> documentation
-	  	to learn how to enable and control that feature
-	  	</p></section>
-
-  	</section>
-  </body>
-</document>
+Acknowledgement mode
+
+  Chukwa supports two different reliability strategies.
+The first, default strategy, is as follows: collectors write data to HDFS, and
+as soon as the HDFS write call returns success, report success to the agent,
+which advances its checkpoint state.
+
+  This is potentially a problem if HDFS (or some other storage tier) has
+non-durable or asynchronous writes. As a result, Chukwa offers a mechanism,
+asynchronous acknowledgement, for coping with this case.
+
+  This mechanism can be enabled by setting option <httpConnector.asyncAcks>.
+This option applies to both agents and collectors. On the collector side, it
+tells the collector to return asynchronous acknowledgements. On the agent side,
+it tells agents to look for and process them correctly. Agents with the option
+set to false should work OK with collectors where it's set to true. The
+reverse is not generally true: agents will expect a collector to be able to
+answer questions about the state of the filesystem.
+
+* Theory
+
+  In this approach, rather than try to build a fault tolerant collector,
+Chukwa agents look <<through>> the collectors to the underlying state of the
+filesystem. This filesystem state is what is used to detect and recover from
+failure. Recovery is handled entirely by the agent, without requiring anything
+at all from the failed collector.
+
+  When an agent sends data to a collector, the collector responds with the name
+of the HDFS file in which the data will be stored and the future location of
+the data within the file. This is very easy to compute -- since each file is
+only written by a single collector, the only requirement is to enqueue the
+data and add up lengths.
+
+  Every few minutes, each agent process polls a collector to find the length of
+each file to which data is being written. The length of the file is then
+compared with the offset at which each chunk was to be written. If the file
+length exceeds this value, then the data has been committed and the agent
+process advances its checkpoint accordingly. (Note that the length returned by
+the filesystem is the amount of data that has been successfully replicated.)
+There is nothing essential about the role of collectors in monitoring the
+written files. Collectors store no per-agent state. The reason to poll
+collectors, rather than the filesystem directly, is to reduce the load on
+the filesystem master and to shield agents from the details of the storage
+system.
+
+  The collector component that handles these requests is
+<datacollection.collector.servlet.CommitCheckServlet>.
+This will be started if <httpConnector.asyncAcks> is true in the
+collector configuration.
+
+  On error, agents resume from their last checkpoint and pick a new collector.
+In the event of a failure, the total volume of data retransmitted is bounded by
+the period between collector file rotations.
+
+  The solution is end-to-end. Authoritative copies of data can only exist in
+two places: the nodes where data was originally produced, and the HDFS file
+system where it will ultimately be stored. Collectors only hold soft state;
+the only ``hard'' state stored by Chukwa is the agent checkpoints. Below is a
+diagram of the flow of messages in this protocol.
+
+* Configuration
+
+  In addition to <httpConnector.asyncAcks> (which enables asynchronous
+acknowledgement) a number of options affect this mode of operation.
+
+  * <chukwaCollector.asyncAcks.scanperiod> affects how often collectors
+will check the filesystem for commits. It defaults to twice the rotation
+interval.
+
+  * <chukwaCollector.asyncAcks.scanpaths> determines where in HDFS
+collectors will look. It defaults to the data sink dir plus the archive dir.
+
+  In the future, Zookeeper could be used instead to track rotations.
+
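
The "RAW" framing described in the SocketTeeWriter section of collector.apt above (each chunk's data prefixed by its length as a 32-bit big-endian int) can be exercised without a running collector. The sketch below is an assumption-laden illustration: `RawFrameSketch`, `frame` and `readFrames` are hypothetical names, and an in-memory stream stands in for the socket. It relies only on `java.io.DataInputStream` and `DataOutputStream`, which read and write ints in big-endian order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for length-prefixed frames as described for "RAW". */
class RawFrameSketch {

    /** Encodes one payload as a 4-byte big-endian length followed by the data. */
    static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(payload.length);  // writeInt emits the high byte first
        out.write(payload);
        return buffer.toByteArray();
    }

    /** Reads frames until the stream ends while waiting for a length prefix. */
    static List<byte[]> readFrames(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        List<byte[]> frames = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = dis.readInt();
            } catch (EOFException endOfStream) {
                return frames;  // no further length prefix: the peer is done
            }
            if (length < 0) {
                throw new IOException("negative frame length: " + length);
            }
            byte[] data = new byte[length];
            dis.readFully(data);  // throws EOFException if the frame is cut short
            frames.add(data);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(frame("first chunk".getBytes(StandardCharsets.UTF_8)));
        wire.write(frame("second chunk".getBytes(StandardCharsets.UTF_8)));
        for (byte[] data : readFrames(new ByteArrayInputStream(wire.toByteArray()))) {
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
    }
}
```

Against a real collector, the input stream would come from the socket after sending the command and receiving the Writer's response, as in the example in the diff above.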


