chukwa-commits mailing list archives

From asrab...@apache.org
Subject svn commit: r800552 - in /hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs: agent.xml collector.xml
Date Mon, 03 Aug 2009 20:38:03 GMT
Author: asrabkin
Date: Mon Aug  3 20:38:03 2009
New Revision: 800552

URL: http://svn.apache.org/viewvc?rev=800552&view=rev
Log:
CHUKWA-350. More documentation

Modified:
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
    hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml?rev=800552&r1=800551&r2=800552&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/agent.xml Mon Aug  3 20:38:03
2009
@@ -25,15 +25,31 @@
 
 <section>
 <title>Overview</title>
-<p>In a normal Chukwa installation, an <em>Agent</em> process runs on every
machine being monitored. This process is responsible for all the data collection on that host.
 Data collection might mean periodically running a Unix command, or tailing a file, or listening
for incoming UDP packets.</p>
-
-<p>Each particular data source corresponds to a so-called <em>Adaptor</em>.
Adaptors are dynamically loadable modules that run inside the Agent process. There is generally
one Adaptor for each data source: for each file being watched or for each Unix command being
executed. Each adaptor has a unique name. If you do not specify a name, one will be autogenerated
by hashing the Adaptor type and parameters.</p>
-
-<p>There are a number of Adaptors built into Chukwa, and you can also develop your
own. Chukwa will use them if you add them to the Chukwa library search path (e.g., by putting
them in a jarfile in <code>/lib</code>.)</p>
+<p>In a normal Chukwa installation, an <em>Agent</em> process runs on every
+machine being monitored. This process is responsible for all the data collection
+ on that host.  Data collection might mean periodically running a Unix command,
+  or tailing a file, or listening for incoming UDP packets.</p>
+
+<p>Each particular data source corresponds to a so-called <em>Adaptor</em>.
+Adaptors are dynamically loadable modules that run inside the Agent process. 
+There is generally one Adaptor for each data source: for each file being watched 
+or for each Unix command being executed. Each adaptor has a unique name. If you 
+do not specify a name, one will be autogenerated by hashing the Adaptor type and
+parameters.</p>
+
+<p>There are a number of Adaptors built into Chukwa, and you can also develop
+your own. Chukwa will use them if you add them to the Chukwa library search path
+ (e.g., by putting them in a jarfile in <code>/lib</code>).</p>
 </section>
 
 <section><title>Data Model</title>
-<p>Chukwa Adaptors emit data in <em>Chunks</em>. A Chunk is a sequence
of bytes, with some metadata. Several of these are set automatically by the Agent or Adaptors.
Two of them require user intervention: <code>cluster name</code> and <code>datatype</code>.
 Cluster name is specified in <code>conf/chukwa-env.sh</code>, and is global to
each Agent process.  Datatype describes the expected format of the data collected by an Adaptor
instance, and it is specified when that instance is started. </p>
+<p>Chukwa Adaptors emit data in <em>Chunks</em>. A Chunk is a sequence
of bytes,
+ with some metadata. Several of these are set automatically by the Agent or 
+ Adaptors. Two of them require user intervention: <code>cluster name</code> and
+ <code>datatype</code>.  Cluster name is specified in <code>conf/chukwa-env.sh</code>,
+  and is global to each Agent process.  Datatype describes the expected format 
+  of the data collected by an Adaptor instance, and it is specified when that 
+  instance is started. </p>
 
 <p>The following table lists the Chunk metadata fields. 
 </p>
@@ -41,16 +57,28 @@
 <table>
 <tr><td>Field</td><td>Meaning</td><td>Source</td></tr>
 <tr><td>Source</td><td>Hostname where Chunk was generated</td><td>Automatic</td></tr>
-<tr><td>Cluster</td><td>Cluster host is associated with</td><td>Specified
by user in agent config</td></tr>
-<tr><td>Datatype</td><td>Format of output</td><td>Specified
by user when Adaptor started</td></tr>
-<tr><td>Sequence ID</td><td>Offset of Chunk in stream</td><td>Automatic,
initial offset specified when Adaptor started</td></tr>
+<tr><td>Cluster</td><td>Cluster host is associated with</td><td>Specified
by user
+ in agent config</td></tr>
+<tr><td>Datatype</td><td>Format of output</td><td>Specified
by user when Adaptor
+ started</td></tr>
+<tr><td>Sequence ID</td><td>Offset of Chunk in stream</td><td>Automatic,
initial
+ offset specified when Adaptor started</td></tr>
 <tr><td>Name</td><td>Name of data source</td><td>Automatic,
chosen by Adaptor</td></tr>
 </table>
 
-<p>Conceptually, each Adaptor emits a semi-infinite stream of bytes, numbered starting
from zero. The sequence ID specifies how many bytes each Adaptor has sent, including the current
chunk.  So if an adaptor emits a chunk containing the first 100 bytes from a file, the sequenceID
of that Chunk will be 100. And the second hundred bytes will have sequence ID 200.  This may
seem a little peculiar, but it's actually the same way that TCP sequence numbers work.
+<p>Conceptually, each Adaptor emits a semi-infinite stream of bytes, numbered
+ starting from zero. The sequence ID specifies how many bytes each Adaptor has
+ sent, including the current chunk.  So if an adaptor emits a chunk containing
+ the first 100 bytes from a file, the sequence ID of that Chunk will be 100. 
+ And the second hundred bytes will have sequence ID 200.  This may seem a 
+ little peculiar, but it's actually the same way that TCP sequence numbers work.
 </p>
 
-<p>Adaptors need to take sequence ID as a parameter so that they can resume correctly
after a crash, and not send redundant data. When starting adaptors, it's usually save to specify
0 as an ID, but it's sometimes useful to specify something else. For instance, it lets you
do things like only tail the second half of a file. 
+<p>Adaptors need to take sequence ID as a parameter so that they can resume 
+correctly after a crash, and not send redundant data. When starting adaptors, 
+it's usually safe to specify 0 as an ID, but it's sometimes useful to specify 
+something else. For instance, it lets you do things like only tail the second 
+half of a file. 
 </p>
 </section>
 
@@ -58,7 +86,9 @@
 <section>
 <title>Agent Control</title>
 
-<p>Once an Agent process is running, there are a number of commands that you can use
to inspect and control it.  By default, Agents listen for incoming commands on port 9093.
Commands are case-insensitive</p>
+<p>Once an Agent process is running, there are a number of commands that you can
+ use to inspect and control it.  By default, Agents listen for incoming commands
+  on port 9093. Commands are case-insensitive.</p>
 
 <table>
 <tr><td>Command</td><td>Purpose</td><td>Options</td></tr>
@@ -74,19 +104,42 @@
 </table>
 
 
-<p>The add command is by far the most complex; it takes several mandatory and optional
parameters. The general form is as follows:</p>
+<p>The add command is by far the most complex; it takes several mandatory and 
+optional parameters. The general form is as follows:</p>
 <source>
-add [name =] &#60;adaptor_class_name&#62; &#60;datatype&#62; &#60;adaptor
specific params&#62; &#60;initial offset&#62;. 
+add [name =] &#60;adaptor_class_name&#62; &#60;datatype&#62; &#60;adaptor
+specific params&#62; &#60;initial offset&#62;
 </source>
 
 <p>
-There are four mandatory fields: The word <code>add</code>, the class name for
the Adaptor, the datatype of the Adaptor's output, and the sequence number for the first byte.
 There are two optional fields; the adaptor instance name, and the adaptor parameters.
+There are four mandatory fields: the word <code>add</code>, the class name for
+the Adaptor, the datatype of the Adaptor's output, and the sequence number for 
+the first byte.  There are two optional fields: the adaptor instance name and 
+the adaptor parameters.
 </p>
 
-<p>The adaptor name, if specified, should go after the add command, and be followed
with an equals sign. It should be a string of printable characters, without whitespace or
'='.  
+<p>The adaptor name, if specified, should go after the add command, and be 
+followed with an equals sign. It should be a string of printable characters, 
+without whitespace or '='.  
 </p>
 
-<p>Adaptor parameters aren't required by the add command, but adaptor implementations
may have both mandatory and optional parameters. See below.</p>
+<p>Adaptor parameters aren't required by the add command, but adaptor 
+implementations may have both mandatory and optional parameters. See below.</p>
+</section>
+
+<section>
+<title>Command-line options</title>
+<p>Normally, agents are configured via the file <code>conf/chukwa-agent-conf.xml</code>.
+However, there are a few command-line options that are sometimes useful in
+troubleshooting. If you specify "local" as an option, then the agent will print
+chunks to standard out, rather than to a collector. If you specify a URI, then
+that will be used as collector, overriding the collectors specified in
+<code>conf/collectors</code>.  These options are intended for testing and debugging,
+not for production use.</p>
+
+<source>
+bin/agent.sh local
+</source>
 </section>
 
 <section> 
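As a worked example of the general form of the add command described above, a small helper that assembles the command string could look like this (a hypothetical sketch; `build_add_command` is not part of Chukwa):

```python
def build_add_command(adaptor_class, datatype, params="", offset=0, name=None):
    """Assemble an agent 'add' command line.

    Mandatory pieces: the word 'add', the adaptor class name, the
    datatype, and the initial offset.  Optional pieces: an adaptor
    instance name (followed by '=') and adaptor-specific parameters.
    """
    parts = ["add"]
    if name is not None:
        parts.append(name + " =")
    parts += [adaptor_class, datatype]
    if params:
        parts.append(params)
    parts.append(str(offset))
    return " ".join(parts)

# Reproduces the FileTailingAdaptor example from the adaptor list.
print(build_add_command("filetailer.FileTailingAdaptor", "BarData", "/foo/bar"))
```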
@@ -94,40 +147,57 @@
 <p>This section lists the standard adaptors, and the arguments they take.</p>
 
 <ul>
-<li><strong>FileAdaptor</strong>: Pushes a whole file, as one Chunk, then
exits. Takes one mandatory parameter; the file to push.
+<li><strong>FileAdaptor</strong>: Pushes a whole file, as one Chunk, then
exits.
+ Takes one mandatory parameter: the file to push.
 
 <source>add FileTailer FooData /tmp/foo 0</source>
 This pushes file <code>/tmp/foo</code> as one chunk, with datatype <code>FooData</code>.
 </li>
 <li><strong>filetailer.FileTailingAdaptor</strong>
- Repeatedly tails a file, treating the file as a sequence of bytes, ignoring the content.
Chunk boundaries are arbitrary. This is useful for streaming binary data. Takes one mandatory
parameter; a path to the file to tail.
+ Repeatedly tails a file, treating the file as a sequence of bytes, ignoring the
+  content. Chunk boundaries are arbitrary. This is useful for streaming binary 
+  data. Takes one mandatory parameter: a path to the file to tail.
 <source>add filetailer.FileTailingAdaptor BarData /foo/bar 0</source>
 This pushes <code>/foo/bar</code> in a sequence of Chunks of type <code>BarData</code>
 
 </li>
 <li><strong>filetailer.CharFileTailingAdaptorUTF8</strong>
-The same, except that chunks are guaranteed to end only at carriage returns. This is useful
for most ASCII log file formats.
+The same, except that chunks are guaranteed to end only at carriage returns.
+ This is useful for most ASCII log file formats.
 </li>
 
 <li><strong>filetailer.CharFileTailingAdaptorUTF8NewLineEscaped</strong>
- The same, except that chunks are guaranteed to end only at non-escaped carriage returns.
This is useful for pushing Chukwa-formatted log files, where exception stack traces stay in
a single chunk.
+ The same, except that chunks are guaranteed to end only at non-escaped carriage
+  returns. This is useful for pushing Chukwa-formatted log files, where exception
+   stack traces stay in a single chunk.
 </li>
 
-<li><strong>DirTailingAdaptor</strong> Takes a directory path and a second
adaptor name as mandatory parameters; repeatedly scans that directory and all subdirectories,
and starts the indicated adaptor running on each file.
+<li><strong>DirTailingAdaptor</strong> Takes a directory path and a second
+ adaptor name as mandatory parameters; repeatedly scans that directory and all
+ subdirectories, and starts the indicated adaptor running on each file. Since
+ the DirTailingAdaptor does not, itself, emit data, the datatype parameter is 
+ applied to the newly-spawned adaptors.  Note that if you try this on a large 
+ directory, it is possible to exceed your system's limit on open files.
 
 <source>add DirTailingAdaptor logs /var/log/ filetailer.CharFileTailingAdaptorUTF8
0</source>
 
 </li>
-<li><strong>ExecAdaptor</strong> Takes a frequency (in miliseconds) as
optional parameter, and then program name as mandatory parameter. Runs that program repeatedly
at a rate specified by frequency.
+<li><strong>ExecAdaptor</strong> Takes a frequency (in milliseconds) as optional 
+parameter, and then the program name as a mandatory parameter. Runs that program 
+repeatedly at a rate specified by frequency.
 
 <source>add ExecAdaptor Df 60000 /bin/df -x nfs -x none 0</source>
  This adaptor will run <code>df</code> every minute, labelling output as Df.
 </li>
 
-<li><strong>edu.berkeley.chukwa_xtrace.XtrAdaptor</strong> (available in
contrib) Takes an <a href="http://www.x-trace.net/wiki/doku.php">Xtrace</a> ReportSource
classname [without package] as mandatory argument, and no optional parameters.  Listens for
incoming reports in the same way as that ReportSource would.
+<li><strong>edu.berkeley.chukwa_xtrace.XtrAdaptor</strong> (available in
contrib)
+ Takes an <a href="http://www.x-trace.net/wiki/doku.php">Xtrace</a> ReportSource
+ classname [without package] as mandatory argument, and no optional parameters.
+ Listens for incoming reports in the same way as that ReportSource would.
 
 <source>add edu.berkeley.chukwa_xtrace.XtrAdaptor Xtrace UdpReportSource 0</source>
- This adaptor will create and start a <code>UdpReportSource</code>, labeling
its output datatype as Xtrace.
+ This adaptor will create and start a <code>UdpReportSource</code>, labeling
its
+  output datatype as Xtrace.
 </li>
 </ul>
 

Modified: hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml
URL: http://svn.apache.org/viewvc/hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml?rev=800552&r1=800551&r2=800552&view=diff
==============================================================================
--- hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml (original)
+++ hadoop/chukwa/trunk/src/docs/src/documentation/content/xdocs/collector.xml Mon Aug  3
20:38:03 2009
@@ -36,6 +36,16 @@
   	<section><title>Configuration Knobs</title>
   	<p>There's a bunch more "standard" knobs worth knowing about. These
   	are mostly documented in <code>chukwa-collector-conf.xml</code></p>
+  	
+  	<p>
+  	It's also possible to do limited configuration on the command line. This is
+  	primarily intended for debugging.  You can say 'writer=pretend' to get the 
+  	collector to print incoming chunks on standard out, or portno=xyz to override
+  	the default port number.
+  	</p>
+  	<source>
+  	  bin/jettyCollector.sh writer=pretend portno=8081
+  	</source>
   	</section>
   	
   	<section><title>Advanced options</title>


