metron-commits mailing list archives

From ma...@apache.org
Subject svn commit: r20216 [16/18] - in /dev/metron/0.4.0-RC4: ./ site-book/ site-book/css/ site-book/images/ site-book/images/logos/ site-book/images/profiles/ site-book/img/ site-book/js/ site-book/metron-analytics/ site-book/metron-analytics/metron-maas-ser...
Date Tue, 27 Jun 2017 18:15:56 GMT
Added: dev/metron/0.4.0-RC4/site-book/metron-platform/metron-parsers/index.html
==============================================================================
--- dev/metron/0.4.0-RC4/site-book/metron-platform/metron-parsers/index.html (added)
+++ dev/metron/0.4.0-RC4/site-book/metron-platform/metron-parsers/index.html Tue Jun 27 18:15:56 2017
@@ -0,0 +1,697 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-06-27
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170627" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Parsers</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="http://metron.apache.org/" id="bannerLeft">
+                                                                                                <img src="../../images/metron-logo.png"  alt="Apache Metron" width="148px" height="48px"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="http://metron.apache.org/" class="externalLink" title="Metron">
+        Metron</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="../../index.html" title="Documentation">
+        Documentation</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Parsers</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2017-06-27</li> <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.4.0</li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">User Documentation</li>
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
+      <li>
+    
+                          <a href="../../index.html" title="Metron">
+          <i class="icon-chevron-down"></i>
+        Metron</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../Upgrading.html" title="Upgrading">
+          <i class="none"></i>
+        Upgrading</a>
+            </li>
+                                                                                                                                                      
+      <li>
+    
+                          <a href="../../metron-analytics/index.html" title="Analytics">
+          <i class="icon-chevron-right"></i>
+        Analytics</a>
+                  </li>
+                                                                                                                                                                                                                                                                                                                                                                                    
+      <li>
+    
+                          <a href="../../metron-deployment/index.html" title="Deployment">
+          <i class="icon-chevron-right"></i>
+        Deployment</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-docker/index.html" title="Docker">
+          <i class="none"></i>
+        Docker</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-config/index.html" title="Config">
+          <i class="none"></i>
+        Config</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-rest/index.html" title="Rest">
+          <i class="none"></i>
+        Rest</a>
+            </li>
+                                                                                                                                                                                                                                                          
+      <li>
+    
+                          <a href="../../metron-platform/index.html" title="Platform">
+          <i class="icon-chevron-down"></i>
+        Platform</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-api/index.html" title="Api">
+          <i class="none"></i>
+        Api</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-platform/metron-common/index.html" title="Common">
+          <i class="icon-chevron-right"></i>
+        Common</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-data-management/index.html" title="Data-management">
+          <i class="none"></i>
+        Data-management</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-enrichment/index.html" title="Enrichment">
+          <i class="none"></i>
+        Enrichment</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-indexing/index.html" title="Indexing">
+          <i class="none"></i>
+        Indexing</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-management/index.html" title="Management">
+          <i class="none"></i>
+        Management</a>
+            </li>
+                      
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Parsers</a>
+          </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-pcap-backend/index.html" title="Pcap-backend">
+          <i class="none"></i>
+        Pcap-backend</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-writer/index.html" title="Writer">
+          <i class="none"></i>
+        Writer</a>
+            </li>
+              </ul>
+        </li>
+                                                                                                            
+      <li>
+    
+                          <a href="../../metron-sensors/index.html" title="Sensors">
+          <i class="icon-chevron-right"></i>
+        Sensors</a>
+                  </li>
+              </ul>
+        </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <h1>Parsers</h1>
+<p><a name="Parsers"></a></p>
+<div class="section">
+<h2><a name="Introduction"></a>Introduction</h2>
+<p>Parsers are pluggable components which are used to transform raw data (textual or raw bytes) into JSON messages suitable for downstream enrichment and indexing. </p>
+<p>There are two general types of parsers:</p>
+
+<ul>
+  
+<li>A parser written in Java which conforms to the <tt>MessageParser</tt> interface. This kind of parser is optimized for speed and performance and is built for use with higher-velocity topologies. These parsers are not easily modified: changing one requires recompiling the entire topology.</li>
+  
+<li>A general-purpose parser. This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry source before a permanent Java parser can be written for it. As of this writing, the following general-purpose parsers are available:
+  
+<ul>
+    
+<li>Grok parser: <tt>org.apache.metron.parsers.GrokParser</tt> with possible <tt>parserConfig</tt> entries of
+    
+<ul>
+      
+<li><tt>grokPath</tt> : The path in HDFS (or in the Jar) to the grok statement</li>
+      
+<li><tt>patternLabel</tt> : The pattern label to use from the grok statement</li>
+      
+<li><tt>timestampField</tt> : The field to use as the timestamp</li>
+      
+<li><tt>timeFields</tt> : A list of fields to be treated as time</li>
+      
+<li><tt>dateFormat</tt> : The date format to use to parse the time fields</li>
+      
+<li><tt>timezone</tt> : The timezone to use. <tt>UTC</tt> is the default.</li>
+    </ul></li>
+    
+<li>CSV Parser: <tt>org.apache.metron.parsers.csv.CSVParser</tt> with possible <tt>parserConfig</tt> entries of
+    
+<ul>
+      
+<li><tt>timestampFormat</tt> : The date format of the timestamp to use. If unspecified, the parser assumes the timestamp is in milliseconds since the Unix epoch.</li>
+      
+<li><tt>columns</tt> : A map of column names you wish to extract from the CSV to their offsets (e.g. <tt>{ 'name' : 1, 'profession' : 3}</tt> would be a column map for extracting the 2nd and 4th columns from a CSV)</li>
+      
+<li><tt>separator</tt> : The column separator, <tt>,</tt> by default.</li>
+    </ul></li>
+  </ul></li>
+</ul></div>
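+<p>The <tt>columns</tt> and <tt>separator</tt> entries can be illustrated with a minimal Python sketch. The helper name <tt>parse_csv_line</tt> is hypothetical and this is not Metron's <tt>CSVParser</tt>. It assumes zero-based column offsets, as in the <tt>columns</tt> example above, and leaves out timestamp handling.</p>

```python
import csv
import io


def parse_csv_line(line, columns, separator=","):
    """Map one CSV line to a dict using a {name: offset} column map.

    Hypothetical helper illustrating the `columns` and `separator`
    parserConfig entries; it is not Metron's CSVParser.
    """
    # csv.reader copes with quoted fields that contain the separator.
    row = next(csv.reader(io.StringIO(line), delimiter=separator))
    # Offsets are zero-based: 1 selects the 2nd column, 3 the 4th.
    message = {name: row[offset].strip() for name, offset in columns.items()}
    # Keep the raw input as well, as Metron messages do.
    message["original_string"] = line
    return message


if __name__ == "__main__":
    print(parse_csv_line("42,alice,31,engineer", {"name": 1, "profession": 3}))
```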
+<div class="section">
+<h2><a name="Parser_Architecture"></a>Parser Architecture</h2>
+<p><img src="../../images/parser_arch.png" alt="Architecture" /></p>
+<p>Data flows from Kafka through the parser bolt and into the <tt>enrichments</tt> topic in Kafka. Errors are collected together with their context (e.g. the stack trace) and the original message that caused them, and are sent to an <tt>error</tt> queue. Messages that fail the global validation functions are also treated as errors and sent to the <tt>error</tt> queue.</p></div>
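+<p>The routing described above can be sketched in Python. This is a simplified, single-process model rather than the Storm bolt itself. The function <tt>route</tt> and the error-record fields are hypothetical, and the parse and validation steps are passed in as plain functions.</p>

```python
import json
import traceback


def route(raw_messages, parse, validate):
    """Split raw messages into parsed output and error records.

    Illustrative only: the record fields below are hypothetical, not the
    schema Metron writes to its error topic.
    """
    parsed, errors = [], []
    for raw in raw_messages:
        try:
            message = parse(raw)
        except Exception:
            # Keep the error context (stack trace) with the original message.
            errors.append({"raw_message": raw, "stack": traceback.format_exc()})
            continue
        if validate(message):
            parsed.append(message)  # would be written to `enrichments`
        else:
            # Messages failing validation are treated as errors too.
            errors.append({"raw_message": raw, "error": "failed validation"})
    return parsed, errors


if __name__ == "__main__":
    ok, bad = route(['{"a": 1}', "not json", '{"b": 2}'], json.loads, lambda m: "a" in m)
    print(ok, len(bad))
```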
+<div class="section">
+<h2><a name="Message_Format"></a>Message Format</h2>
+<p>All Metron messages must follow a specific format in order to be ingested. If a message does not conform to this format, it is dropped and put onto an error queue for further examination. The message must be JSON and must have a <tt>message</tt> tag, like so:</p>
+
+<div class="source">
+<div class="source">
+<pre>{&quot;message&quot; : message content}
+</pre></div></div>
+<p>Where appropriate, the 5-tuple JSON fields are also standardized. This lets the correlation engine further downstream correlate messages from different topologies by these fields. We are currently working on extending the message standardization beyond these fields, but this feature is not yet available. The standard field names are as follows:</p>
+
+<ul>
+  
+<li>ip_src_addr: layer 3 source IP</li>
+  
+<li>ip_dst_addr: layer 3 dest IP</li>
+  
+<li>ip_src_port: layer 4 source port</li>
+  
+<li>ip_dst_port: layer 4 dest port</li>
+  
+<li>protocol: layer 4 protocol</li>
+  
+<li>timestamp (epoch)</li>
+  
+<li>original_string: A human friendly string representation of the message</li>
+</ul>
+<p>The <tt>timestamp</tt> and <tt>original_string</tt> fields are mandatory. The remaining standard fields are optional. If an optional field is not applicable, leave it out of the JSON.</p>
+<p>Putting it all together, a typical Metron message with all 5-tuple fields present looks like the following:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+&quot;message&quot;:
+{&quot;ip_src_addr&quot;: xxxx,
+&quot;ip_dst_addr&quot;: xxxx,
+&quot;ip_src_port&quot;: xxxx,
+&quot;ip_dst_port&quot;: xxxx,
+&quot;protocol&quot;: xxxx,
+&quot;timestamp&quot;: xxxx,
+&quot;original_string&quot;: xxx,
+&quot;additional-field 1&quot;: xxx
+}
+}
+</pre></div></div></div>
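+<p>The rules for mandatory and optional fields can be expressed as a short Python sketch. The names <tt>build_message</tt> and <tt>has_mandatory_fields</tt> are hypothetical. The sketch assumes that an optional field with no applicable value is represented as <tt>None</tt> and is simply omitted.</p>

```python
MANDATORY_FIELDS = ("timestamp", "original_string")


def build_message(timestamp, original_string, **fields):
    """Assemble a message, leaving out standard fields that do not apply.

    Hypothetical helper; `fields` holds optional entries such as
    ip_src_addr or protocol, with None meaning "not applicable".
    """
    message = {"timestamp": timestamp, "original_string": original_string}
    for name, value in fields.items():
        if value is not None:  # omit inapplicable fields instead of sending null
            message[name] = value
    return message


def has_mandatory_fields(message):
    """True if every mandatory field is present."""
    return all(name in message for name in MANDATORY_FIELDS)
```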
+<div class="section">
+<h2><a name="Global_Configuration"></a>Global Configuration</h2>
+<p>See the &#x201c;<a href="../metron-common/index.html">Global Configuration</a>&#x201d; section.</p></div>
+<div class="section">
+<h2><a name="Parser_Configuration"></a>Parser Configuration</h2>
+<p>The configuration for the various parser topologies is defined by JSON documents stored in ZooKeeper.</p>
+<p>The document is structured in the following way:</p>
+
+<ul>
+  
+<li><tt>parserClassName</tt> : The fully qualified classname for the parser to be used.</li>
+  
+<li><tt>filterClassName</tt> : The filter to use. This may be the fully qualified classname of a class that implements the <tt>org.apache.metron.parsers.interfaces.MessageFilter&lt;JSONObject&gt;</tt> interface. Message filters are intended to let the user ignore a set of messages via custom logic. The existing implementations are:
+  
+<ul>
+    
+<li><tt>STELLAR</tt> : Applies a Stellar statement that returns a boolean; every message for which the statement returns <tt>true</tt> is passed on. The Stellar statement to apply is specified by the <tt>filter.query</tt> property in the <tt>parserConfig</tt>. Example Stellar filter which includes messages that contain the <tt>field1</tt> field:</li>
+  </ul></li>
+</ul>
+
+<div class="source">
+<div class="source">
+<pre>   {
+    &quot;filterClassName&quot; : &quot;STELLAR&quot;
+   ,&quot;parserConfig&quot; : {
+    &quot;filter.query&quot; : &quot;exists(field1)&quot;
+    }
+   }
+</pre></div></div>
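+<p>The filter semantics can be sketched in Python: a message is passed on only when the statement returns <tt>true</tt>. The predicate below is a hand-written stand-in for the Stellar expression <tt>exists(field1)</tt>, not Stellar itself, and the function names are hypothetical.</p>

```python
def apply_filter(messages, predicate):
    """Keep only the messages for which the predicate returns True."""
    return [message for message in messages if predicate(message)]


def exists_field1(message):
    """Python stand-in for the Stellar query exists(field1).

    Assumes exists() is true when the field is present and not null.
    """
    return message.get("field1") is not None


if __name__ == "__main__":
    batch = [{"field1": "a"}, {"field2": "b"}, {"field1": None}]
    print(apply_filter(batch, exists_field1))
```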
+
+<ul>
+  
+<li><tt>sensorTopic</tt> : The Kafka topic from which the parser reads the raw sensor messages.</li>
+  
+<li><tt>parserConfig</tt> : A JSON Map representing the parser implementation specific configuration.</li>
+  
+<li><tt>fieldTransformations</tt> : An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the kafka topic.</li>
+</ul>
+<p>Each <tt>fieldTransformations</tt> entry is a complex object which defines a transformation to apply to a message. A transformation can:</p>
+
+<ul>
+  
+<li>Modify existing fields of a message</li>
+  
+<li>Add new fields given the values of existing fields of a message</li>
+  
+<li>Remove existing fields of a message</li>
+</ul>
+<div class="section">
+<h3><a name="fieldTransformation_configuration"></a><tt>fieldTransformation</tt> configuration</h3>
+<p>The format of a <tt>fieldTransformation</tt> is as follows:</p>
+
+<ul>
+  
+<li><tt>input</tt> : An array of fields or a single field representing the input. This is optional; if unspecified, then the whole message is passed as input.</li>
+  
+<li><tt>output</tt> : The outputs to produce from the transformation. If unspecified, it is assumed to be the same as the input.</li>
+  
+<li><tt>transformation</tt> : The fully qualified classname of the transformation to be used. This is either a class which implements <tt>FieldTransformation</tt> or a member of the <tt>FieldTransformations</tt> enum.</li>
+  
+<li><tt>config</tt> : A String to Object map of transformation specific configuration.</li>
+</ul>
+<p>The currently implemented fieldTransformations are:</p>
+
+<ul>
+  
+<li><tt>REMOVE</tt> : This transformation removes the specified input fields. If you want a conditional removal, you can pass a Metron Query Language statement to define the conditions under which you want to remove the fields.</li>
+</ul>
+<p>Consider the following simple configuration which will remove <tt>field1</tt> unconditionally:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+...
+    &quot;fieldTransformations&quot; : [
+          {
+            &quot;input&quot; : &quot;field1&quot;
+          , &quot;transformation&quot; : &quot;REMOVE&quot;
+          }
+                      ]
+}
+</pre></div></div>
+<p>Consider the following simple sensor parser configuration which will remove <tt>field1</tt> whenever <tt>field2</tt> exists and its value equals &#x2018;foo&#x2019;:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+...
+  &quot;fieldTransformations&quot; : [
+          {
+            &quot;input&quot; : &quot;field1&quot;
+          , &quot;transformation&quot; : &quot;REMOVE&quot;
+          , &quot;config&quot; : {
+              &quot;condition&quot; : &quot;exists(field2) and field2 == 'foo'&quot;
+                       }
+          }
+                      ]
+}
+</pre></div></div>
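+<p>Both <tt>REMOVE</tt> configurations above can be mimicked with a small Python sketch: unconditional removal, and removal guarded by a condition. The helper names are hypothetical, and the condition is a Python function standing in for the Stellar <tt>condition</tt> string.</p>

```python
def remove_fields(message, inputs, condition=None):
    """Return a copy of the message without the input fields.

    `condition` is a Python callable standing in for the Stellar
    `condition` string; when given, fields are removed only if it is True.
    """
    if isinstance(inputs, str):  # `input` may be a single field or a list
        inputs = [inputs]
    if condition is not None and not condition(message):
        return dict(message)
    return {k: v for k, v in message.items() if k not in inputs}


def field2_is_foo(message):
    # Stand-in for the Stellar condition: exists(field2) and field2 == 'foo'
    return message.get("field2") == "foo"


if __name__ == "__main__":
    print(remove_fields({"field1": 1, "field2": "foo"}, "field1", field2_is_foo))
```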
+
+<ul>
+  
+<li><tt>IP_PROTOCOL</tt> : This transformation maps IANA protocol numbers to consistent string representations.</li>
+</ul>
+<p>Consider the following sensor parser config to map the <tt>protocol</tt> field to a textual representation of the protocol:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+...
+    &quot;fieldTransformations&quot; : [
+          {
+            &quot;input&quot; : &quot;protocol&quot;
+          , &quot;transformation&quot; : &quot;IP_PROTOCOL&quot;
+          }
+                      ]
+}
+</pre></div></div>
+<p>This transformation would transform <tt>{ &quot;protocol&quot; : 6, &quot;source.type&quot; : &quot;bro&quot;, ... }</tt> into <tt>{ &quot;protocol&quot; : &quot;TCP&quot;, &quot;source.type&quot; : &quot;bro&quot;, ...}</tt></p>
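+<p>A minimal Python sketch of the mapping shown above. The table is deliberately partial and uses IANA keyword names, and the function name <tt>ip_protocol</tt> is hypothetical. Leaving unknown or non-numeric values unchanged is an assumption, not documented Metron behavior.</p>

```python
# Partial, illustrative table of IANA protocol numbers; Metron's own
# table is larger and its exact names may differ.
IANA_PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP", 47: "GRE", 50: "ESP", 132: "SCTP"}


def ip_protocol(message, field="protocol"):
    """Return a copy of the message with a numeric protocol mapped to its name."""
    result = dict(message)
    try:
        number = int(result[field])
    except (KeyError, TypeError, ValueError):
        return result  # field missing or not numeric: leave it unchanged
    result[field] = IANA_PROTOCOLS.get(number, result[field])
    return result
```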
+
+<ul>
+  
+<li><tt>STELLAR</tt> : This transformation executes a set of transformations expressed as <a href="../metron-common/index.html">Stellar Language</a> statements.</li>
+</ul>
+<p>Consider the following sensor parser config to add three new fields to a message:</p>
+
+<ul>
+  
+<li><tt>utc_timestamp</tt> : The Unix epoch timestamp computed from the <tt>timestamp</tt> field, a <tt>dc</tt> field holding the data center the message comes from, and a <tt>dc2tz</tt> map from data centers to timezones</li>
+  
+<li><tt>url_host</tt> : The host associated with the url in the <tt>url</tt> field</li>
+  
+<li><tt>url_protocol</tt> : The protocol associated with the url in the <tt>url</tt> field</li>
+</ul>
+
+<div class="source">
+<div class="source">
+<pre>{
+...
+    &quot;fieldTransformations&quot; : [
+          {
+           &quot;transformation&quot; : &quot;STELLAR&quot;
+          ,&quot;output&quot; : [ &quot;utc_timestamp&quot;, &quot;url_host&quot;, &quot;url_protocol&quot; ]
+          ,&quot;config&quot; : {
+            &quot;utc_timestamp&quot; : &quot;TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )&quot;
+           ,&quot;url_host&quot; : &quot;URL_TO_HOST(url)&quot;
+           ,&quot;url_protocol&quot; : &quot;URL_TO_PROTOCOL(url)&quot;
+                      }
+          }
+                      ]
+   ,&quot;parserConfig&quot; : {
+      &quot;dc2tz&quot; : {
+                &quot;nyc&quot; : &quot;EST&quot;
+               ,&quot;la&quot; : &quot;PST&quot;
+               ,&quot;london&quot; : &quot;UTC&quot;
+                }
+    }
+}
+</pre></div></div>
+<p>Note that the <tt>dc2tz</tt> map is in the parser config, so it is accessible in the functions.</p></div>
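+<p>To make the three Stellar expressions concrete, here is a rough Python equivalent. It is a sketch, not Stellar. <tt>to_epoch_ms</tt> and <tt>add_fields</tt> are hypothetical names, and the output is assumed to be epoch milliseconds. The <tt>dc2tz</tt> stand-in uses fixed UTC offsets instead of timezone IDs such as <tt>EST</tt>, so daylight saving time is ignored.</p>

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Hypothetical stand-in for the dc2tz map. Fixed UTC offsets (in hours)
# replace zone IDs, so daylight saving time is ignored.
DC2TZ = {"nyc": -5, "la": -8, "london": 0}


def to_epoch_ms(timestamp, dc, dc2tz=DC2TZ):
    """Parse 'yyyy-MM-dd HH:mm:ss' in the data center's zone (UTC if unknown)."""
    zone = timezone(timedelta(hours=dc2tz.get(dc, 0)))
    local = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=zone)
    return int(local.timestamp() * 1000)


def add_fields(message):
    """Add utc_timestamp, url_host and url_protocol, as in the config above."""
    result = dict(message)
    result["utc_timestamp"] = to_epoch_ms(message["timestamp"], message.get("dc"))
    url = urlparse(message["url"])
    result["url_host"] = url.hostname
    result["url_protocol"] = url.scheme
    return result
```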
+<div class="section">
+<h3><a name="An_Example_Configuration_for_a_Sensor"></a>An Example Configuration for a Sensor</h3>
+<p>Consider the following example configuration for the <tt>yaf</tt> sensor:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;parserClassName&quot;:&quot;org.apache.metron.parsers.GrokParser&quot;,
+  &quot;sensorTopic&quot;:&quot;yaf&quot;,
+  &quot;fieldTransformations&quot; : [
+                    {
+                      &quot;input&quot; : &quot;protocol&quot;
+                     ,&quot;transformation&quot;: &quot;IP_PROTOCOL&quot;
+                    }
+                    ],
+  &quot;parserConfig&quot;:
+  {
+    &quot;grokPath&quot;:&quot;/patterns/yaf&quot;,
+    &quot;patternLabel&quot;:&quot;YAF_DELIMITED&quot;,
+    &quot;timestampField&quot;:&quot;start_time&quot;,
+    &quot;timeFields&quot;: [&quot;start_time&quot;, &quot;end_time&quot;],
+    &quot;dateFormat&quot;:&quot;yyyy-MM-dd HH:mm:ss.S&quot;
+  }
+}
+</pre></div></div></div></div>
+<div class="section">
+<h2><a name="Parser_Adapters"></a>Parser Adapters</h2>
+<p>Parser adapters are loaded dynamically in each Metron topology. They are defined in the Parser Config (defined above) JSON file in Zookeeper.</p>
+<div class="section">
+<h3><a name="Java_Parser_Adapters"></a>Java Parser Adapters</h3>
+<p>Java parser adapters are intended for higher-velocity topologies and are not easily changed or extended. As adoption of Metron continues, we plan to extend our library of Java adapters to process more log formats. At the moment, the Java adapters included with Metron are:</p>
+
+<ul>
+  
+<li>org.apache.metron.parsers.ise.BasicIseParser : Parse ISE messages</li>
+  
+<li>org.apache.metron.parsers.bro.BasicBroParser : Parse Bro messages</li>
+  
+<li>org.apache.metron.parsers.sourcefire.BasicSourcefireParser : Parse Sourcefire messages</li>
+  
+<li>org.apache.metron.parsers.lancope.BasicLancopeParser : Parse Lancope messages</li>
+</ul></div>
+<div class="section">
+<h3><a name="Grok_Parser_Adapters"></a>Grok Parser Adapters</h3>
+<p>Grok parser adapters are designed primarily to let someone who is not a Java developer quickly stand up a parser adapter for lower-velocity topologies. Grok relies on regular expressions for message parsing, which is much slower than purpose-built Java parsers, but is more extensible. Grok parsers are defined via a config file, and the topology does not need to be recompiled in order to change them. An example of a Grok parser is:</p>
+
+<ul>
+  
+<li>org.apache.metron.parsers.GrokParser</li>
+</ul>
+<p>For more information on the Grok project please refer to the following link:</p>
+<p><a class="externalLink" href="https://github.com/thekrakken/java-grok">https://github.com/thekrakken/java-grok</a></p>
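+<p>Grok expressions are named regular-expression patterns. The following Python sketch shows the core idea by expanding <tt>%{PATTERN:field}</tt> references into named regex groups. It is an illustration with a hypothetical three-entry pattern library, not the java-grok library linked above.</p>

```python
import re

# A tiny, hypothetical pattern library; real Grok distributions ship many more.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
}

GROK_REFERENCE = re.compile(r"%\{(\w+):(\w+)\}")


def compile_grok(expression):
    """Expand each %{PATTERN:field} into a named regex group."""
    def expand(match):
        pattern, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[pattern])
    return re.compile(GROK_REFERENCE.sub(expand, expression))


def grok_parse(expression, line):
    """Return the captured fields, or None if the line does not match."""
    match = compile_grok(expression).match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(grok_parse("%{IP:ip_src_addr}:%{INT:ip_src_port} %{WORD:protocol}", "10.0.0.1:443 TCP"))
```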
+<p><a name="Starting_the_Parser_Topology"></a></p>
+<h1>Starting the Parser Topology</h1>
+<p>Starting a particular parser topology on a running Metron deployment is as easy as running the <tt>start_parser_topology.sh</tt> script located in <tt>$METRON_HOME/bin</tt>. This utility lets you configure and start the topology, assuming that the sensor-specific parser configuration already exists within ZooKeeper.</p>
+<p>The usage for <tt>start_parser_topology.sh</tt> is as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre>usage: start_parser_topology.sh
+ -e,--extra_topology_options &lt;JSON_FILE&gt;        Extra options in the form
+                                                of a JSON file with a map
+                                                for content.
+ -esc,--extra_kafka_spout_config &lt;JSON_FILE&gt;    Extra spout config options
+                                                in the form of a JSON file
+                                                with a map for content.
+                                                Possible keys are:
+                                                retryDelayMaxMs,retryDelay
+                                                Multiplier,retryInitialDel
+                                                ayMs,stateUpdateIntervalMs
+                                                ,bufferSizeBytes,fetchMaxW
+                                                ait,fetchSizeBytes,maxOffs
+                                                etBehind,metricsTimeBucket
+                                                SizeInSecs,socketTimeoutMs
+ -ewnt,--error_writer_num_tasks &lt;NUM_TASKS&gt;     Error Writer Num Tasks
+ -ewp,--error_writer_p &lt;PARALLELISM_HINT&gt;       Error Writer Parallelism
+                                                Hint
+ -h,--help                                      This screen
+ -k,--kafka &lt;BROKER_URL&gt;                        Kafka Broker URL
+ -mt,--message_timeout &lt;TIMEOUT_IN_SECS&gt;        Message Timeout in Seconds
+ -mtp,--max_task_parallelism &lt;MAX_TASK&gt;         Max task parallelism
+ -na,--num_ackers &lt;NUM_ACKERS&gt;                  Number of Ackers
+ -nw,--num_workers &lt;NUM_WORKERS&gt;                Number of Workers
+ -pnt,--parser_num_tasks &lt;NUM_TASKS&gt;            Parser Num Tasks
+ -pp,--parser_p &lt;PARALLELISM_HINT&gt;              Parser Parallelism Hint
+ -s,--sensor &lt;SENSOR_TYPE&gt;                      Sensor Type
+ -snt,--spout_num_tasks &lt;NUM_TASKS&gt;             Spout Num Tasks
+ -sp,--spout_p &lt;SPOUT_PARALLELISM_HINT&gt;         Spout Parallelism Hint
+ -t,--test &lt;TEST&gt;                               Run in Test Mode
+ -z,--zk &lt;ZK_QUORUM&gt;                            Zookeeper Quroum URL
+                                                (zk1:2181,zk2:2181,...
+</pre></div></div></div></div>
+<div class="section">
+<h2><a name="The_--extra_kafka_spout_config_Option"></a>The <tt>--extra_kafka_spout_config</tt> Option</h2>
+<p>These options are intended to configure the Storm Kafka Spout more completely. They are specified in a JSON file containing a map from each Kafka spout configuration parameter to its value. The parameters that can be configured are:</p>
+
+<ul>
+  
+<li><tt>spout.pollTimeoutMs</tt> - Specifies the time, in milliseconds, spent waiting in poll if data is not available. The default is 2s.</li>
+  
+<li><tt>spout.firstPollOffsetStrategy</tt> - Sets the offset used by the Kafka spout in the first poll to Kafka broker upon process start. One of
+  
+<ul>
+    
+<li><tt>EARLIEST</tt></li>
+    
+<li><tt>LATEST</tt></li>
+    
+<li><tt>UNCOMMITTED_EARLIEST</tt> - Resumes from the last committed offset; if no committed offset is found, falls back to <tt>EARLIEST</tt>. NOTE: This is the default.</li>
+    
+<li><tt>UNCOMMITTED_LATEST</tt> - Resumes from the last committed offset; if no committed offset is found, falls back to <tt>LATEST</tt>.</li>
+  </ul></li>
+  
+<li><tt>spout.offsetCommitPeriodMs</tt> - Specifies the period, in milliseconds, at which the offset commit task is called. The default is 15s.</li>
+  
+<li><tt>spout.maxUncommittedOffsets</tt> - Defines the maximum number of polled offsets (records) that can be pending commit before another poll can take place. Once this limit is reached, no more offsets (records) can be polled until the next successful commit(s) brings the number of pending offsets below the threshold. The default is 10,000,000.</li>
+  
+<li><tt>spout.maxRetries</tt> - Defines the maximum number of retries in case of tuple failure. The default is to retry forever, which means that no new records are committed until the previously polled records have been acked. This guarantees at-least-once delivery of all the previously polled records. By specifying a finite value for <tt>maxRetries</tt>, the user sacrifices the delivery guarantee for the previously polled records in favor of processing more records.</li>
+  
+<li>Any of the configs in the Consumer API for <a class="externalLink" href="http://kafka.apache.org/0100/documentation.html#newconsumerconfigs">Kafka 0.10.x</a></li>
+</ul>
+<p>For instance, the following JSON file sets the first-poll offset strategy to <tt>UNCOMMITTED_EARLIEST</tt>:</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;spout.firstPollOffsetStrategy&quot; : &quot;UNCOMMITTED_EARLIEST&quot;
+}
+</pre></div></div>
+<p>This file is loaded by passing its path as the argument to <tt>--extra_kafka_spout_config</tt>.</p></div>
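+<p>The four <tt>spout.firstPollOffsetStrategy</tt> values differ only in how they treat previously committed offsets. The Python function below is a hypothetical model of that decision for a single partition. It follows the descriptions above and the Storm Kafka spout documentation, but it is not the spout's actual code.</p>

```python
def first_poll_offset(strategy, committed, earliest, latest):
    """Choose where a partition is read from on the spout's first poll.

    `committed` is the consumer group's committed offset for the partition
    (the next record to read), or None if nothing has been committed.
    """
    if strategy == "EARLIEST":
        return earliest  # ignore previous commits
    if strategy == "LATEST":
        return latest  # ignore previous commits
    if strategy in ("UNCOMMITTED_EARLIEST", "UNCOMMITTED_LATEST"):
        if committed is not None:
            return committed  # resume where the group left off
        return earliest if strategy == "UNCOMMITTED_EARLIEST" else latest
    raise ValueError("unknown strategy: %s" % strategy)
```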
+<div class="section">
+<h2><a name="The_--extra_topology_options_Option"></a>The <tt>--extra_topology_options</tt> Option</h2>
+<p>These options are intended to be Storm configuration options; they live in a JSON file which is loaded into the Storm config. For instance, to set the Storm property <tt>topology.ticks.tuple.freq.secs</tt> to 1000 and <tt>storm.local.dir</tt> to <tt>/opt/my/path</tt>, you could create a file called <tt>custom_config.json</tt> containing:</p>
+
+<div class="source">
+<div class="source">
+<pre>{ 
+  &quot;topology.ticks.tuple.freq.secs&quot; : 1000,
+  &quot;storm.local.dir&quot; : &quot;/opt/my/path&quot;
+}
+</pre></div></div>
+<p>and pass <tt>--extra_topology_options custom_config.json</tt> to <tt>start_parser_topology.sh</tt>.</p>
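+<p>Both option files are plain JSON maps. The Python sketch below writes the two example files from this page and assembles the matching command line without executing it. The helper names, the file name <tt>spout_config.json</tt>, and the ZooKeeper and Kafka host names are hypothetical. The flags are taken from the usage text above.</p>

```python
import json
import os
import shlex
import tempfile


def write_options(options, path):
    """Write an options map as JSON, the format both option files use."""
    with open(path, "w") as handle:
        json.dump(options, handle, indent=2)
    return path


def start_parser_command(sensor, zk_quorum, broker, topology_file=None, spout_file=None):
    """Build (but do not run) a start_parser_topology.sh command line."""
    args = ["start_parser_topology.sh", "-s", sensor, "-z", zk_quorum, "-k", broker]
    if topology_file:
        args += ["--extra_topology_options", topology_file]
    if spout_file:
        args += ["--extra_kafka_spout_config", spout_file]
    return shlex.join(args)  # requires Python 3.8+


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    topology = write_options(
        {"topology.ticks.tuple.freq.secs": 1000, "storm.local.dir": "/opt/my/path"},
        os.path.join(workdir, "custom_config.json"),
    )
    spout = write_options(
        {"spout.firstPollOffsetStrategy": "UNCOMMITTED_EARLIEST"},
        os.path.join(workdir, "spout_config.json"),
    )
    # Host names below are placeholders.
    print(start_parser_command("yaf", "zk1:2181,zk2:2181", "kafka1:6667", topology, spout))
```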
+<p><a name="Notes_on_Performance_Tuning"></a></p>
+<h1>Notes on Performance Tuning</h1>
+<p>A default Metron installation is not tuned for production deployment. There are a few knobs you can tune to get the most out of your system.</p></div>
+<div class="section">
+<h2><a name="Kafka_Queue"></a>Kafka Queue</h2>
+<p>The kafka queue associated with your parser is a collection point for all of the data sent to your parser. As such, make sure that the number of partitions in the kafka topic is sufficient to handle the throughput that you expect from your parser topology.</p></div>
+<div class="section">
+<h2><a name="Parser_Topology"></a>Parser Topology</h2>
+<p>The parser topology, as started by the <tt>$METRON_HOME/bin/start_parser_topology.sh</tt> script, uses a default of one executor per bolt. In a real production system, this should be customized by modifying the arguments sent to this utility.</p>
+
+<ul>
+  
+<li>Topology Wide
+  
+<ul>
+    
+<li><tt>--num_workers</tt> : The number of workers for the topology</li>
+    
+<li><tt>--num_ackers</tt> : The number of ackers for the topology</li>
+  </ul></li>
+  
+<li>The Kafka Spout
+  
+<ul>
+    
+<li><tt>--spout_num_tasks</tt> : The number of tasks for the spout</li>
+    
+<li><tt>--spout_p</tt> : The parallelism hint for the spout</li>
+    
+<li>Ensure that the spout has enough parallelism to dedicate one executor to each partition in your Kafka topic.</li>
+  </ul></li>
+  
+<li>The Parser Bolt
+  
+<ul>
+    
+<li><tt>--parser_num_tasks</tt> : The number of tasks for the parser bolt</li>
+    
+<li><tt>--parser_p</tt> : The parallelism hint for the parser bolt</li>
+    
+<li>This is the bolt that does the most processing, so ensure that it is configured with sufficient parallelism to match your throughput expectations.</li>
+  </ul></li>
+  
+<li>The Error Message Writer Bolt
+  
+<ul>
+    
+<li><tt>--error_writer_num_tasks</tt> : The number of tasks for the error writer bolt</li>
+    
+<li><tt>--error_writer_p</tt> : The parallelism hint for the error writer bolt</li>
+  </ul></li>
+</ul>
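+<p>The flags above can be assembled programmatically. The following Python sketch uses a hypothetical helper that is not part of Metron. It uses only the flag names listed above and sizes the spout to one executor per Kafka partition. Any other arguments your deployment needs for <tt>start_parser_topology.sh</tt> are deliberately omitted.</p>
+
+<div class="source">
+<div class="source">
+<pre>def parser_tuning_args(num_partitions, parser_p, num_workers, num_ackers=1):
+    # Hypothetical helper: build only the tuning flags for start_parser_topology.sh.
+    return [
+        '--num_workers', str(num_workers),
+        '--num_ackers', str(num_ackers),
+        # One spout executor per Kafka partition; extra executors would sit idle.
+        '--spout_p', str(num_partitions),
+        '--spout_num_tasks', str(num_partitions),
+        # The parser bolt does the most work, so it is sized independently.
+        '--parser_p', str(parser_p),
+        '--parser_num_tasks', str(parser_p),
+    ]
+
+
+args = parser_tuning_args(num_partitions=8, parser_p=16, num_workers=4)
+print(' '.join(args))
+</pre></div></div>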
+<p>Finally, if Storm workers and executors are new to you, the following article may be useful:</p>
+
+<ul>
+  
+<li><a class="externalLink" href="http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/">Understanding the Parallelism of a Storm Topology</a></li>
+</ul></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+        
+                </div>
+    </footer>
+  </body>
+</html>

Added: dev/metron/0.4.0-RC4/site-book/metron-platform/metron-pcap-backend/index.html
==============================================================================
--- dev/metron/0.4.0-RC4/site-book/metron-platform/metron-pcap-backend/index.html (added)
+++ dev/metron/0.4.0-RC4/site-book/metron-platform/metron-pcap-backend/index.html Tue Jun 27 18:15:56 2017
@@ -0,0 +1,634 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-06-27
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170627" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Metron PCAP Backend</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+            
+      <div class="row-fluid">
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <h1>Metron PCAP Backend</h1>
+<p><a name="Metron_PCAP_Backend"></a></p>
+<p>The purpose of the Metron PCAP backend is to provide a Storm topology capable of rapidly ingesting raw packet capture data from Kafka directly into HDFS.</p>
+
+<ul>
+  
+<li><a href="#The_Sensors_Feeding_Kafka">Sensors</a></li>
+  
+<li><a href="#The_PCAP_Topology">PCAP Topology</a></li>
+  
+<li><a href="#The_Files_on_HDFS">HDFS Files</a></li>
+  
+<li><a href="#Configuration">Configuration</a></li>
+  
+<li><a href="#Starting_the_Topology">Starting the Topology</a></li>
+  
+<li><a href="#Utilities">Utilities</a>
+  
+<ul>
+    
+<li><a href="#Inspector_Utility">Inspector Utility</a></li>
+    
+<li><a href="#Query_Filter_Utility">Query Filter Utility</a></li>
+  </ul></li>
+  
+<li><a href="#Performance_Tuning">Performance Tuning</a></li>
+</ul>
+<div class="section">
+<h2><a name="The_Sensors_Feeding_Kafka"></a>The Sensors Feeding Kafka</h2>
+<p>This component must be fed via Kafka by fast upstream packet capture components. The two supported components shipped with Metron are:</p>
+
+<ul>
+  
+<li>The pycapa <a href="../../metron-sensors/pycapa/index.html">tool</a> aimed at low-volume packet capture</li>
+  
+<li>The <a class="externalLink" href="http://dpdk.org/">DPDK</a>-based <a href="../../metron-sensors/fastcapa/index.html">tool</a> aimed at high-volume packet capture</li>
+</ul>
+<p>Both of these sensors write raw packet data directly into Kafka. This component expects each record to have the following structure:</p>
+
+<ul>
+  
+<li>A key containing the byte representation of a 64-bit <tt>unsigned long</tt>: a timestamp since the Unix epoch, in the time unit set by <tt>kafka.pcap.ts_granularity</tt></li>
+  
+<li>A value containing the raw packet data with no headers (neither the global pcap header nor the per-packet header)</li>
+</ul></div>
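+<p>For illustration, the following Python sketch encodes and decodes such a key with the standard <tt>struct</tt> module. It assumes big-endian byte order, which is how Java writes a <tt>long</tt> by default. Confirm the byte order your sensor actually writes before relying on it.</p>
+
+<div class="source">
+<div class="source">
+<pre>import struct
+
+# Assumption: the 8-byte key is big-endian, as Java writes longs by default.
+KEY_FORMAT = '>Q'  # unsigned 64-bit integer
+
+
+def encode_key(timestamp):
+    # timestamp is in whatever unit the sensor uses (e.g. microseconds).
+    return struct.pack(KEY_FORMAT, timestamp)
+
+
+def decode_key(key):
+    if len(key) != 8:
+        raise ValueError('expected an 8-byte key, got %d bytes' % len(key))
+    (timestamp,) = struct.unpack(KEY_FORMAT, key)
+    return timestamp
+
+
+key = encode_key(1498587356000000)
+print(key.hex(), decode_key(key))
+</pre></div></div>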
+<div class="section">
+<h2><a name="The_PCAP_Topology"></a>The PCAP Topology</h2>
+<p>The structure of the topology is extremely simple: it is a spout-only topology. It uses the <tt>Storm Kafka</tt> spout, extended to invoke a callback instead of handing tuples to a separate bolt. </p>
+<p>The following happens as part of this spout for each packet:</p>
+
+<ul>
+  
+<li>A custom <tt>Scheme</tt> is used which attaches the appropriate headers to the packet (both global and packet headers) using the timestamp in the key and the raw packet data in the value.</li>
+  
+<li>A callback is called which appends the packet data to a sequence file in HDFS.</li>
+</ul></div>
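+<p>The header-attaching step can be sketched in Python using the standard libpcap file format. That format places a 24-byte global header at the start of the data, followed by a 16-byte record header for each packet. This is not Metron's <tt>Scheme</tt> implementation. The magic number, snap length, Ethernet link type and big-endian layout below are assumptions chosen for illustration.</p>
+
+<div class="source">
+<div class="source">
+<pre>import struct
+
+PCAP_MAGIC = 0xA1B2C3D4   # libpcap magic for microsecond timestamps
+SNAPLEN = 65535           # assumption: maximum captured bytes per packet
+LINKTYPE_ETHERNET = 1     # assumption: packets are Ethernet frames
+
+
+def global_header():
+    # magic, version 2.4, GMT offset, accuracy, snaplen, link type (big-endian)
+    return struct.pack('>IHHiIII', PCAP_MAGIC, 2, 4, 0, 0, SNAPLEN, LINKTYPE_ETHERNET)
+
+
+def record_header(ts_nanos, packet):
+    # Split the nanosecond timestamp into whole seconds and microseconds.
+    seconds, rem_nanos = divmod(ts_nanos, 1000000000)
+    length = len(packet)
+    # Captured and original lengths are equal; assumes the packet fits in SNAPLEN.
+    return struct.pack('>IIII', seconds, rem_nanos // 1000, length, length)
+
+
+def headerize(ts_nanos, packet):
+    # Attach both headers to a raw packet taken from a Kafka record value.
+    return global_header() + record_header(ts_nanos, packet) + packet
+</pre></div></div>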
+<div class="section">
+<h2><a name="The_Files_on_HDFS"></a>The Files on HDFS</h2>
+<p>The sequence files on HDFS fit the following pattern: <tt>$BASE_PATH/pcap_$TOPIC_$TS_$PARTITION_$UUID</tt></p>
+<p>where</p>
+
+<ul>
+  
+<li><tt>BASE_PATH</tt> is the base path to where pcap data is stored in HDFS</li>
+  
+<li><tt>TOPIC</tt> is the Kafka topic</li>
+  
+<li><tt>TS</tt> is the timestamp, in nanoseconds since the Unix epoch</li>
+  
+<li><tt>PARTITION</tt> is the Kafka partition</li>
+  
+<li><tt>UUID</tt> is the UUID of the Storm worker</li>
+</ul>
+<p>Each of these sequence files contains a set of packets with headers attached.</p></div>
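+<p>A minimal Python sketch of building and parsing names that follow this pattern is shown below. The helper names and the example UUID are hypothetical. Parsing splits from the right, on the assumption that the timestamp, partition and worker UUID never contain underscores, while the topic name may.</p>
+
+<div class="source">
+<div class="source">
+<pre>import posixpath
+
+PREFIX = 'pcap_'
+
+
+def pcap_file_path(base_path, topic, ts_nanos, partition, worker_uuid):
+    # $BASE_PATH/pcap_$TOPIC_$TS_$PARTITION_$UUID
+    name = '%s%s_%d_%d_%s' % (PREFIX, topic, ts_nanos, partition, worker_uuid)
+    return posixpath.join(base_path, name)
+
+
+def parse_pcap_file_path(path):
+    name = posixpath.basename(path)
+    if not name.startswith(PREFIX):
+        raise ValueError('not a pcap sequence file: ' + name)
+    # rsplit keeps any underscores inside the topic name intact.
+    topic, ts, partition, worker_uuid = name[len(PREFIX):].rsplit('_', 3)
+    return topic, int(ts), int(partition), worker_uuid
+
+
+path = pcap_file_path('/apps/metron/pcap', 'pcap', 1498587356000000000, 3,
+                      'd2b4c6a0-1f3e-4c1a-9b8e-5a7f0c2e1d44')
+print(path)
+print(parse_pcap_file_path(path))
+</pre></div></div>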
+<div class="section">
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>The configuration file for the Flux topology is located at <tt>$METRON_HOME/config/pcap.properties</tt> and the possible options are as follows:</p>
+
+<ul>
+  
+<li><tt>spout.kafka.topic.pcap</tt> : The Kafka topic to listen to</li>
+  
+<li><tt>storm.auto.credentials</tt> : The Storm auto-credentials plugins used for Kerberos ticket renewal. If running on a Kerberized cluster, this should be <tt>['org.apache.storm.security.auth.kerberos.AutoTGT']</tt></li>
+  
+<li><tt>kafka.security.protocol</tt> : The security protocol to use for Kafka. This should be <tt>PLAINTEXT</tt> for a non-Kerberized cluster and probably <tt>SASL_PLAINTEXT</tt> for a Kerberized cluster.</li>
+  
+<li><tt>kafka.zk</tt> : The comma-separated ZooKeeper quorum (e.g. host:2181,host2:2181)</li>
+  
+<li><tt>kafka.pcap.start</tt> : One of <tt>EARLIEST</tt>, <tt>LATEST</tt>, <tt>UNCOMMITTED_EARLIEST</tt>, <tt>UNCOMMITTED_LATEST</tt> representing where to start listening on the queue.</li>
+  
+<li><tt>kafka.pcap.numPackets</tt> : The number of packets to keep in one file.</li>
+  
+<li><tt>kafka.pcap.maxTimeMS</tt> : The amount of packet data to keep in one file, expressed as a duration in milliseconds. For instance, you may only want to keep an hour&#x2019;s worth of packets in a given file.</li>
+  
+<li><tt>kafka.pcap.ts_scheme</tt> : One of <tt>FROM_KEY</tt> or <tt>FROM_VALUE</tt>. Use <tt>FROM_KEY</tt>, as that fits the current tooling. <tt>FROM_VALUE</tt> is a legacy option that assumes fully headerized packets arrive in the value.</li>
+  
+<li><tt>kafka.pcap.out</tt> : The directory in HDFS to store the packet capture data</li>
+  
+<li><tt>kafka.pcap.ts_granularity</tt> : The granularity of timing used in the timestamps. One of <tt>MILLISECONDS</tt>, <tt>MICROSECONDS</tt>, or <tt>NANOSECONDS</tt> representing milliseconds, microseconds or nanoseconds since the unix epoch (respectively).</li>
+</ul></div>
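+<p>Key timestamps arrive in the unit named by <tt>kafka.pcap.ts_granularity</tt>, while the <tt>TS</tt> component of the HDFS file names described above is in nanoseconds. The Python sketch below shows one way to normalize between the two. The mapping and the <tt>to_nanos</tt> helper are illustrative assumptions, not Metron code.</p>
+
+<div class="source">
+<div class="source">
+<pre>NANOS_PER_UNIT = {
+    'MILLISECONDS': 1000000,
+    'MICROSECONDS': 1000,
+    'NANOSECONDS': 1,
+}
+
+
+def to_nanos(timestamp, ts_granularity):
+    # Scale a key timestamp to nanoseconds since the Unix epoch.
+    try:
+        factor = NANOS_PER_UNIT[ts_granularity]
+    except KeyError:
+        raise ValueError('unsupported kafka.pcap.ts_granularity: %s' % ts_granularity) from None
+    return timestamp * factor
+
+
+print(to_nanos(1498587356000, 'MILLISECONDS'))  # 1498587356000000000
+</pre></div></div>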
+<div class="section">
+<h2><a name="Starting_the_Topology"></a>Starting the Topology</h2>
+<p>To start the topology, execute the utility script <tt>$METRON_HOME/bin/start_pcap_topology.sh</tt>, which takes no arguments.</p></div>
+<div class="section">
+<h2><a name="Utilities"></a>Utilities</h2>
+<div class="section">
+<h3><a name="Inspector_Utility"></a>Inspector Utility</h3>
+<p>To ensure that data can be read back out, use <tt>$METRON_HOME/bin/pcap_inspector.sh</tt>, a utility that reads portions of the sequence files.</p>
+
+<div class="source">
+<div class="source">
+<pre>usage: PcapInspector
+ -h,--help               Generate Help screen
+ -i,--input &lt;SEQ_FILE&gt;   Input sequence file on HDFS
+ -n,--num_packets &lt;N&gt;    Number of packets to dump
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_Filter_Utility"></a>Query Filter Utility</h3>
+<p>This command-line tool exposes two methods for filtering PCAP data:</p>
+
+<ul>
+  
+<li>fixed</li>
+  
+<li>query (via Stellar)</li>
+</ul>
+<p>Execute the tool as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre>$METRON_HOME/bin/pcap_query.sh [fixed|query]
+</pre></div></div>
+<div class="section">
+<h4><a name="Usage"></a>Usage</h4>
+
+<div class="source">
+<div class="source">
+<pre>usage: Fixed filter options
+ -bop,--base_output_path &lt;arg&gt;   Query result output path. Default is
+                                 '/tmp'
+ -bp,--base_path &lt;arg&gt;           Base PCAP data path. Default is
+                                 '/apps/metron/pcap'
+ -da,--ip_dst_addr &lt;arg&gt;         Destination IP address
+ -df,--date_format &lt;arg&gt;         Date format to use for parsing start_time
+                                 and end_time. Default is to use time in
+                                 millis since the epoch.
+ -dp,--ip_dst_port &lt;arg&gt;         Destination port
+ -pf,--packet_filter &lt;arg&gt;       Packet filter regex
+ -et,--end_time &lt;arg&gt;            Packet end time range. Default is current
+                                 system time.
+ -nr,--num_reducers &lt;arg&gt;        The number of reducers to use.  Default
+                                 is 10.
+ -h,--help                       Display help
+ -ir,--include_reverse           Indicates if filter should check swapped
+                                 src/dest addresses and IPs
+ -p,--protocol &lt;arg&gt;             IP Protocol
+ -sa,--ip_src_addr &lt;arg&gt;         Source IP address
+ -sp,--ip_src_port &lt;arg&gt;         Source port
+ -st,--start_time &lt;arg&gt;          (required) Packet start time range.
+</pre></div></div>
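+<p>By default, <tt>--start_time</tt> and <tt>--end_time</tt> are milliseconds since the epoch. The following Python sketch computes those values for a UTC window and assembles a fixed-filter invocation using only options listed above. The helper names are hypothetical, and the address and port are placeholders.</p>
+
+<div class="source">
+<div class="source">
+<pre>from datetime import datetime, timedelta, timezone
+
+EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
+
+
+def epoch_millis(dt):
+    # Exact integer milliseconds since the Unix epoch for an aware datetime.
+    return (dt - EPOCH) // timedelta(milliseconds=1)
+
+
+def fixed_query_args(start, end, dst_addr, dst_port):
+    # Hypothetical helper: arguments for pcap_query.sh in fixed mode.
+    return ['fixed',
+            '--start_time', str(epoch_millis(start)),
+            '--end_time', str(epoch_millis(end)),
+            '--ip_dst_addr', dst_addr,
+            '--ip_dst_port', str(dst_port),
+            '--include_reverse']
+
+
+start = datetime(2017, 6, 27, 18, 0, tzinfo=timezone.utc)
+args = fixed_query_args(start, start + timedelta(hours=1), '192.168.1.10', 80)
+print('$METRON_HOME/bin/pcap_query.sh ' + ' '.join(args))
+</pre></div></div>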
+
+<div class="source">
+<div class="source">
+<pre>usage: Query filter options
+ -bop,--base_output_path &lt;arg&gt;   Query result output path. Default is
+                                 '/tmp'
+ -bp,--base_path &lt;arg&gt;           Base PCAP data path. Default is
+                                 '/apps/metron/pcap'
+ -df,--date_format &lt;arg&gt;         Date format to use for parsing start_time
+                                 and end_time. Default is to use time in
+                                 millis since the epoch.
+ -et,--end_time &lt;arg&gt;            Packet end time range. Default is current
+                                 system time.
+ -nr,--num_reducers &lt;arg&gt;        The number of reducers to use.  Default
+                                 is 10.
+ -h,--help                       Display help
+ -q,--query &lt;arg&gt;                Query string to use as a filter
+ -st,--start_time &lt;arg&gt;          (required) Packet start time range.
+</pre></div></div>
+<p>The Query filter&#x2019;s <tt>--query</tt> argument specifies the Stellar expression to execute on each packet. To interact with the packet, a few variables are exposed:</p>
+
+<ul>
+  
+<li><tt>packet</tt> : The packet data (a <tt>byte[]</tt>)</li>
+  
+<li><tt>ip_src_addr</tt> : The source address for the packet (a <tt>String</tt>)</li>
+  
+<li><tt>ip_src_port</tt> : The source port for the packet (an <tt>Integer</tt>)</li>
+  
+<li><tt>ip_dst_addr</tt> : The destination address for the packet (a <tt>String</tt>)</li>
+  
+<li><tt>ip_dst_port</tt> : The destination port for the packet (an <tt>Integer</tt>)</li>
+</ul></div>
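+<p>The Python sketch below makes the fixed filter's address and port criteria concrete, including the swapped check enabled by <tt>--include_reverse</tt>. It is hypothetical and is not Metron's implementation. It reuses the field names exposed to Stellar as dictionary keys and treats an omitted criterion as a wildcard. It ignores <tt>--protocol</tt> and <tt>--packet_filter</tt>.</p>
+
+<div class="source">
+<div class="source">
+<pre>def matches_fixed(pkt, ip_src_addr=None, ip_src_port=None,
+                  ip_dst_addr=None, ip_dst_port=None, include_reverse=False):
+    # Hypothetical sketch: each criterion left as None matches anything.
+    def check(src_addr, src_port, dst_addr, dst_port):
+        pairs = [(ip_src_addr, src_addr), (ip_src_port, src_port),
+                 (ip_dst_addr, dst_addr), (ip_dst_port, dst_port)]
+        return all(want is None or want == got for want, got in pairs)
+
+    forward = check(pkt['ip_src_addr'], pkt['ip_src_port'],
+                    pkt['ip_dst_addr'], pkt['ip_dst_port'])
+    if forward or not include_reverse:
+        return forward
+    # --include_reverse: also accept the packet with source and destination swapped.
+    return check(pkt['ip_dst_addr'], pkt['ip_dst_port'],
+                 pkt['ip_src_addr'], pkt['ip_src_port'])
+
+
+pkt = {'ip_src_addr': '10.0.0.1', 'ip_src_port': 5555,
+       'ip_dst_addr': '10.0.0.2', 'ip_dst_port': 80}
+print(matches_fixed(pkt, ip_src_port=80))                        # False
+print(matches_fixed(pkt, ip_src_port=80, include_reverse=True))  # True
+</pre></div></div>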
+<div class="section">
+<h4><a name="Binary_Regex"></a>Binary Regex</h4>
+<p>Packets can be filtered both by header fields and by a binary regular expression run against the packet payload itself. The binary regex filter can be specified via:</p>
+
+<ul>
+  
+<li>The <tt>-pf</tt> (<tt>--packet_filter</tt>) option for the fixed query filter</li>
+  
+<li>The <tt>BYTEARRAY_MATCHER(pattern, data)</tt> Stellar function. The first argument is the regex pattern and the second argument is the data. The packet data is exposed via the <tt>packet</tt> variable in Stellar.</li>
+</ul>
+<p>The format of this regular expression is described <a class="externalLink" href="https://github.com/nishihatapalmer/byteseek/blob/master/sequencesyntax.md">here</a>.</p></div></div></div>
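+<p>Conceptually, a binary regex filter scans the raw payload bytes rather than decoded text. The Python sketch below illustrates the idea with the standard <tt>re</tt> module applied to <tt>bytes</tt>. This substitutes Python's regex syntax, which differs from the byteseek syntax linked above, so the patterns are not interchangeable. <tt>payload_matches</tt> is a hypothetical helper.</p>
+
+<div class="source">
+<div class="source">
+<pre>import re
+
+
+def payload_matches(pattern, packet):
+    # Mirrors the (pattern, data) argument order of BYTEARRAY_MATCHER,
+    # but uses Python's re syntax on bytes, not byteseek syntax.
+    return re.search(pattern, packet) is not None
+
+
+# Match the bytes 'GET ' (hex 47 45 54 20) anywhere in the payload,
+# even when the payload also contains non-printable bytes.
+http_get = re.compile(rb'\x47\x45\x54\x20')
+print(payload_matches(http_get, b'\x00\x01GET /index.html HTTP/1.1'))  # True
+</pre></div></div>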
+<div class="section">
+<h2><a name="Performance_Tuning"></a>Performance Tuning</h2>
+<p>The PCAP topology is extremely lightweight and functions as a spout-only topology. To tune it, users currently must set a combination of properties in pcap.properties as well as configuration in the pcap remote.yaml Flux file itself. The number of partitions in your Kafka topic also has a dramatic impact on performance. We ran data into Kafka at 1.1 Gbps, and our tests led us to configure 128 partitions for our Kafka topic along with the following settings in pcap.properties and remote.yaml (properties unrelated to performance have been omitted):</p>
+<div class="section">
+<h3><a name="pcap.properties_file"></a>pcap.properties file</h3>
+
+<div class="source">
+<div class="source">
+<pre>spout.kafka.topic.pcap=pcap
+storm.topology.workers=16
+kafka.spout.parallelism=128
+kafka.pcap.numPackets=1000000000
+kafka.pcap.maxTimeMS=0
+hdfs.replication=1
+hdfs.sync.every=10000
+</pre></div></div>
+<p>You&#x2019;ll notice that the number of Kafka partitions equals the spout parallelism, and this is no coincidence. Within a consumer group, Kafka assigns each partition to at most one consumer, so at most one spout executor can read from each partition. Any additional parallelism will leave you with dormant threads consuming resources but performing no additional work. For our cluster with 4 Storm Supervisors, we found 16 workers to provide optimal throughput as well. We were largely IO bound rather than CPU bound with the incoming PCAP data.</p></div>
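+<p>The arithmetic behind these numbers can be checked with a short Python sketch. It assumes traffic, executors and workers are spread evenly, which real clusters only approximate. <tt>tuning_summary</tt> is a hypothetical helper.</p>
+
+<div class="source">
+<div class="source">
+<pre>def tuning_summary(gbps, partitions, workers, supervisors):
+    # Assumes an even spread of traffic, spout executors and workers.
+    bytes_per_sec = gbps * 1e9 / 8
+    return {
+        'spout_executors_per_worker': partitions / workers,
+        'workers_per_supervisor': workers / supervisors,
+        'mb_per_sec_per_partition': bytes_per_sec / partitions / 1e6,
+    }
+
+
+# The test described above: 1.1 Gbps, 128 partitions, 16 workers, 4 supervisors.
+for name, value in tuning_summary(1.1, 128, 16, 4).items():
+    print('%s: %.2f' % (name, value))
+</pre></div></div>
+<p>For the 1.1 Gbps test, this works out to 8 spout executors per worker, 4 workers per supervisor, and roughly 1.07 MB/s per partition.</p>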
+<div class="section">
+<h3><a name="remote.yaml"></a>remote.yaml</h3>
+<p>In the flux file, we introduced the following configuration:</p>
+
+<div class="source">
+<div class="source">
+<pre>name: &quot;pcap&quot;
+config:
+    topology.workers: ${storm.topology.workers}
+    topology.worker.childopts: ${topology.worker.childopts}
+    topology.auto-credentials: ${storm.auto.credentials}
+    topology.acker.executors: 0
+components:
+
+  # Any kafka props for the producer go here.
+  - id: &quot;kafkaProps&quot;
+    className: &quot;java.util.HashMap&quot;
+    configMethods:
+      -   name: &quot;put&quot;
+          args:
+            - &quot;value.deserializer&quot;
+            - &quot;org.apache.kafka.common.serialization.ByteArrayDeserializer&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;key.deserializer&quot;
+            - &quot;org.apache.kafka.common.serialization.ByteArrayDeserializer&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;group.id&quot;
+            - &quot;pcap&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;security.protocol&quot;
+            - &quot;${kafka.security.protocol}&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;poll.timeout.ms&quot;
+            - 100
+      -   name: &quot;put&quot;
+          args:
+            - &quot;offset.commit.period.ms&quot;
+            - 30000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;session.timeout.ms&quot;
+            - 30000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.uncommitted.offsets&quot;
+            - 200000000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.poll.interval.ms&quot;
+            - 10
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.poll.records&quot;
+            - 200000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;receive.buffer.bytes&quot;
+            - 431072
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.partition.fetch.bytes&quot;
+            - 8097152
+
+  - id: &quot;hdfsProps&quot;
+    className: &quot;java.util.HashMap&quot;
+    configMethods:
+      -   name: &quot;put&quot;
+          args:
+            - &quot;io.file.buffer.size&quot;
+            - 1000000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;dfs.blocksize&quot;
+            - 1073741824
+
+  - id: &quot;kafkaConfig&quot;
+    className: &quot;org.apache.metron.storm.kafka.flux.SimpleStormKafkaBuilder&quot;
+    constructorArgs:
+      - ref: &quot;kafkaProps&quot;
+      # topic name
+      - &quot;${spout.kafka.topic.pcap}&quot;
+      - &quot;${kafka.zk}&quot;
+    configMethods:
+      -   name: &quot;setFirstPollOffsetStrategy&quot;
+          args:
+            # One of EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST
+            - ${kafka.pcap.start}
+
+  - id: &quot;writerConfig&quot;
+    className: &quot;org.apache.metron.spout.pcap.HDFSWriterConfig&quot;
+    configMethods:
+      -   name: &quot;withOutputPath&quot;
+          args:
+            - &quot;${kafka.pcap.out}&quot;
+      -   name: &quot;withNumPackets&quot;
+          args:
+            - ${kafka.pcap.numPackets}
+      -   name: &quot;withMaxTimeMS&quot;
+          args:
+            - ${kafka.pcap.maxTimeMS}
+      -   name: &quot;withZookeeperQuorum&quot;
+          args:
+            - &quot;${kafka.zk}&quot;
+      -   name: &quot;withSyncEvery&quot;
+          args:
+            - ${hdfs.sync.every}
+      -   name: &quot;withReplicationFactor&quot;
+          args:
+            - ${hdfs.replication}
+      -   name: &quot;withHDFSConfig&quot;
+          args:
+              - ref: &quot;hdfsProps&quot;
+      -   name: &quot;withDeserializer&quot;
+          args:
+            - &quot;${kafka.pcap.ts_scheme}&quot;
+            - &quot;${kafka.pcap.ts_granularity}&quot;
+spouts:
+  - id: &quot;kafkaSpout&quot;
+    className: &quot;org.apache.metron.spout.pcap.KafkaToHDFSSpout&quot;
+    parallelism: ${kafka.spout.parallelism}
+    constructorArgs:
+      - ref: &quot;kafkaConfig&quot;
+      - ref: &quot;writerConfig&quot;
+
+</pre></div></div>
+<div class="section">
+<h4><a name="Flux_Changes_Introduced"></a>Flux Changes Introduced</h4>
+<div class="section">
+<h5><a name="Topology_Configuration"></a>Topology Configuration</h5>
+<p>The only change here is <tt>topology.acker.executors: 0</tt>, which disables Storm tuple acking for maximum throughput.</p></div>
+<div class="section">
+<h5><a name="Kafka_configuration"></a>Kafka configuration</h5>
+<p>The following Kafka consumer properties were added to the <tt>kafkaProps</tt> component:</p>
+
+<div class="source">
+<div class="source">
+<pre>poll.timeout.ms
+offset.commit.period.ms
+session.timeout.ms
+max.uncommitted.offsets
+max.poll.interval.ms
+max.poll.records
+receive.buffer.bytes
+max.partition.fetch.bytes
+</pre></div></div></div>
+<div class="section">
+<h5><a name="Writer_Configuration"></a>Writer Configuration</h5>
+<p>This is a combination of settings for the HDFSWriter (see pcap.properties values above) as well as HDFS.</p>
+<p><b>HDFS config</b></p>
+<p>The <tt>hdfsProps</tt> component is a config HashMap with the following properties:</p>
+
+<div class="source">
+<div class="source">
+<pre>io.file.buffer.size
+dfs.blocksize
+</pre></div></div>
+<p><b>Writer config</b></p>
+<p>References the HDFS props component specified above.</p>
+
+<div class="source">
+<div class="source">
+<pre> -   name: &quot;withHDFSConfig&quot;
+     args:
+       - ref: &quot;hdfsProps&quot;
+</pre></div></div></div></div></div></div>
+                  </div>
+            </div>
+          </div>
+
+  </body>
+</html>

Added: dev/metron/0.4.0-RC4/site-book/metron-platform/metron-writer/index.html
==============================================================================
--- dev/metron/0.4.0-RC4/site-book/metron-platform/metron-writer/index.html (added)
+++ dev/metron/0.4.0-RC4/site-book/metron-platform/metron-writer/index.html Tue Jun 27 18:15:56 2017
@@ -0,0 +1,321 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-06-27
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20170627" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Writer</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">User Documentation</li>
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
+      <li>
+    
+                          <a href="../../index.html" title="Metron">
+          <i class="icon-chevron-down"></i>
+        Metron</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../Upgrading.html" title="Upgrading">
+          <i class="none"></i>
+        Upgrading</a>
+            </li>
+                                                                                                                                                      
+      <li>
+    
+                          <a href="../../metron-analytics/index.html" title="Analytics">
+          <i class="icon-chevron-right"></i>
+        Analytics</a>
+                  </li>
+                                                                                                                                                                                                                                                                                                                                                                                    
+      <li>
+    
+                          <a href="../../metron-deployment/index.html" title="Deployment">
+          <i class="icon-chevron-right"></i>
+        Deployment</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-docker/index.html" title="Docker">
+          <i class="none"></i>
+        Docker</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-config/index.html" title="Config">
+          <i class="none"></i>
+        Config</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-rest/index.html" title="Rest">
+          <i class="none"></i>
+        Rest</a>
+            </li>
+                                                                                                                                                                                                                                                          
+      <li>
+    
+                          <a href="../../metron-platform/index.html" title="Platform">
+          <i class="icon-chevron-down"></i>
+        Platform</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-api/index.html" title="Api">
+          <i class="none"></i>
+        Api</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-platform/metron-common/index.html" title="Common">
+          <i class="icon-chevron-right"></i>
+        Common</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-data-management/index.html" title="Data-management">
+          <i class="none"></i>
+        Data-management</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-enrichment/index.html" title="Enrichment">
+          <i class="none"></i>
+        Enrichment</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-indexing/index.html" title="Indexing">
+          <i class="none"></i>
+        Indexing</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-management/index.html" title="Management">
+          <i class="none"></i>
+        Management</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-parsers/index.html" title="Parsers">
+          <i class="none"></i>
+        Parsers</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-pcap-backend/index.html" title="Pcap-backend">
+          <i class="none"></i>
+        Pcap-backend</a>
+            </li>
+                      
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Writer</a>
+          </li>
+              </ul>
+        </li>
+                                                                                                            
+      <li>
+    
+                          <a href="../../metron-sensors/index.html" title="Sensors">
+          <i class="icon-chevron-right"></i>
+        Sensors</a>
+                  </li>
+              </ul>
+        </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. --><h1>Writer</h1>
+<p><a name="Writer"></a></p>
+<div class="section">
+<h2><a name="Introduction"></a>Introduction</h2>
+<p>The writer module provides utilities for writing to external components from within Storm, including support for bulk (batched) writing. This module includes an implementation for writing to HDFS; other writers are provided in their own modules.</p></div>
+<div class="section">
+<h2><a name="HDFS_Writer"></a>HDFS Writer</h2>
+<p>The HDFS writer included here extends Storm&#x2019;s HDFS support in several ways, such as customizable syncing to HDFS and file rotation policies. In addition, it lets users define output paths based on the fields of each JSON message, using a Stellar expression.</p>
+<p>The base output path is supplied in the Flux file through the <tt>FileNameFormat</tt>, as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre>    -   id: &quot;fileNameFormat&quot;
+        className: &quot;org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat&quot;
+        configMethods:
+            -   name: &quot;withPrefix&quot;
+                args:
+                    - &quot;enrichment-&quot;
+            -   name: &quot;withExtension&quot;
+                args:
+                    - &quot;.json&quot;
+            -   name: &quot;withPath&quot;
+                args:
+                    - &quot;/apps/metron/&quot;
+</pre></div></div>
+<p>With this configuration, all output lands under <tt>/apps/metron/</tt>. By default, each sensor writes to its own subfolder, <tt>/apps/metron/&lt;sensor&gt;/</tt>. However, by modifying the sensor&#x2019;s JSON config, you can derive additional path information from the message itself.</p>
+<p>For example, the sensor config</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;index&quot;: &quot;bro&quot;,
+  &quot;batchSize&quot;: 5,
+  &quot;outputPathFunction&quot;: &quot;FORMAT('uid-%s', uid)&quot;
+}
+</pre></div></div>
+<p>will land data in <tt>/apps/metron/uid-&lt;uid&gt;/</tt>.</p>
+<p>If the data contains messages with <tt>uid</tt> values 1, 3, and 5, there will be three output folders in HDFS:</p>
+
+<div class="source">
+<div class="source">
+<pre>/apps/metron/uid-1/
+/apps/metron/uid-3/
+/apps/metron/uid-5/
+</pre></div></div>
+<p>The Stellar expression must return a String, but it is not limited to <tt>FORMAT</tt>; other functions, such as <tt>TO_LOWER</tt> and <tt>TO_UPPER</tt>, are also available. Nontrivial transformations are usually better done during enrichment, with the resulting field simply referenced here.</p>
+<p>If no <tt>outputPathFunction</tt> is provided, output defaults to a per-sensor folder, as described above.</p>
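+<p>The path resolution described above can be sketched as follows. This is a hypothetical Python simulation, not Metron code: <tt>resolve_output_dir</tt>, <tt>partition</tt>, and <tt>uid_path</tt> are illustrative names, and the Stellar expression <tt>FORMAT('uid-%s', uid)</tt> is approximated by a plain Python function. It only shows how a per-message path function, or the per-sensor default, groups messages into output folders under the base path.</p>

```python
from collections import defaultdict


def resolve_output_dir(base_path, sensor, message, path_fn=None):
    """Return the folder a message would land in (illustrative only)."""
    # Without a path function, fall back to one folder per sensor.
    sub_dir = path_fn(message) if path_fn is not None else sensor
    if not isinstance(sub_dir, str):
        # Mirrors the documented rule that the Stellar expression returns a String.
        raise TypeError("output path function must return a string")
    return base_path.rstrip("/") + "/" + sub_dir + "/"


def partition(base_path, sensor, messages, path_fn=None):
    """Group messages by the output folder each one resolves to."""
    buckets = defaultdict(list)
    for msg in messages:
        buckets[resolve_output_dir(base_path, sensor, msg, path_fn)].append(msg)
    return dict(buckets)


def uid_path(msg):
    # Hypothetical Python stand-in for the Stellar expression FORMAT('uid-%s', uid).
    return "uid-%s" % msg["uid"]


messages = [{"uid": 1}, {"uid": 3}, {"uid": 5}, {"uid": 1}]
by_dir = partition("/apps/metron/", "bro", messages, uid_path)
for path in sorted(by_dir):
    print(path, len(by_dir[path]))
```

+<p>With the sample messages this yields the three folders <tt>/apps/metron/uid-1/</tt>, <tt>/apps/metron/uid-3/</tt>, and <tt>/apps/metron/uid-5/</tt>; calling <tt>partition</tt> without <tt>uid_path</tt> puts every message in <tt>/apps/metron/bro/</tt> instead.</p>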
+<p>One caveat is that the writer limits how many files can be open at once. <tt>HdfsWriter</tt> provides a <tt>withMaxOpenFiles</tt> method to set this limit, which defaults to 500. It can be set in Flux:</p>
+
+<div class="source">
+<div class="source">
+<pre>    -   id: &quot;hdfsWriter&quot;
+        className: &quot;org.apache.metron.writer.hdfs.HdfsWriter&quot;
+        configMethods:
+            -   name: &quot;withFileNameFormat&quot;
+                args:
+                    - ref: &quot;fileNameFormat&quot;
+            -   name: &quot;withRotationPolicy&quot;
+                args:
+                    - ref: &quot;hdfsRotationPolicy&quot;
+            -   name: &quot;withMaxOpenFiles&quot;
+                args:
+                    - 500
+</pre></div></div></div>
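+<p>One common way to enforce an open-file limit is a least-recently-used (LRU) cache that closes the oldest handle when the limit is reached. The sketch below assumes that strategy purely for illustration; the text above only states that the limit exists and defaults to 500, not how <tt>HdfsWriter</tt> chooses which file to close. <tt>FakeHandle</tt> and <tt>BoundedFileCache</tt> are hypothetical names.</p>

```python
from collections import OrderedDict


class FakeHandle:
    """Hypothetical stand-in for an open HDFS output stream."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


class BoundedFileCache:
    """Keeps at most max_open_files handles open (LRU eviction is an assumption)."""

    def __init__(self, max_open_files=500):
        self.max_open_files = max_open_files
        self._open = OrderedDict()  # path -> handle, least recently used first

    def get(self, path):
        handle = self._open.get(path)
        if handle is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return handle
        if len(self._open) >= self.max_open_files:
            _, oldest = self._open.popitem(last=False)  # evict least recently used
            oldest.close()
        # A real writer would open (or reopen) a file here; this just creates a stub.
        handle = FakeHandle(path)
        self._open[path] = handle
        return handle

    def open_paths(self):
        return list(self._open)


cache = BoundedFileCache(max_open_files=2)
a = cache.get("/apps/metron/uid-1/")
b = cache.get("/apps/metron/uid-3/")
cache.get("/apps/metron/uid-1/")  # touch uid-1 again
cache.get("/apps/metron/uid-5/")  # limit reached: closes uid-3, the least recently used
print(cache.open_paths())
```

+<p>With a limit of 2, opening a third folder forces one handle to close, which is why a path function that produces many distinct folders can interact with this setting.</p>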
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+        
+                </div>
+    </footer>
+  </body>
+</html>


