metron-commits mailing list archives

From ceste...@apache.org
Subject svn commit: r23994 [20/24] - in /release/metron: 0.4.1/ 0.4.2/ 0.4.2/site-book/ 0.4.2/site-book/css/ 0.4.2/site-book/images/ 0.4.2/site-book/images/logos/ 0.4.2/site-book/images/profiles/ 0.4.2/site-book/img/ 0.4.2/site-book/js/ 0.4.2/site-book/metron-...
Date Wed, 03 Jan 2018 18:25:58 GMT
Added: release/metron/0.4.2/site-book/metron-platform/metron-pcap-backend/index.html
==============================================================================
--- release/metron/0.4.2/site-book/metron-platform/metron-pcap-backend/index.html (added)
+++ release/metron/0.4.2/site-book/metron-platform/metron-pcap-backend/index.html Wed Jan  3 18:25:57 2018
@@ -0,0 +1,676 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-12-08
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20171208" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Metron PCAP Backend</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="http://metron.apache.org/" id="bannerLeft">
+                                                                                                <img src="../../images/metron-logo.png"  alt="Apache Metron" width="148px" height="48px"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="http://metron.apache.org/" class="externalLink" title="Metron">
+        Metron</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="../../index.html" title="Documentation">
+        Documentation</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Metron PCAP Backend</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2017-12-08</li> <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.4.2</li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">User Documentation</li>
+
+      <li>
+    
+                          <a href="../../index.html" title="Metron">
+          <i class="icon-chevron-down"></i>
+        Metron</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../Upgrading.html" title="Upgrading">
+          <i class="none"></i>
+        Upgrading</a>
+            </li>
+                                                                                                                                                      
+      <li>
+    
+                          <a href="../../metron-analytics/index.html" title="Analytics">
+          <i class="icon-chevron-right"></i>
+        Analytics</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-contrib/metron-docker/index.html" title="Docker">
+          <i class="none"></i>
+        Docker</a>
+            </li>
+                                                                                                                                                                                                                                                                                                                                                                                                            
+      <li>
+    
+                          <a href="../../metron-deployment/index.html" title="Deployment">
+          <i class="icon-chevron-right"></i>
+        Deployment</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-alerts/index.html" title="Alerts">
+          <i class="none"></i>
+        Alerts</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-config/index.html" title="Config">
+          <i class="none"></i>
+        Config</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-rest/index.html" title="Rest">
+          <i class="none"></i>
+        Rest</a>
+            </li>
+                                                                                                                                                                                                                                                                                              
+      <li>
+    
+                          <a href="../../metron-platform/index.html" title="Platform">
+          <i class="icon-chevron-down"></i>
+        Platform</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../metron-platform/Performance-tuning-guide.html" title="Performance-tuning-guide">
+          <i class="none"></i>
+        Performance-tuning-guide</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-api/index.html" title="Api">
+          <i class="none"></i>
+        Api</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-common/index.html" title="Common">
+          <i class="none"></i>
+        Common</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-data-management/index.html" title="Data-management">
+          <i class="none"></i>
+        Data-management</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-elasticsearch/index.html" title="Elasticsearch">
+          <i class="none"></i>
+        Elasticsearch</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-enrichment/index.html" title="Enrichment">
+          <i class="none"></i>
+        Enrichment</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-indexing/index.html" title="Indexing">
+          <i class="none"></i>
+        Indexing</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-management/index.html" title="Management">
+          <i class="none"></i>
+        Management</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-platform/metron-parsers/index.html" title="Parsers">
+          <i class="icon-chevron-right"></i>
+        Parsers</a>
+                  </li>
+                      
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Pcap-backend</a>
+          </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-writer/index.html" title="Writer">
+          <i class="none"></i>
+        Writer</a>
+            </li>
+              </ul>
+        </li>
+                                                                                          
+      <li>
+    
+                          <a href="../../metron-sensors/index.html" title="Sensors">
+          <i class="icon-chevron-right"></i>
+        Sensors</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example">
+          <i class="none"></i>
+        Stellar-3rd-party-example</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common">
+          <i class="icon-chevron-right"></i>
+        Stellar-common</a>
+                  </li>
+                                                                                          
+      <li>
+    
+                          <a href="../../use-cases/index.html" title="Use-cases">
+          <i class="icon-chevron-right"></i>
+        Use-cases</a>
+                  </li>
+              </ul>
+        </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <h1>Metron PCAP Backend</h1>
+<p><a name="Metron_PCAP_Backend"></a></p>
+<p>The purpose of the Metron PCAP backend is to create a Storm topology capable of rapidly ingesting raw packet capture data from Kafka directly into HDFS.</p>
+
+<ul>
+  
+<li><a href="#The_Sensors_Feeding_Kafka">Sensors</a></li>
+  
+<li><a href="#The_PCAP_Topology">PCAP Topology</a></li>
+  
+<li><a href="#The_Files_on_HDFS">HDFS Files</a></li>
+  
+<li><a href="#Configuration">Configuration</a></li>
+  
+<li><a href="#Starting_the_Topology">Starting the Topology</a></li>
+  
+<li><a href="#Utilities">Utilities</a>
+  
+<ul>
+    
+<li><a href="#Inspector_Utility">Inspector Utility</a></li>
+    
+<li><a href="#Query_Filter_Utility">Query Filter Utility</a></li>
+  </ul></li>
+  
+<li><a href="#Performance_Tuning">Performance Tuning</a></li>
+</ul>
+<div class="section">
+<h2><a name="The_Sensors_Feeding_Kafka"></a>The Sensors Feeding Kafka</h2>
+<p>This component must be fed via Kafka by fast upstream packet capture components. Metron ships with two supported sensors:</p>
+
+<ul>
+  
+<li>The pycapa <a href="../../metron-sensors/pycapa/index.html">tool</a> aimed at low-volume packet capture</li>
+  
+<li>The <a class="externalLink" href="http://dpdk.org/">DPDK</a> based <a href="../../metron-sensors/fastcapa/index.html">tool</a> aimed at high-volume packet capture</li>
+</ul>
+<p>Both of these sensors write raw packet data directly into Kafka. This component expects each record to have the following structure:</p>
+
+<ul>
+  
+<li>A key containing the byte representation of a 64-bit <tt>unsigned long</tt> timestamp, measured in a configurable time unit (see <tt>kafka.pcap.ts_granularity</tt>) since the Unix epoch</li>
+  
+<li>A value containing the raw packet data with no headers (neither the global pcap header nor the per-packet header)</li>
+</ul></div>
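+<p>A minimal Python sketch of producing a record in this shape, for readers writing their own sensor. It assumes the timestamp is in nanoseconds and packed as a big-endian unsigned 64-bit integer; neither the byte order nor the helper names <tt>make_pcap_record</tt> and <tt>decode_key</tt> come from Metron, so check them against your sensor and your <tt>kafka.pcap.ts_granularity</tt> setting.</p>

```python
import struct
from typing import Tuple

def make_pcap_record(packet: bytes, ts_nanos: int) -> Tuple[bytes, bytes]:
    """Build a hypothetical (key, value) pair shaped the way the PCAP backend expects.

    Assumption: the timestamp is nanoseconds since the Unix epoch, packed as a
    big-endian unsigned 64-bit integer (struct format '>Q').
    """
    if not 0 <= ts_nanos < 2**64:
        raise ValueError("timestamp must fit in an unsigned 64-bit integer")
    key = struct.pack(">Q", ts_nanos)  # 8-byte key holding the timestamp
    value = bytes(packet)              # raw packet bytes, no pcap headers
    return key, value

def decode_key(key: bytes) -> int:
    """Recover the timestamp from an 8-byte key (same big-endian assumption)."""
    (ts_nanos,) = struct.unpack(">Q", key)
    return ts_nanos

# Example: a 4-byte fragment of an IPv4 header at 2018-01-01T00:00:00Z.
key, value = make_pcap_record(b"\x45\x00\x00\x14", 1514764800000000000)
```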
+<div class="section">
+<h2><a name="The_PCAP_Topology"></a>The PCAP Topology</h2>
+<p>The topology is deliberately simple: it is a spout-only topology. It uses the <tt>Storm Kafka</tt> spout, extended so that a callback processes each record instead of a separate bolt.</p>
+<p>The following happens as part of this spout for each packet:</p>
+
+<ul>
+  
+<li>A custom <tt>Scheme</tt> is used which attaches the appropriate headers to the packet (both global and packet headers) using the timestamp in the key and the raw packet data in the value.</li>
+  
+<li>A callback is called which appends the packet data to a sequence file in HDFS.</li>
+</ul></div>
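+<p>To illustrate what attaching the headers means, the following Python sketch prepends a standard libpcap global header (24 bytes) and per-packet record header (16 bytes) to a raw packet. It is not Metron's <tt>Scheme</tt>: it assumes little-endian headers, microsecond timestamp resolution, an Ethernet link type and untruncated packets, and the function names are hypothetical.</p>

```python
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # libpcap magic number for microsecond timestamps
LINKTYPE_ETHERNET = 1         # assumed link-layer type

def global_header(snaplen: int = 65535, linktype: int = LINKTYPE_ETHERNET) -> bytes:
    """24-byte file header: magic, version 2.4, thiszone, sigfigs, snaplen, network."""
    return struct.pack("<IHHiIII", PCAP_MAGIC_USEC, 2, 4, 0, 0, snaplen, linktype)

def packet_header(ts_nanos: int, packet_len: int) -> bytes:
    """16-byte record header: ts_sec, ts_usec, incl_len, orig_len."""
    secs, rem = divmod(ts_nanos, 1_000_000_000)
    usecs = rem // 1000  # truncate nanoseconds to microsecond resolution
    # Assumes the packet was captured whole, so incl_len == orig_len.
    return struct.pack("<IIII", secs, usecs, packet_len, packet_len)

def headerize(ts_nanos: int, packet: bytes) -> bytes:
    """Return a single-packet pcap byte string built from a key timestamp and raw value."""
    return global_header() + packet_header(ts_nanos, len(packet)) + packet
```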
+<div class="section">
+<h2><a name="The_Files_on_HDFS"></a>The Files on HDFS</h2>
+<p>The sequence files on HDFS fit the following pattern: <tt>$BASE_PATH/pcap_$TOPIC_$TS_$PARTITION_$UUID</tt></p>
+<p>where</p>
+
+<ul>
+  
+<li><tt>BASE_PATH</tt> is the base path in HDFS where pcap data is stored</li>
+  
+<li><tt>TOPIC</tt> is the Kafka topic</li>
+  
+<li><tt>TS</tt> is the timestamp, in nanoseconds since the Unix epoch</li>
+  
+<li><tt>PARTITION</tt> is the Kafka partition</li>
+  
+<li><tt>UUID</tt> is the UUID of the Storm worker</li>
+</ul>
+<p>Each sequence file contains a series of packets with their pcap headers attached.</p></div>
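+<p>Because a topic name may itself contain underscores, a file name is most safely split from the right. The following hypothetical Python helper (not part of Metron) sketches this, assuming the worker UUID contains no underscores; the example path and UUID are made up.</p>

```python
import datetime
import posixpath
from typing import NamedTuple

class PcapFileName(NamedTuple):
    topic: str
    ts_nanos: int
    partition: int
    worker_uuid: str

def parse_pcap_file_name(path: str) -> PcapFileName:
    """Split '$BASE_PATH/pcap_$TOPIC_$TS_$PARTITION_$UUID' into its fields."""
    name = posixpath.basename(path)
    prefix = "pcap_"
    if not name.startswith(prefix):
        raise ValueError(f"not a pcap sequence file name: {name}")
    # rsplit from the right so underscores inside the topic name are preserved
    topic, ts, partition, uuid = name[len(prefix):].rsplit("_", 3)
    return PcapFileName(topic, int(ts), int(partition), uuid)

def file_timestamp_utc(parsed: PcapFileName) -> datetime.datetime:
    """Convert the nanosecond TS field to a UTC datetime (microsecond precision)."""
    secs, nanos = divmod(parsed.ts_nanos, 1_000_000_000)
    base = datetime.datetime.fromtimestamp(secs, tz=datetime.timezone.utc)
    return base.replace(microsecond=nanos // 1000)
```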
+<div class="section">
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>The configuration file for the Flux topology is located at <tt>$METRON_HOME/config/pcap.properties</tt> and the possible options are as follows:</p>
+
+<ul>
+  
+<li><tt>spout.kafka.topic.pcap</tt> : The Kafka topic to listen to</li>
+  
+<li><tt>storm.auto.credentials</tt> : The Storm auto-credential plugins used for Kerberos ticket renewal. If running on a kerberized cluster, this should be <tt>['org.apache.storm.security.auth.kerberos.AutoTGT']</tt></li>
+  
+<li><tt>kafka.security.protocol</tt> : The security protocol to use for Kafka. This should be <tt>PLAINTEXT</tt> for a non-kerberized cluster and typically <tt>SASL_PLAINTEXT</tt> for a kerberized cluster.</li>
+  
+<li><tt>kafka.zk</tt> : The comma-separated ZooKeeper quorum (e.g. host1:2181,host2:2181)</li>
+  
+<li><tt>kafka.pcap.start</tt> : One of <tt>EARLIEST</tt>, <tt>LATEST</tt>, <tt>UNCOMMITTED_EARLIEST</tt>, <tt>UNCOMMITTED_LATEST</tt> representing where to start listening on the queue.</li>
+  
+<li><tt>kafka.pcap.numPackets</tt> : The number of packets to keep in one file.</li>
+  
+<li><tt>kafka.pcap.maxTimeMS</tt> : The maximum span of time (in milliseconds) of packets to keep in one file. For instance, you may only want to keep an hour&#x2019;s worth of packets in a given file.</li>
+  
+<li><tt>kafka.pcap.ts_scheme</tt> : One of <tt>FROM_KEY</tt> or <tt>FROM_VALUE</tt>. Use <tt>FROM_KEY</tt>, which matches the current tooling. <tt>FROM_VALUE</tt> is a legacy option that assumes fully headerized packets arrive in the value.</li>
+  
+<li><tt>kafka.pcap.out</tt> : The HDFS directory in which to store the packet capture data</li>
+  
+<li><tt>kafka.pcap.ts_granularity</tt> : The granularity of the timestamps. One of <tt>MILLISECONDS</tt>, <tt>MICROSECONDS</tt>, or <tt>NANOSECONDS</tt>, representing milliseconds, microseconds, or nanoseconds since the Unix epoch (respectively).</li>
+</ul></div>
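+<p>The following Python sketch shows how <tt>kafka.pcap.ts_granularity</tt> relates a key timestamp to nanoseconds since the Unix epoch, the unit used for <tt>TS</tt> in the HDFS file names. The <tt>TsGranularity</tt> enum and <tt>to_nanos</tt> helper are hypothetical and only illustrate the unit conversion.</p>

```python
from enum import Enum

class TsGranularity(Enum):
    """Nanoseconds per unit for each kafka.pcap.ts_granularity value."""
    MILLISECONDS = 1_000_000
    MICROSECONDS = 1_000
    NANOSECONDS = 1

def to_nanos(ts: int, granularity: str) -> int:
    """Convert a key timestamp in the configured unit to nanoseconds since the epoch.

    Raises KeyError for an unrecognized granularity name.
    """
    return ts * TsGranularity[granularity].value
```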
+<div class="section">
+<h2><a name="Starting_the_Topology"></a>Starting the Topology</h2>
+<p>To start the topology, run the utility script <tt>$METRON_HOME/bin/start_pcap_topology.sh</tt>, which takes no arguments.</p></div>
+<div class="section">
+<h2><a name="Utilities"></a>Utilities</h2>
+<div class="section">
+<h3><a name="Inspector_Utility"></a>Inspector Utility</h3>
+<p>To verify that data can be read back out, the utility <tt>$METRON_HOME/bin/pcap_inspector.sh</tt> reads portions of the sequence files.</p>
+
+<div class="source">
+<div class="source">
+<pre>usage: PcapInspector
+ -h,--help               Generate Help screen
+ -i,--input &lt;SEQ_FILE&gt;   Input sequence file on HDFS
+ -n,--num_packets &lt;N&gt;    Number of packets to dump
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Query_Filter_Utility"></a>Query Filter Utility</h3>
+<p>This command-line tool exposes two methods for filtering PCAP data:</p>
+
+<ul>
+  
+<li>fixed</li>
+  
+<li>query (via Stellar)</li>
+</ul>
+<p>The tool is executed via:</p>
+
+<div class="source">
+<div class="source">
+<pre>$METRON_HOME/bin/pcap_query.sh [fixed|query]
+</pre></div></div>
+<div class="section">
+<h4><a name="Usage"></a>Usage</h4>
+
+<div class="source">
+<div class="source">
+<pre>usage: Fixed filter options
+ -bop,--base_output_path &lt;arg&gt;   Query result output path. Default is
+                                 '/tmp'
+ -bp,--base_path &lt;arg&gt;           Base PCAP data path. Default is
+                                 '/apps/metron/pcap'
+ -da,--ip_dst_addr &lt;arg&gt;         Destination IP address
+ -df,--date_format &lt;arg&gt;         Date format to use for parsing start_time
+                                 and end_time. Default is to use time in
+                                 millis since the epoch.
+ -dp,--ip_dst_port &lt;arg&gt;         Destination port
+ -pf,--packet_filter &lt;arg&gt;       Packet filter regex
+ -et,--end_time &lt;arg&gt;            Packet end time range. Default is current
+                                 system time.
+ -nr,--num_reducers &lt;arg&gt;        The number of reducers to use.  Default
+                                 is 10.
+ -h,--help                       Display help
+ -ir,--include_reverse           Indicates if filter should check swapped
+                                 src/dest addresses and IPs
+ -p,--protocol &lt;arg&gt;             IP Protocol
+ -sa,--ip_src_addr &lt;arg&gt;         Source IP address
+ -sp,--ip_src_port &lt;arg&gt;         Source port
+ -st,--start_time &lt;arg&gt;          (required) Packet start time range.
+</pre></div></div>
+
+<div class="source">
+<div class="source">
+<pre>usage: Query filter options
+ -bop,--base_output_path &lt;arg&gt;   Query result output path. Default is
+                                 '/tmp'
+ -bp,--base_path &lt;arg&gt;           Base PCAP data path. Default is
+                                 '/apps/metron/pcap'
+ -df,--date_format &lt;arg&gt;         Date format to use for parsing start_time
+                                 and end_time. Default is to use time in
+                                 millis since the epoch.
+ -et,--end_time &lt;arg&gt;            Packet end time range. Default is current
+                                 system time.
+ -nr,--num_reducers &lt;arg&gt;        The number of reducers to use.  Default
+                                 is 10.
+ -h,--help                       Display help
+ -q,--query &lt;arg&gt;                Query string to use as a filter
+ -st,--start_time &lt;arg&gt;          (required) Packet start time range.
+</pre></div></div>
+<p>The Query filter&#x2019;s <tt>--query</tt> argument specifies the Stellar expression to execute on each packet. To interact with the packet, a few variables are exposed:</p>
+
+<ul>
+  
+<li><tt>packet</tt> : The packet data (a <tt>byte[]</tt>)</li>
+  
+<li><tt>ip_src_addr</tt> : The source address for the packet (a <tt>String</tt>)</li>
+  
+<li><tt>ip_src_port</tt> : The source port for the packet (an <tt>Integer</tt>)</li>
+  
+<li><tt>ip_dst_addr</tt> : The destination address for the packet (a <tt>String</tt>)</li>
+  
+<li><tt>ip_dst_port</tt> : The destination port for the packet (an <tt>Integer</tt>)</li>
+</ul></div>
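+<p>The fixed filter's address and port options, together with <tt>--include_reverse</tt>, can be read as the following Python sketch. It is an illustration only: <tt>fixed_filter</tt> is hypothetical, omits the protocol, time range and packet filter options, and assumes an unset option matches anything. The field names mirror the variables exposed to Stellar.</p>

```python
from typing import NamedTuple, Optional

class Packet(NamedTuple):
    ip_src_addr: str
    ip_src_port: int
    ip_dst_addr: str
    ip_dst_port: int

def _matches(value, wanted) -> bool:
    # Assumption: an unset filter option (None) matches any value.
    return wanted is None or value == wanted

def fixed_filter(pkt: Packet,
                 src_addr: Optional[str] = None, src_port: Optional[int] = None,
                 dst_addr: Optional[str] = None, dst_port: Optional[int] = None,
                 include_reverse: bool = False) -> bool:
    forward = (_matches(pkt.ip_src_addr, src_addr) and _matches(pkt.ip_src_port, src_port)
               and _matches(pkt.ip_dst_addr, dst_addr) and _matches(pkt.ip_dst_port, dst_port))
    if forward or not include_reverse:
        return forward
    # --include_reverse: also accept the packet with source and destination swapped
    return (_matches(pkt.ip_dst_addr, src_addr) and _matches(pkt.ip_dst_port, src_port)
            and _matches(pkt.ip_src_addr, dst_addr) and _matches(pkt.ip_src_port, dst_port))
```

With <tt>include_reverse</tt> set, a filter written for a client's request (destination port 80) also returns the server's responses, whose source port is 80.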
+<div class="section">
+<h4><a name="Binary_Regex"></a>Binary Regex</h4>
+<p>Filtering can be done by packet header fields as well as via a binary regular expression run on the packet payload itself. This filter can be specified via:</p>
+
+<ul>
+  
+<li>The <tt>-pf</tt> or <tt>--packet_filter</tt> options for the fixed query filter</li>
+  
+<li>The <tt>BYTEARRAY_MATCHER(pattern, data)</tt> Stellar function. The first argument is the regex pattern and the second argument is the data. The packet data is exposed via the <tt>packet</tt> variable in Stellar.</li>
+</ul>
+<p>The format of this regular expression is described <a class="externalLink" href="https://github.com/nishihatapalmer/byteseek/blob/master/sequencesyntax.md">here</a>.</p></div></div></div>
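+<p>The idea of a binary regular expression over payload bytes can be illustrated with Python's <tt>re</tt> module, which accepts <tt>bytes</tt> patterns. This is only an analogy: byteseek syntax differs from Python's, so patterns cannot be copied between the two, and <tt>payload_matches</tt> and the payloads below are hypothetical.</p>

```python
import re

def payload_matches(pattern: bytes, packet: bytes) -> bool:
    """Return True if the bytes regex matches anywhere in the packet payload."""
    return re.search(pattern, packet) is not None

print(payload_matches(rb"GET /[a-z]+", b"\x00\x01GET /index HTTP/1.1"))  # True
print(payload_matches(rb"\xde\xad\xbe\xef", b"\x00\xde\xad\x00"))        # False
```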
+<div class="section">
+<h2><a name="Performance_Tuning"></a>Performance Tuning</h2>
+<p>The PCAP topology is extremely lightweight and functions as a spout-only topology. To tune it, users currently must set a combination of properties in pcap.properties as well as configuration in the pcap remote.yaml Flux file itself. The number of partitions in your Kafka topic also has a dramatic impact on performance. We ran data into Kafka at 1.1 Gbps, and our tests resulted in configuring 128 partitions for our Kafka topic along with the following settings in pcap.properties and remote.yaml (properties unrelated to performance have been removed):</p>
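+<p>As a rough sizing aid, the 1.1 Gbps test rate can be divided across the 128 partitions. The Python sketch below assumes decimal gigabits and traffic spread evenly over partitions, which real traffic may not be; under those assumptions each partition (and, with one spout executor per partition, each executor) absorbs roughly 1.07 MB/s.</p>

```python
def per_partition_bytes_per_sec(total_bits_per_sec: float, partitions: int) -> float:
    """Average ingest rate per partition, assuming an even spread across partitions."""
    return total_bits_per_sec / 8 / partitions

rate = per_partition_bytes_per_sec(1.1e9, 128)  # 1.1 Gbps over 128 partitions
print(f"{rate / 1e6:.2f} MB/s per partition")   # prints 1.07 MB/s per partition
```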
+<div class="section">
+<h3><a name="pcap.properties_file"></a>pcap.properties file</h3>
+
+<div class="source">
+<div class="source">
+<pre>spout.kafka.topic.pcap=pcap
+storm.topology.workers=16
+kafka.spout.parallelism=128
+kafka.pcap.numPackets=1000000000
+kafka.pcap.maxTimeMS=0
+hdfs.replication=1
+hdfs.sync.every=10000
+</pre></div></div>
+<p>You&#x2019;ll notice that the number of Kafka partitions equals the spout parallelism, and this is no coincidence. To preserve ordering within a partition, Kafka assigns each partition to at most one consumer in a consumer group, so any parallelism beyond the partition count leaves you with dormant threads that consume resources but perform no additional work. For our cluster with 4 Storm Supervisors, we also found 16 workers to provide optimal throughput. We were largely IO bound rather than CPU bound with the incoming PCAP data.</p></div>
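+<p>The effect of spout parallelism exceeding the partition count can be sketched in Python. The round-robin assignment below is a simplification (Kafka ships several assignment strategies, and these helper names are hypothetical), but every strategy keeps the rule that a partition is consumed by at most one member of a consumer group, so extra consumers sit idle.</p>

```python
from typing import Dict, List

def assign_partitions(num_partitions: int, num_consumers: int) -> Dict[int, List[int]]:
    """Round-robin each partition to exactly one consumer in a single group."""
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment

def idle_consumers(num_partitions: int, num_consumers: int) -> int:
    """Count consumers (spout executors) that receive no partition at all."""
    assignment = assign_partitions(num_partitions, num_consumers)
    return sum(1 for parts in assignment.values() if not parts)
```

With 128 partitions, a parallelism of 128 gives every executor exactly one partition, while a parallelism of 160 would leave 32 executors idle.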
+<div class="section">
+<h3><a name="remote.yaml"></a>remote.yaml</h3>
+<p>In the flux file, we introduced the following configuration:</p>
+
+<div class="source">
+<div class="source">
+<pre>name: &quot;pcap&quot;
+config:
+    topology.workers: ${storm.topology.workers}
+    topology.worker.childopts: ${topology.worker.childopts}
+    topology.auto-credentials: ${storm.auto.credentials}
+    topology.ackers.executors: 0
+components:
+
+  # Any kafka props for the consumer go here.
+  - id: &quot;kafkaProps&quot;
+    className: &quot;java.util.HashMap&quot;
+    configMethods:
+      -   name: &quot;put&quot;
+          args:
+            - &quot;value.deserializer&quot;
+            - &quot;org.apache.kafka.common.serialization.ByteArrayDeserializer&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;key.deserializer&quot;
+            - &quot;org.apache.kafka.common.serialization.ByteArrayDeserializer&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;group.id&quot;
+            - &quot;pcap&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;security.protocol&quot;
+            - &quot;${kafka.security.protocol}&quot;
+      -   name: &quot;put&quot;
+          args:
+            - &quot;poll.timeout.ms&quot;
+            - 100
+      -   name: &quot;put&quot;
+          args:
+            - &quot;offset.commit.period.ms&quot;
+            - 30000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;session.timeout.ms&quot;
+            - 30000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.uncommitted.offsets&quot;
+            - 200000000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.poll.interval.ms&quot;
+            - 10
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.poll.records&quot;
+            - 200000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;receive.buffer.bytes&quot;
+            - 431072
+      -   name: &quot;put&quot;
+          args:
+            - &quot;max.partition.fetch.bytes&quot;
+            - 8097152
+
+  - id: &quot;hdfsProps&quot;
+    className: &quot;java.util.HashMap&quot;
+    configMethods:
+      -   name: &quot;put&quot;
+          args:
+            - &quot;io.file.buffer.size&quot;
+            - 1000000
+      -   name: &quot;put&quot;
+          args:
+            - &quot;dfs.blocksize&quot;
+            - 1073741824
+
+  - id: &quot;kafkaConfig&quot;
+    className: &quot;org.apache.metron.storm.kafka.flux.SimpleStormKafkaBuilder&quot;
+    constructorArgs:
+      - ref: &quot;kafkaProps&quot;
+      # topic name
+      - &quot;${spout.kafka.topic.pcap}&quot;
+      - &quot;${kafka.zk}&quot;
+    configMethods:
+      -   name: &quot;setFirstPollOffsetStrategy&quot;
+          args:
+            # One of EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST
+            - ${kafka.pcap.start}
+
+  - id: &quot;writerConfig&quot;
+    className: &quot;org.apache.metron.spout.pcap.HDFSWriterConfig&quot;
+    configMethods:
+      -   name: &quot;withOutputPath&quot;
+          args:
+            - &quot;${kafka.pcap.out}&quot;
+      -   name: &quot;withNumPackets&quot;
+          args:
+            - ${kafka.pcap.numPackets}
+      -   name: &quot;withMaxTimeMS&quot;
+          args:
+            - ${kafka.pcap.maxTimeMS}
+      -   name: &quot;withZookeeperQuorum&quot;
+          args:
+            - &quot;${kafka.zk}&quot;
+      -   name: &quot;withSyncEvery&quot;
+          args:
+            - ${hdfs.sync.every}
+      -   name: &quot;withReplicationFactor&quot;
+          args:
+            - ${hdfs.replication}
+      -   name: &quot;withHDFSConfig&quot;
+          args:
+              - ref: &quot;hdfsProps&quot;
+      -   name: &quot;withDeserializer&quot;
+          args:
+            - &quot;${kafka.pcap.ts_scheme}&quot;
+            - &quot;${kafka.pcap.ts_granularity}&quot;
+spouts:
+  - id: &quot;kafkaSpout&quot;
+    className: &quot;org.apache.metron.spout.pcap.KafkaToHDFSSpout&quot;
+    parallelism: ${kafka.spout.parallelism}
+    constructorArgs:
+      - ref: &quot;kafkaConfig&quot;
+      - ref: &quot;writerConfig&quot;
+
+</pre></div></div>
+<div class="section">
+<h4><a name="Flux_Changes_Introduced"></a>Flux Changes Introduced</h4>
+<div class="section">
+<h5><a name="Topology_Configuration"></a>Topology Configuration</h5>
+<p>The only change here is <tt>topology.ackers.executors: 0</tt>, which disables Storm tuple acking for maximum throughput.</p></div>
+<div class="section">
+<h5><a name="Kafka_configuration"></a>Kafka configuration</h5>
+<p>The following Kafka properties were added to the <tt>kafkaProps</tt> component:</p>
+<div class="source">
+<div class="source">
+<pre>poll.timeout.ms
+offset.commit.period.ms
+session.timeout.ms
+max.uncommitted.offsets
+max.poll.interval.ms
+max.poll.records
+receive.buffer.bytes
+max.partition.fetch.bytes
+</pre></div></div></div>
+<div class="section">
+<h5><a name="Writer_Configuration"></a>Writer Configuration</h5>
+<p>This is a combination of settings for the HDFSWriter (see pcap.properties values above) as well as HDFS.</p>
+<p><b>HDFS config</b></p>
+<p>The <tt>hdfsProps</tt> component is a <tt>HashMap</tt> with the following properties:</p>
+
+<div class="source">
+<div class="source">
+<pre>io.file.buffer.size
+dfs.blocksize
+</pre></div></div>
+<p><b>Writer config</b></p>
+<p>The <tt>writerConfig</tt> component references the <tt>hdfsProps</tt> component specified above.</p>
+
+<div class="source">
+<div class="source">
+<pre> -   name: &quot;withHDFSConfig&quot;
+     args:
+       - ref: &quot;hdfsProps&quot;
+</pre></div></div></div></div></div></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+        
+                </div>
+    </footer>
+  </body>
+</html>

Added: release/metron/0.4.2/site-book/metron-platform/metron-writer/index.html
==============================================================================
--- release/metron/0.4.2/site-book/metron-platform/metron-writer/index.html (added)
+++ release/metron/0.4.2/site-book/metron-platform/metron-writer/index.html Wed Jan  3 18:25:57 2018
@@ -0,0 +1,363 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2017-12-08
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20171208" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Metron &#x2013; Writer</title>
+    <link rel="stylesheet" href="../../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../../css/site.css" />
+    <link rel="stylesheet" href="../../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+                          
+        
+<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
+          
+            </head>
+        <body class="topBarDisabled">
+          
+                
+                    
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="http://metron.apache.org/" id="bannerLeft">
+                                                                                                <img src="../../images/metron-logo.png"  alt="Apache Metron" width="148px" height="48px"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org" class="externalLink" title="Apache">
+        Apache</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="http://metron.apache.org/" class="externalLink" title="Metron">
+        Metron</a>
+        </li>
+      <li class="divider ">/</li>
+            <li class="">
+                    <a href="../../index.html" title="Documentation">
+        Documentation</a>
+        </li>
+      <li class="divider ">/</li>
+        <li class="">Writer</li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right">Last Published: 2017-12-08</li> <li class="divider pull-right">|</li>
+              <li id="projectVersion" class="pull-right">Version: 0.4.2</li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">User Documentation</li>
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
+      <li>
+    
+                          <a href="../../index.html" title="Metron">
+          <i class="icon-chevron-down"></i>
+        Metron</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../Upgrading.html" title="Upgrading">
+          <i class="none"></i>
+        Upgrading</a>
+            </li>
+                                                                                                                                                      
+      <li>
+    
+                          <a href="../../metron-analytics/index.html" title="Analytics">
+          <i class="icon-chevron-right"></i>
+        Analytics</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-contrib/metron-docker/index.html" title="Docker">
+          <i class="none"></i>
+        Docker</a>
+            </li>
+                                                                                                                                                                                                                                                                                                                                                                                                            
+      <li>
+    
+                          <a href="../../metron-deployment/index.html" title="Deployment">
+          <i class="icon-chevron-right"></i>
+        Deployment</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-alerts/index.html" title="Alerts">
+          <i class="none"></i>
+        Alerts</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-config/index.html" title="Config">
+          <i class="none"></i>
+        Config</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-interface/metron-rest/index.html" title="Rest">
+          <i class="none"></i>
+        Rest</a>
+            </li>
+                                                                                                                                                                                                                                                                                              
+      <li>
+    
+                          <a href="../../metron-platform/index.html" title="Platform">
+          <i class="icon-chevron-down"></i>
+        Platform</a>
+                    <ul class="nav nav-list">
+                      
+      <li>
+    
+                          <a href="../../metron-platform/Performance-tuning-guide.html" title="Performance-tuning-guide">
+          <i class="none"></i>
+        Performance-tuning-guide</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-api/index.html" title="Api">
+          <i class="none"></i>
+        Api</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-common/index.html" title="Common">
+          <i class="none"></i>
+        Common</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-data-management/index.html" title="Data-management">
+          <i class="none"></i>
+        Data-management</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-elasticsearch/index.html" title="Elasticsearch">
+          <i class="none"></i>
+        Elasticsearch</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-enrichment/index.html" title="Enrichment">
+          <i class="none"></i>
+        Enrichment</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-indexing/index.html" title="Indexing">
+          <i class="none"></i>
+        Indexing</a>
+            </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-management/index.html" title="Management">
+          <i class="none"></i>
+        Management</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-platform/metron-parsers/index.html" title="Parsers">
+          <i class="icon-chevron-right"></i>
+        Parsers</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-platform/metron-pcap-backend/index.html" title="Pcap-backend">
+          <i class="none"></i>
+        Pcap-backend</a>
+            </li>
+                      
+      <li class="active">
+    
+            <a href="#"><i class="none"></i>Writer</a>
+          </li>
+              </ul>
+        </li>
+                                                                                          
+      <li>
+    
+                          <a href="../../metron-sensors/index.html" title="Sensors">
+          <i class="icon-chevron-right"></i>
+        Sensors</a>
+                  </li>
+                      
+      <li>
+    
+                          <a href="../../metron-stellar/stellar-3rd-party-example/index.html" title="Stellar-3rd-party-example">
+          <i class="none"></i>
+        Stellar-3rd-party-example</a>
+            </li>
+                                                                        
+      <li>
+    
+                          <a href="../../metron-stellar/stellar-common/index.html" title="Stellar-common">
+          <i class="icon-chevron-right"></i>
+        Stellar-common</a>
+                  </li>
+                                                                                          
+      <li>
+    
+                          <a href="../../use-cases/index.html" title="Use-cases">
+          <i class="icon-chevron-right"></i>
+        Use-cases</a>
+                  </li>
+              </ul>
+        </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. --><h1>Writer</h1>
+<p><a name="Writer"></a></p>
+<div class="section">
+<h2><a name="Introduction"></a>Introduction</h2>
+<p>The writer module provides utilities for writing from within Storm to external components, including support for bulk (batched) writing. This module includes an implementation for writing to HDFS; other writers live in their own modules.</p></div>
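+<p>As a rough illustration of bulk writing, the Python sketch below buffers messages per sensor and hands them to a writer in batches. It assumes a batch is flushed once <tt>batchSize</tt> messages (the sensor config field that appears further down) have accumulated. The <tt>BatchBuffer</tt> class and its methods are hypothetical and are not part of Metron&#x2019;s API.</p>

```python
from typing import Callable, Dict, List


class BatchBuffer:
    """Hypothetical buffer that collects messages per sensor and flushes in bulk.

    Assumption (not Metron's implementation): a sensor's batch is flushed as
    soon as `batch_size` messages have accumulated for it.
    """

    def __init__(self, batch_size: int, flush: Callable[[str, List[dict]], None]):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.flush = flush
        self.pending: Dict[str, List[dict]] = {}

    def add(self, sensor: str, message: dict) -> None:
        batch = self.pending.setdefault(sensor, [])
        batch.append(message)
        if len(batch) >= self.batch_size:
            # Hand the whole batch to the writer in one call, then start a new list
            # so the flushed batch is not mutated afterwards.
            self.flush(sensor, batch)
            self.pending[sensor] = []


if __name__ == "__main__":
    written = []
    buf = BatchBuffer(batch_size=5, flush=lambda s, b: written.append((s, len(b))))
    for i in range(12):
        buf.add("bro", {"uid": i})
    print(written)  # [('bro', 5), ('bro', 5)]; 2 messages are still pending
```

+<p>Buffering like this trades some latency for fewer, larger writes. A production writer would likely also flush on a timer so that messages from a slow sensor are not left pending indefinitely.</p>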
+<div class="section">
+<h2><a name="HDFS_Writer"></a>HDFS Writer</h2>
+<p>The HDFS writer included here extends Storm&#x2019;s own HDFS support in several ways, including customizable syncing to HDFS and file rotation policies. In addition, it lets users define output paths based on fields in the incoming JSON message, using a Stellar expression.</p>
+<p>The Flux file supplies the base output path through the <tt>FileNameFormat</tt>, as follows:</p>
+
+<div class="source">
+<div class="source">
+<pre>    -   id: &quot;fileNameFormat&quot;
+        className: &quot;org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat&quot;
+        configMethods:
+            -   name: &quot;withPrefix&quot;
+                args:
+                    - &quot;enrichment-&quot;
+            -   name: &quot;withExtension&quot;
+                args:
+                    - &quot;.json&quot;
+            -   name: &quot;withPath&quot;
+                args:
+                    - &quot;/apps/metron/&quot;
+</pre></div></div>
+<p>This means that all output lands under <tt>/apps/metron/</tt>. With no further configuration, each sensor&#x2019;s output goes to <tt>/apps/metron/&lt;sensor&gt;/</tt>. However, by modifying the sensor&#x2019;s JSON config, you can derive additional path components from the message itself.</p>
+<p>For example, the following sensor config</p>
+
+<div class="source">
+<div class="source">
+<pre>{
+  &quot;index&quot;: &quot;bro&quot;,
+  &quot;batchSize&quot;: 5,
+  &quot;outputPathFunction&quot;: &quot;FORMAT('uid-%s', uid)&quot;
+}
+</pre></div></div>
+<p>will write each message to <tt>/apps/metron/uid-&lt;uid&gt;/</tt>, where <tt>&lt;uid&gt;</tt> is the value of the message&#x2019;s <tt>uid</tt> field.</p>
+<p>For example, if the data contains uids 1, 3, and 5, there will be three output folders in HDFS:</p>
+
+<div class="source">
+<div class="source">
+<pre>/apps/metron/uid-1/
+/apps/metron/uid-3/
+/apps/metron/uid-5/
+</pre></div></div>
+<p>The Stellar expression must return a String, but it is not limited to <tt>FORMAT</tt>; other functions, such as <tt>TO_LOWER</tt> and <tt>TO_UPPER</tt>, are also available. It is usually preferable to perform nontrivial transformations during enrichment and simply reference the resulting field here.</p>
+<p>If no Stellar function is provided, the writer defaults to one folder per sensor, as described above.</p>
+<p>One caveat: the writer limits how many files can be open at once. <tt>HdfsWriter</tt> exposes a <tt>withMaxOpenFiles</tt> method to set this limit, which defaults to 500. It can be set in Flux:</p>
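+<p>The path rules above can be modelled with a short Python sketch. A plain Python callable stands in for the compiled Stellar expression, so <tt>FORMAT('uid-%s', uid)</tt> becomes <tt>&quot;uid-%s&quot; % msg[&quot;uid&quot;]</tt>. The names <tt>resolve_output_dir</tt> and <tt>output_dirs</tt> are hypothetical; they are not Metron functions.</p>

```python
import posixpath
from typing import Callable, Iterable, Optional, Set

# Hypothetical stand-in for a compiled Stellar expression: takes the parsed
# JSON message and returns the sub-directory name.
PathFunction = Callable[[dict], object]


def resolve_output_dir(base_path: str, sensor: str,
                       path_fn: Optional[PathFunction], message: dict) -> str:
    """Return the directory a message would land in (illustrative sketch only)."""
    if path_fn is None:
        # No outputPathFunction configured: fall back to one folder per sensor.
        sub_dir = path_fn_result = sensor
    else:
        path_fn_result = path_fn(message)
        if not isinstance(path_fn_result, str):
            # The output path function must return a String.
            raise TypeError(
                f"output path function returned {type(path_fn_result).__name__}, expected str")
        sub_dir = path_fn_result
    return posixpath.join(base_path, sub_dir) + "/"


def output_dirs(base_path: str, sensor: str, path_fn: Optional[PathFunction],
                messages: Iterable[dict]) -> Set[str]:
    """Collect the distinct directories a set of messages would be written to."""
    return {resolve_output_dir(base_path, sensor, path_fn, m) for m in messages}


if __name__ == "__main__":
    # Python equivalent of the Stellar expression FORMAT('uid-%s', uid).
    by_uid = lambda msg: "uid-%s" % msg["uid"]
    msgs = [{"uid": 1}, {"uid": 3}, {"uid": 5}, {"uid": 3}]
    print(sorted(output_dirs("/apps/metron/", "bro", by_uid, msgs)))
    # ['/apps/metron/uid-1/', '/apps/metron/uid-3/', '/apps/metron/uid-5/']
    print(resolve_output_dir("/apps/metron/", "bro", None, msgs[0]))
    # /apps/metron/bro/
```

+<p>Because every distinct return value becomes its own folder, a high-cardinality field such as <tt>uid</tt> can produce a very large number of directories, which is why the open-file limit discussed next matters.</p>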
+
+<div class="source">
+<div class="source">
+<pre>    -   id: &quot;hdfsWriter&quot;
+        className: &quot;org.apache.metron.writer.hdfs.HdfsWriter&quot;
+        configMethods:
+            -   name: &quot;withFileNameFormat&quot;
+                args:
+                    - ref: &quot;fileNameFormat&quot;
+            -   name: &quot;withRotationPolicy&quot;
+                args:
+                    - ref: &quot;hdfsRotationPolicy&quot;
+            -   name: &quot;withMaxOpenFiles&quot;
+                args:
+                    - 500
+</pre></div></div></div>
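+<p>This page does not say what the writer does once the limit is reached. The Python sketch below assumes, purely for illustration, that each distinct output path holds one open file and that a message for a new path is refused when the cap is hit. <tt>OpenFileRegistry</tt> is a hypothetical class, not Metron&#x2019;s implementation.</p>

```python
from typing import Dict, List


class OpenFileRegistry:
    """Hypothetical model of a writer that caps concurrently open files.

    Assumptions (not taken from Metron): one open file per distinct output
    path, and a new path is rejected, rather than evicting an old file,
    once `max_open_files` paths are open.
    """

    def __init__(self, max_open_files: int = 500):
        self.max_open_files = max_open_files
        # Maps output path -> messages written; stands in for an open HDFS stream.
        self.open_files: Dict[str, List[dict]] = {}

    def write(self, path: str, message: dict) -> None:
        if path not in self.open_files:
            if len(self.open_files) >= self.max_open_files:
                raise RuntimeError(
                    f"too many open files ({self.max_open_files}); cannot open {path}")
            self.open_files[path] = []
        self.open_files[path].append(message)

    def close(self, path: str) -> None:
        # Closing a file (e.g. on rotation) frees a slot for a new path.
        self.open_files.pop(path, None)


if __name__ == "__main__":
    reg = OpenFileRegistry(max_open_files=2)
    reg.write("/apps/metron/uid-1/", {"uid": 1})
    reg.write("/apps/metron/uid-3/", {"uid": 3})
    try:
        reg.write("/apps/metron/uid-5/", {"uid": 5})
    except RuntimeError as err:
        print(err)  # too many open files (2); cannot open /apps/metron/uid-5/
```

+<p>In this model, it is the number of distinct output paths, not the message volume, that counts against the limit.</p>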
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2017
+                        <a href="https://www.apache.org">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+                          
+        
+                </div>
+    </footer>
+  </body>
+</html>


