From commits-return-3872-archive-asf-public=cust-asf.ponee.io@metron.apache.org Tue Sep 18 16:54:46 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 237861807A1 for ; Tue, 18 Sep 2018 16:54:43 +0200 (CEST) Received: (qmail 28429 invoked by uid 500); 18 Sep 2018 14:54:43 -0000 Mailing-List: contact commits-help@metron.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@metron.apache.org Delivered-To: mailing list commits@metron.apache.org Received: (qmail 28376 invoked by uid 99); 18 Sep 2018 14:54:43 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Sep 2018 14:54:43 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id D95B9E10BB; Tue, 18 Sep 2018 14:54:42 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: nickallen@apache.org To: commits@metron.apache.org Date: Tue, 18 Sep 2018 14:54:48 -0000 Message-Id: <0157617d95624138a88e5adafc0fb36e@git.apache.org> In-Reply-To: References: X-Mailer: ASF-Git Admin Mailer Subject: [07/21] metron git commit: METRON-1776 Update public web site to point at 0.6.0 new release (justinleet) closes apache/metron#1195 http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-enrichment/Performance.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-enrichment/Performance.html b/site/current-book/metron-platform/metron-enrichment/Performance.html index 136d939..0857ee4 100644 --- a/site/current-book/metron-platform/metron-enrichment/Performance.html +++ 
b/site/current-book/metron-platform/metron-enrichment/Performance.html @@ -1,13 +1,13 @@ - + Metron – Enrichment Performance @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Enrichment Performance
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0
  • @@ -55,7 +55,6 @@
  • Platform
  • Indexing
  • +
  • Job
  • Management
  • Parsers
  • Pcap-backend
  • +
  • Solr
  • Writer
  • http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-enrichment/index.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-enrichment/index.html b/site/current-book/metron-platform/metron-enrichment/index.html index de1e4fa..e750946 100644 --- a/site/current-book/metron-platform/metron-enrichment/index.html +++ b/site/current-book/metron-platform/metron-enrichment/index.html @@ -1,13 +1,13 @@ - + Metron – Enrichment @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Enrichment
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0
  • @@ -55,7 +55,6 @@
  • Platform
  • Indexing
  • +
  • Job
  • Management
  • Parsers
  • Pcap-backend
  • +
  • Solr
  • Writer
  • @@ -162,14 +163,22 @@ limitations under the License.

    There are two types of configurations at the moment, global and sensor specific.

    Global Configuration

    -

    There are a few enrichments which have independent configurations, such as from the global config.

    +

    There are a few enrichments which have independent configurations, such as from the global config. You can also configure the enrichment topology’s writer batching settings.

    Also, see the “Global Configuration” section for more discussion of the global config.

    GeoIP

    Metron supports enrichment of IP information using GeoLite2. The location of the file is managed in the global config.

    geo.hdfs.file

    -

    The location on HDFS of the GeoLite2 database file to use for GeoIP lookups. This file will be localized on the storm supervisors running the topology and used from there. This is lazy, so if this property changes in a running topology, the file will be localized from HDFS upon first time the file is used via the geo enrichment.

    +

    The location on HDFS of the GeoLite2 database file to use for GeoIP lookups. This file will be localized on the Storm supervisors running the topology and used from there. Localization is lazy: if this property changes in a running topology, the file will be localized from HDFS the first time the geo enrichment uses it.

    +
    +

    Writer Batching

    +
    +

    enrichment.writer.batchSize

    +

    The size of the batch that is written to Kafka at once. Defaults to 15 (size of 1 disables batching).

    +
    +

    enrichment.writer.batchTimeout

    +

    The timeout after which a batch will be flushed even if batchSize has not been met. Optional. If unspecified, or set to 0, it defaults to a system-determined duration which is a fraction of the Storm parameter topology.message.timeout.secs. Ignored if batchSize is 1, since this disables batching.
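
    The interplay between batchSize and batchTimeout described above can be sketched roughly as follows. This is an illustrative Python model, not Metron's writer code: the class BatchBuffer and its methods are hypothetical names, and the timeout is passed in explicitly because the system-determined default (a fraction of topology.message.timeout.secs) is not spelled out here.

```python
import time


class BatchBuffer:
    """Illustrative model of size- and time-based flushing (not Metron code)."""

    def __init__(self, batch_size=15, batch_timeout_secs=5.0, clock=time.monotonic):
        self.batch_size = batch_size
        self.batch_timeout_secs = batch_timeout_secs
        self.clock = clock            # injectable so the timeout can be tested
        self.pending = []
        self.first_added_at = None    # time the oldest pending message arrived

    def add(self, message):
        """Queue a message; return the flushed batch if it is full, else None."""
        if not self.pending:
            self.first_added_at = self.clock()
        self.pending.append(message)
        # With a batch size of 1 every message is flushed immediately.
        if len(self.pending) >= self.batch_size:
            return self._flush()
        return None

    def tick(self):
        """Call periodically; flush a partial batch once the timeout elapses."""
        if self.batch_size == 1 or not self.pending:
            return None  # the timeout is ignored when batching is disabled
        if self.clock() - self.first_added_at >= self.batch_timeout_secs:
            return self._flush()
        return None

    def _flush(self):
        batch, self.pending = self.pending, []
        self.first_added_at = None
        return batch
```

    In this model a batch leaves the buffer either when it fills up (add) or when its oldest message has waited longer than the timeout (tick), which is the behaviour the two settings describe.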

    Sensor Enrichment Configuration

    The sensor specific configuration is intended to configure the individual enrichments and threat intelligence enrichments for a given sensor type (e.g. snort).

    http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-indexing/index.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-indexing/index.html b/site/current-book/metron-platform/metron-indexing/index.html index 6d4b8d6..65c0bad 100644 --- a/site/current-book/metron-platform/metron-indexing/index.html +++ b/site/current-book/metron-platform/metron-indexing/index.html @@ -1,13 +1,13 @@ - + Metron – Indexing @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Indexing
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0
  • @@ -55,15 +55,16 @@
  • Platform
  • @@ -139,15 +140,38 @@ limitations under the License.
  • hdfs
  • solr
  • -

    Depending on how you start the indexing topology, it will have either elasticsearch or solr and hdfs writers running.

    -

    The configuration for an individual writer-specific configuration is a JSON map with the following fields:

    -
      +

      Depending on how you start the indexing topology, it will have either Elasticsearch or Solr and HDFS writers running.

      + + -
    • index : The name of the index to write to (defaulted to the name of the sensor).
    • -
    • batchSize : The size of the batch that is written to the indices at once. Defaults to 1 (no batching).
    • -
    • batchTimeout : The timeout after which a batch will be flushed even if batchSize has not been met. Optional. If unspecified, or set to 0, it defaults to a system-determined duration which is a fraction of the Storm parameter topology.message.timeout.secs. Ignored if batchSize is 1, since this disables batching.
    • -
    • enabled : Whether the writer is enabled (default true).
    • - + + + + + + + + + + + + + + + + + + + + + + + + + + + +
      Property Description Default Value
      index The name of the index to write to. Defaults to the name of the sensor.
      batchSize The size of the batch that is written to the indices at once. Defaults to 1; no batching.
      batchTimeout The timeout after which a batch will be flushed even if batchSize has not been met. Defaults to a duration which is a fraction of the Storm parameter topology.message.timeout.secs, if left undefined or set to 0. Ignored if batchSize is 1, since this disables batching.
      enabled A boolean indicating whether the writer is enabled. Defaults to true
      fieldNameConverter Defines how field names are transformed before being written to the index. Only applicable to Elasticsearch. Defaults to DEDOT. Acceptable values are DEDOT, which replaces all ‘.’ with ‘:’, or NOOP, which does not change the field names.
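
      The effect of the two fieldNameConverter values can be illustrated with a small sketch. It is written in Python purely for illustration and is not the Metron implementation; the function names are hypothetical, and only the DEDOT and NOOP behaviour stated in the table is assumed.

```python
def dedot(field_name):
    """DEDOT: replace every '.' in the field name with ':'."""
    return field_name.replace(".", ":")


def noop(field_name):
    """NOOP: leave the field name unchanged."""
    return field_name


CONVERTERS = {"DEDOT": dedot, "NOOP": noop}


def convert_field_names(message, converter="DEDOT"):
    """Return a copy of the message with its keys converted."""
    convert = CONVERTERS[converter]
    return {convert(key): value for key, value in message.items()}


print(convert_field_names({"source.type": "bro", "ip_src_addr": "10.0.0.1"}))
```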

      Meta Alerts

      Alerts can be grouped, after appropriate searching, into a set of alerts called a meta alert. A meta alert is useful for maintaining the context of searching and grouping during further investigations. Standard searches can return meta alerts, but grouping and other aggregation or sorting requests will not, because there’s not a clear way to aggregate in many cases if there are multiple alerts contained in the meta alert. All meta alerts will have the source type of metaalert, regardless of the contained alert’s origins.

      @@ -155,6 +179,19 @@ limitations under the License.

      Elasticsearch

      Metron comes with built-in templates for the default sensors for Elasticsearch. When adding a new sensor, it will be necessary to add a new template defining the output fields appropriately. In addition, there is a requirement for a field alert of type nested for Elasticsearch 2.x installs. This is detailed at Using Metron with Elasticsearch 2.x

    +

    Solr

    +

    Metron comes with built-in schemas for the default sensors for Solr. When adding a new sensor, it will be necessary to add a new schema defining the output fields appropriately. In addition, the following fields are used internally by Metron and are also required:

    +
      + +
    • <field name="guid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    • +
    • <field name="source.type" type="string" indexed="true" stored="true" />
    • +
    • <field name="timestamp" type="timestamp" indexed="true" stored="true" />
    • +
    • <field name="comments" type="string" indexed="true" stored="true" multiValued="true"/>
    • +
    • <field name="metaalerts" type="string" multiValued="true" indexed="true" stored="true"/>
    • +
    +

    The unique key should be set to guid by including <uniqueKey>guid</uniqueKey> in the schema.

    +

    It is strongly suggested that the fieldTypes match those in the built-in schemas.
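
    One way to sanity-check a new sensor's schema against the requirements above is a small script along these lines. This is only a sketch using Python's standard XML parser; the function check_schema and the sample schema are hypothetical and are not part of Metron or Solr.

```python
import xml.etree.ElementTree as ET

# Field names taken from the required-field list above.
REQUIRED_FIELDS = {"guid", "source.type", "timestamp", "comments", "metaalerts"}


def check_schema(schema_xml):
    """Return a list of problems found in a Solr schema given as an XML string."""
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    problems = ["missing field: " + name for name in sorted(REQUIRED_FIELDS - declared)]
    unique_key = root.findtext("uniqueKey")
    if unique_key != "guid":
        problems.append("uniqueKey should be guid, found: " + str(unique_key))
    return problems


# Hypothetical, deliberately incomplete schema used only to exercise the check.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <field name="guid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="source.type" type="string" indexed="true" stored="true" />
  <field name="timestamp" type="timestamp" indexed="true" stored="true" />
  <uniqueKey>guid</uniqueKey>
</schema>
"""

print(check_schema(SAMPLE_SCHEMA))
```

    The check covers only field names and the unique key; it does not compare fieldTypes with the built-in schemas.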

    +

    Indexing Configuration Examples

    For a given sensor, the following scenarios would be indicated by the following cases:

    @@ -294,7 +331,7 @@ limitations under the License.

    The HBase column family to use for message updates.

    The MetaAlertDao

    -

    The goal of meta alerts is to be able to group together a set of alerts while being able to transparently perform actions like searches, as if meta alerts were normal alerts. org.apache.metron.indexing.dao.MetaAlertDao extends IndexDao and enables several features:

    +

    The goal of meta alerts is to be able to group together a set of alerts while being able to transparently perform actions like searches, as if meta alerts were normal alerts. org.apache.metron.indexing.dao.metaalert.MetaAlertDao extends IndexDao and enables several features:

    • the ability to get all meta alerts associated with an alert
    • http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-job/index.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-job/index.html b/site/current-book/metron-platform/metron-job/index.html new file mode 100644 index 0000000..388ea80 --- /dev/null +++ b/site/current-book/metron-platform/metron-job/index.html @@ -0,0 +1,126 @@ + + + + + + + + + Metron – Metron Job + + + + + + + +
      + + + +
      + +
      + +

      Metron Job

      +

      +

      This module holds abstractions for creating jobs. The main actors are a JobManager interface and its implementation, InMemoryJobManager, which maintains a cache of running and completed Statusable jobs. Each Statusable can provide a Finalizer implementation that should be executed on completion of the underlying job. Successful jobs should return a Pageable object that allows consumers to request results on a per-page basis.

      +
      +

      Job State Statechart

      +

      Job State Statechart

      +
      +
      +
      +
      +
      +
      +
      +© 2015-2016 The Apache Software Foundation. Apache Metron, Metron, Apache, the Apache feather logo, + and the Apache Metron project logo are trademarks of The Apache Software Foundation. +
      +
      +
      + + http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-management/index.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-management/index.html b/site/current-book/metron-platform/metron-management/index.html index f9ea1ce..0aa9eda 100644 --- a/site/current-book/metron-platform/metron-management/index.html +++ b/site/current-book/metron-platform/metron-management/index.html @@ -1,13 +1,13 @@ - + Metron – Stellar REPL Management Utilities @@ -32,8 +32,8 @@
    • Metron/
    • Documentation/
    • Stellar REPL Management Utilities
    • -
    • | Last Published: 2018-06-07
    • -
    • Version: 0.5.0
    • +
    • | Last Published: 2018-09-12
    • +
    • Version: 0.6.0
    @@ -55,15 +55,16 @@
  • Platform
  • http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-parsers/3rdPartyParser.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-parsers/3rdPartyParser.html b/site/current-book/metron-platform/metron-parsers/3rdPartyParser.html index 988580b..8d75d41 100644 --- a/site/current-book/metron-platform/metron-parsers/3rdPartyParser.html +++ b/site/current-book/metron-platform/metron-parsers/3rdPartyParser.html @@ -1,13 +1,13 @@ - + Metron – Custom Metron Parsers @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Custom Metron Parsers
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0
  • @@ -55,20 +55,22 @@
  • Platform
  • @@ -145,18 +147,18 @@ limitations under the License.

    For this demonstration, let’s create a maven project to compile our project. We’ll call it extra_parsers, so in your workspace, let’s set up the maven project:

      -
    • Create the maven infrastructure for extra_parsers via
    • -
    +
  • + +

    Create the maven infrastructure for extra_parsers via

    mkdir -p extra_parsers/src/{main,test}/java
     
    +
  • +
  • -
      - -
    • Create a pom file indicating how we should build our parsers by editing extra_parsers/pom.xml with the following content:
    • -
    +

    Create a pom file indicating how we should build our parsers by editing extra_parsers/pom.xml with the following content:

    @@ -207,7 +209,7 @@ limitations under the License. <!-- We will set up the shade plugin to create a single jar at the end of the build lifecycle. We will exclude some things and relocate others to simulate a real situation. - + One thing to note is that it's a good practice to shade and relocate common libraries that may be dependencies in Metron. Your jar will be merged with the parsers jar, so the metron @@ -289,11 +291,10 @@ limitations under the License. </build> </project>
    +
  • +
  • -
      - -
    • Now let’s create our parser com.thirdparty.SimpleParser by creating the file extra-parsers/src/main/java/com/thirdparty/SimpleParser.java with the following content:
    • -
    +

    Now let’s create our parser com.thirdparty.SimpleParser by creating the file extra_parsers/src/main/java/com/thirdparty/SimpleParser.java with the following content:

    @@ -332,12 +333,13 @@ public class SimpleParser extends BasicParser { } }
    - -
      - +
    • Compile the parser via mvn clean package in extra_parsers
    • -
    +
  • +

    This will create a jar containing your parser and its dependencies (sans Metron dependencies) in extra_parsers/target/extra-parsers-1.0-SNAPSHOT-uber.jar

    +
  • +

    Deploying Your Custom Parser

    In order to deploy your newly built custom parser, you would place the jar file above in the $METRON_HOME/parser_contrib directory on the Metron host (i.e. any host you would start parsers from or, alternatively, where the Metron REST is hosted).

    @@ -362,11 +364,15 @@ public class SimpleParser extends BasicParser {

    Restart the REST service in Ambari

    In order for new parsers to be picked up, the REST service must be restarted. You can do that from within Ambari by restarting the Metron REST service.

    -

    Push the Zookeeper Configs

    -

    Now push the config to Zookeeper with the following command: $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER

    -

    Create a Kafka Topic

    -

    Create a kafka topic, let’s call it test via: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --create --topic test --partitions 1 --replication-factor 1

    +

    Create a Kafka topic; let’s call it test.

    + +
    +
    +
    KAFKA_HOME=/usr/hdp/current/kafka-broker
    +$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --create --topic test --partitions 1 --replication-factor 1
    +
    +

    Note, in a real deployment, that topic would be named something more descriptive and would have replication factor and partitions set to something less trivial.

    Configure Test Parser

    @@ -381,32 +387,40 @@ public class SimpleParser extends BasicParser {
    +

    Push the Zookeeper Configs

    +

    Now push the config to Zookeeper with the following command.

    + +
    +
    +
    $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
    +
    +
    +

    Start Parser

    Now we can start the parser and send some data through:

      -
    • Start the parser
    • -
    +
  • + +

    Start the parser

    $METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s test
     
    +
  • +
  • -
      - -
    • Send example data through:
    • -
    +

    Send example data through:

    echo "apache,metron" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic test
     
    +
  • +
  • -
      - -
    • Validate data was written in ES:
    • -
    +

    Validate data was written in ES:

    @@ -416,6 +430,8 @@ public class SimpleParser extends BasicParser { } '
    +
  • +
  • This should yield something like:

    @@ -447,7 +463,8 @@ public class SimpleParser extends BasicParser { } }
  • - + +

    Via the Management UI

    As long as the REST service is restarted after new parsers are added to $METRON_HOME/parser_contrib, they are available in the UI for creating and deploying parsers.

    http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-parsers/ParserChaining.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-parsers/ParserChaining.html b/site/current-book/metron-platform/metron-parsers/ParserChaining.html new file mode 100644 index 0000000..a4c9360 --- /dev/null +++ b/site/current-book/metron-platform/metron-parsers/ParserChaining.html @@ -0,0 +1,259 @@ + + + + + + + + + Metron – Parser Chaining + + + + + + + +
    + + + +
    + +
    + +

    Parser Chaining

    +

    +

    Aggregating many different types of sensors into a single data source (e.g. syslog) and ingesting that aggregate sensor into Metron is a common pattern. It is not obvious how to manage these aggregate sensors, as they require two-pass parsing. This document walks through an example of supporting this kind of multi-pass ingest.

    +

    Multi-pass parsing involves the following requirements:

    +
      + +
    • The enveloping parser (e.g. the aggregation format such as syslog or plain CSV) may contain metadata which should be ingested along with the data.
    • +
    • The enveloping sensor contains many different sensor types
    • +
    +

    +

    High Level Solution

    +

    High Level Approach

    +

    At a high level, we continue to maintain the architectural invariant of a 1-1 relationship between logical sensors and storm topologies. Eventually this relationship may become more complex, but at the moment the approach is to construct a routing parser which will have two responsibilities:

    +
      + +
    • Parse the envelope (e.g. syslog data) and extract any metadata fields from the envelope to pass along
    • +
    • Route the unfolded data to the appropriate kafka topic associated with the enveloped sensor data
    • +
    +

    Because the data emitted from the routing parser is a JSON blob, just like the data emitted from any other parser, we will need to adjust the downstream parsers to extract the enveloped data from the JSON blob and treat it as the data to parse.

    +

    +

    Architecting a Parser Chaining Solution in Metron

    +

    Currently, the approach to fulfilling this requirement involves a couple of knobs in the parser infrastructure for Metron.

    +

    Consider the case, for instance, where we have many different TYPES of messages wrapped inside of syslog. As an architectural abstraction, we would want to have the following properties:

    +
      + +
    • separate the concerns of parsing the individual types of messages from each other
    • +
    • separate the concerns of parsing the individual types of messages from parsing the envelope
    • +
    +
    +

    Data Dependent Parser Writing

    +

    Parsers allow users to configure the topic which the kafka producer uses in a couple of ways (from the parser config in an individual parser):

    +
      + +
    • kafka.topic - Specify the topic in the config. This can be updated by updating the config, but it is data independent (e.g. not dependent on the data in a message).
    • +
    • kafka.topicField - Specify the topic as the value of a particular field. If unpopulated, then the message is dropped. This is inherently data dependent.
    • +
    +

    The kafka.topicField parameter allows for data dependent topic selection, and this inherently enables the routing capabilities necessary for handling enveloped data.
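
    The topic selection rules above can be summarised in a short sketch. This is illustrative Python, not the Metron writer: the function select_topic is hypothetical, the default_topic argument stands in for whatever topic is used when neither setting is present, and giving kafka.topicField precedence when both settings are present is an assumption.

```python
def select_topic(message, parser_config, default_topic):
    """Return the output topic for a message, or None if it should be dropped."""
    topic_field = parser_config.get("kafka.topicField")
    if topic_field is not None:
        # Data dependent: the topic is the value of the named field.
        # A missing or empty value means the message is dropped.
        return message.get(topic_field) or None
    # Data independent: a fixed topic taken from the config.
    return parser_config.get("kafka.topic", default_topic)


# Hypothetical field and topic names, used only for illustration.
routing_config = {"kafka.topicField": "logical_source_type"}
print(select_topic({"logical_source_type": "sensor_a"}, routing_config, "fallback"))
print(select_topic({"other_field": "x"}, routing_config, "fallback"))
```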

    +
    +

    Flexibly Interpreting Data

    +
    +

    Aside: The Role of Metadata in Metron

    +

    Before we continue, let’s briefly talk about metadata. We have exposed the ability to pass along metadata and interact with metadata in a decoupled way from the actual parser logic (i.e. the GrokParser should not have to consider how to interpret metadata).

    +

    There are two choices about manipulating metadata in Metron:

    +
      + +
    • Should you merge metadata into the downstream message?
    • +
    • If you do, should you use a key prefix to set it off from the message by default?
    • +
    +

    This enables users to specify metadata independent of the data that is persisted downstream and can inform the operations of enrichment and the profiler.

    +
    +

    Interpretation

    +

    Now that we have an approach which enables the routing of the data, the remaining question is how to decouple parsing data from interpreting data and metadata. By default, Metron operates like so:

    +
      + +
    • The kafka record key (as a JSON Map) is considered metadata
    • +
    • The kafka record value is considered data
    • +
    +

    Beyond that, the default strategy applies its own defaults for handling metadata. In particular, by default we do not merge metadata, and we use a metron.metadata prefix for all metadata.

    +

    In order to enable chained parsers with metadata, we allow the following to be specified via a strategy in the parser config:

    +
      + +
    • How to extract the data from the kafka record
    • +
    • How to extract the metadata from the kafka record
    • +
    • The default operations for merging
    • +
    • The prefix for the metadata key
    • +
    +

    The strategy, specified by the rawMessageStrategy configuration, is either ENVELOPE or DEFAULT.

    +

    Specifically, to enable parsing enveloped data (i.e. data in a field of a JSON blob with the other fields being metadata), one can specify the strategy and configuration of that strategy in the parser config. One must specify the rawMessageStrategy as ENVELOPE in the parser and the rawMessageStrategyConfig to indicate the field which contains the data.

    +

    Together with routing, we have the complete solution to chain parsers which can:

    +
      + +
    • parse the envelope
    • +
    • route the parsed data to specific parsers
    • +
    • have the specific parsers interpret the data via the rawMessageStrategy, whereby they pull the data out of the JSON Map that they receive
    • +
    +

    Together this enables a directed acyclic graph of parsers to handle single or multi-layer parsing.
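
    Conceptually, the ENVELOPE strategy splits an incoming JSON Map into the data to parse and the surrounding metadata. The sketch below models that split in Python. It is not Metron code: read_envelope is a hypothetical name, and joining the prefix to the key with a '.' is an assumption.

```python
import json


def read_envelope(record_value, message_field, metadata_prefix="metron.metadata"):
    """Split an enveloped record into (data to parse, metadata map)."""
    envelope = json.loads(record_value)
    # The configured field holds the raw data for the downstream parser.
    data = envelope.pop(message_field)
    # Every other field is treated as metadata, optionally prefixed.
    metadata = {}
    for key, value in envelope.items():
        name = metadata_prefix + "." + key if metadata_prefix else key
        metadata[name] = value
    return data, metadata


record = '{"payload": "foo,bar,grok", "meta_f1": "val1"}'
print(read_envelope(record, "payload"))
print(read_envelope(record, "payload", metadata_prefix=""))
```

    The downstream parser then only ever sees the extracted data, which keeps parsing decoupled from interpreting the envelope.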

    +
    +

    Example

    +

    For a complete example, look at the parser chaining use case. For a simple example, the following should suffice.

    +

    If I want to configure a CSV parser to parse data which has 3 columns f1, f2 and f3 and is held in a field called payload inside of a JSON Map, I can do so like this:

    + +
    +
    +
    {
    +  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
    +  ,"sensorTopic" : "my_topic"
    +  ,"rawMessageStrategy" : "ENVELOPE"
    +  ,"rawMessageStrategyConfig" : {
    +      "messageField" : "payload",
    +      "metadataPrefix" : ""
    +  }
    +  , "parserConfig": {
    +     "columns" : { "f1": 0
    +                 , "f2": 1
    +                 , "f3": 2
    +                 }
    +   }
    +}
    +
    + +

    This would parse the following message:

    + +
    +
    +
    {
    +  "meta_f1" : "val1",
    +  "payload" : "foo,bar,grok",
    +  "original_string" : "2019 Jul, 01: val1 foo,bar,grok",
    +  "timestamp" : 10000
    +}
    +
    + +

    into

    + +
    +
    +
    {
    +  "meta_f1" : "val1",
    +  "f1" : "foo",
    +  "f2" : "bar",
    +  "f3" : "grok",
    +  "original_string" : "2019 Jul, 01: val1 foo,bar,grok",
    +  "timestamp" : 10002
    +}
    +
    + +

    Note a couple of things here:

    +
      + +
    • The metadata field meta_f1 is not prefixed here because we configured the strategy with metadataPrefix as an empty string.
    • +
    • The timestamp is not inherited from the metadata
    • +
    • The original_string is inherited from the metadata
    • +
    +
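
    The transformation in this example can be reproduced with a short sketch. It is illustrative Python rather than the CSVParser implementation: parse_enveloped_csv is a hypothetical name, and replacing the timestamp with the parse-time clock is an assumption made to model the note that the timestamp is not inherited.

```python
import csv
import io
import time


def parse_enveloped_csv(envelope, message_field, columns, now_ms=None):
    """Parse the CSV held in message_field and merge in the unprefixed metadata."""
    # Everything except the payload field is metadata (empty metadataPrefix).
    result = {key: value for key, value in envelope.items() if key != message_field}
    # Parse the single CSV row and map column names to values by index.
    row = next(csv.reader(io.StringIO(envelope[message_field])))
    result.update({name: row[index] for name, index in columns.items()})
    # The timestamp is not inherited: the parser assigns its own.
    result["timestamp"] = now_ms if now_ms is not None else int(time.time() * 1000)
    return result


envelope = {
    "meta_f1": "val1",
    "payload": "foo,bar,grok",
    "original_string": "2019 Jul, 01: val1 foo,bar,grok",
    "timestamp": 10000,
}
print(parse_enveloped_csv(envelope, "payload", {"f1": 0, "f2": 1, "f3": 2}, now_ms=10002))
```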
    +
    +
    +
    +
    +
    +
    +
    +
    + + http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-parsers/index.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-parsers/index.html b/site/current-book/metron-platform/metron-parsers/index.html index 807a24e..0d1b7e4 100644 --- a/site/current-book/metron-platform/metron-parsers/index.html +++ b/site/current-book/metron-platform/metron-parsers/index.html @@ -1,13 +1,13 @@ - + Metron – Parsers @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Parsers
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0
  • @@ -55,20 +55,22 @@
  • Platform
  • @@ -152,6 +154,8 @@ limitations under the License.
  • jsonpQuery : A JSON Path query string. If present, the result of the JSON Path query should be a list of messages. This is useful if you have a JSON document which contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
  • +
  • wrapInEntityArray : "true" or "false". If jsonpQuery is present and this flag is present and set to "true", the incoming message will be wrapped in a JSON entity and array. For example, {"name":"value"},{"name2":"value2"} will be wrapped as {"message" : [{"name":"value"},{"name2":"value2"}]}. This uses the default value for wrapEntityName if that property is not set.
  • +
  • wrapEntityName : Sets the name to use when wrapping JSON using wrapInEntityArray. The jsonpQuery should reference this name.
  • A field called timestamp is expected to exist and, if it does not, then current time is inserted.
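
  The wrapping performed by wrapInEntityArray can be pictured with a small sketch. It is illustrative Python, not the parser's code; wrap_in_entity_array is a hypothetical name, and the default entity name "message" is taken from the example given for wrapInEntityArray.

```python
import json


def wrap_in_entity_array(raw, entity_name="message"):
    """Wrap a comma-separated run of JSON objects in {entity_name: [...]}."""
    wrapped = "{" + json.dumps(entity_name) + ": [" + raw + "]}"
    # Parsing confirms the wrapped text is now a single valid JSON document.
    return json.loads(wrapped)


raw = '{"name":"value"},{"name2":"value2"}'
print(wrap_in_entity_array(raw))
```

  After wrapping, a jsonpQuery that references the entity name can select the individual messages from the array.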
  • @@ -179,7 +183,9 @@ limitations under the License.

    Parser Architecture

    Architecture

    -

    Data flows through the parser bolt via kafka and into the enrichments topology in kafka. Errors are collected with the context of the error (e.g. stacktrace) and original message causing the error and sent to an error queue. Invalid messages as determined by global validation functions are also treated as errors and sent to an error queue.

    +

    Data flows through the parser bolt via kafka and into the enrichments topology in kafka. Errors are collected with the context of the error (e.g. stacktrace) and original message causing the error and sent to an error queue. Invalid messages as determined by global validation functions are also treated as errors and sent to an error queue.

    +

    Multiple sensors can be aggregated into a single Storm topology. When this is done, there will be multiple Kafka spouts, but only a single parser bolt which will handle delegating to the correct parser as needed. There are some constraints around this, in particular regarding some configuration. Additionally, all sensors must flow to the same error topic. The Kafka topic is retrieved from the input Tuple itself.

    +

    A worked example of this can be found in the Parser Chaining use case.

    Message Format

    All Metron messages follow a specific format in order to ingest a message. If a message does not conform to this format it will be dropped and put onto an error queue for further examination. The message must be of a JSON format and must have a JSON tag message like so:

    @@ -200,7 +206,7 @@ limitations under the License.
  • timestamp (epoch)
  • original_string: A human friendly string representation of the message
  • -

    The timestamp and original_string fields are madatory. The remaining standard fields are optional. If any of the optional fields are not applicable then the field should be left out of the JSON.

    +

    The timestamp and original_string fields are mandatory. The remaining standard fields are optional. If any of the optional fields are not applicable then the field should be left out of the JSON.
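
    A check for the two mandatory fields could look like the following sketch. It is illustrative Python only; missing_mandatory_fields is a hypothetical helper and not part of Metron's validation code.

```python
MANDATORY_FIELDS = ("timestamp", "original_string")


def missing_mandatory_fields(message):
    """Return the mandatory field names that are absent from the message."""
    return [name for name in MANDATORY_FIELDS if name not in message]


print(missing_mandatory_fields({"timestamp": 1537282488000, "ip_src_addr": "10.0.0.1"}))
```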

    So putting it all together a typical Metron message with all 5-tuple fields present would look like the following:

    @@ -234,7 +240,8 @@ limitations under the License.
  • raw_message : The raw message in string form
  • raw_message_bytes : The raw message bytes
  • error_hash : A hash of the error message
  • -
    + +

    When aggregating multiple sensors, all sensors must be using the same error topic.

    Parser Configuration

    The configuration for the various parser topologies is defined by JSON documents stored in zookeeper.

    @@ -263,20 +270,29 @@ limitations under the License.
    • sensorTopic : The kafka topic to send the parsed messages to. If the topic is prefixed and suffixed by / then it is assumed to be a regex and will match any topic matching the pattern (e.g. /bro.*/ would match bro_cust0, bro_cust1 and bro_cust2)
    • -
    • readMetadata : Boolean indicating whether to read metadata or not (false by default). See below for a discussion about metadata.
    • -
    • mergeMetadata : Boolean indicating whether to merge metadata with the message or not (false by default). See below for a discussion about metadata.
    • -
    • parserConfig : A JSON Map representing the parser implementation specific configuration.
    • +
    • readMetadata : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent). See below for a discussion about metadata.
    • +
    • mergeMetadata : Boolean indicating whether to merge metadata with the message or not (The default is raw message strategy dependent). See below for a discussion about metadata.
    • +
    • rawMessageStrategy : The strategy to use when reading the raw data and metadata. See below for a discussion about message reading strategies.
    • +
    • rawMessageStrategyConfig : The raw message strategy configuration map. See below for a discussion about message reading strategies.
    • +
    • parserConfig : A JSON Map representing the parser implementation specific configuration. Also include batch sizing and timeout for writer configuration here. +
        + +
      • batchSize : Integer indicating number of records to batch together before sending to the writer. (default to 15)
      • +
      • batchTimeout : The timeout after which a batch will be flushed even if batchSize has not been met. Optional. If unspecified, or set to 0, it defaults to a system-determined duration which is a fraction of the Storm parameter topology.message.timeout.secs. Ignored if batchSize is 1, since this disables batching.
      • +
      • The kafka writer can be configured within the parser config as well. (This is all configured a priori, but this is convenient for overriding the settings). See here
      • +
      +
    • fieldTransformations : An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the kafka topic.
    • -
    • spoutParallelism : The kafka spout parallelism (default to 1). This can be overridden on the command line.
    • -
    • spoutNumTasks : The number of tasks for the spout (default to 1). This can be overridden on the command line.
    • -
    • parserParallelism : The parser bolt parallelism (default to 1). This can be overridden on the command line.
    • -
    • parserNumTasks : The number of tasks for the parser bolt (default to 1). This can be overridden on the command line.
    • +
    • spoutParallelism : The kafka spout parallelism (default to 1). This can be overridden on the command line, and if there are multiple sensors should be in a comma separated list in the same order as the sensors.
    • +
    • spoutNumTasks : The number of tasks for the spout (default to 1). This can be overridden on the command line, and if there are multiple sensors should be in a comma separated list in the same order as the sensors.
    • +
    • parserParallelism : The parser bolt parallelism (default to 1). If there are multiple sensors, the last one’s configuration will be used. This can be overridden on the command line.
    • +
    • parserNumTasks : The number of tasks for the parser bolt (default to 1). If there are multiple sensors, the last one’s configuration will be used. This can be overridden on the command line.
    • errorWriterParallelism : The error writer bolt parallelism (default to 1). This can be overridden on the command line.
    • errorWriterNumTasks : The number of tasks for the error writer bolt (default to 1). This can be overridden on the command line.
    • numWorkers : The number of workers to use in the topology (default is the storm default of 1).
    • numAckers : The number of acker executors to use in the topology (default is the storm default of 1).
    • -
    • spoutConfig : A map representing a custom spout config (this is a map). This can be overridden on the command line.
    • -
    • securityProtocol : The security protocol to use for reading from kafka (this is a string). This can be overridden on the command line and also specified in the spout config via the security.protocol key. If both are specified, then they are merged and the CLI will take precedence.
    • +
    • spoutConfig : A map representing a custom spout config (this is a map). If there are multiple sensors, the configs will be merged with the last specified taking precedence. This can be overridden on the command line.
    • +
    • securityProtocol : The security protocol to use for reading from kafka (this is a string). This can be overridden on the command line and also specified in the spout config via the security.protocol key. If both are specified, then they are merged and the CLI will take precedence. If multiple sensors are used, any non “PLAINTEXT” value will be used.
    • stormConfig : The storm config to use (this is a map). This can be overridden on the command line. If both are specified, they are merged with CLI properties taking precedence.
    • cacheConfig : Cache config for stellar field transformations. This configures a least frequently used cache. This is a map with the following keys. If not explicitly configured (the default), then no cache will be used.
        @@ -331,17 +347,52 @@ Consider the following scenarios:

      • Custom metadata: Custom metadata from an individual telemetry source that one might want to use within Metron.
      -

      Metadata is controlled by two fields in the parser:

      +

      Metadata is controlled by the following parser configs:

      +
        + +
      • rawMessageStrategy : This is a strategy which indicates how to read data and metadata. The strategies supported are: +
          + +
        • DEFAULT : Data is read directly from the kafka record value and metadata, if any, is read from the kafka record key. This strategy defaults to not reading metadata and not merging metadata. This is the default strategy.
        • +
        • ENVELOPE : Data from the kafka record value is presumed to be a JSON blob. One of these fields must contain the raw data to pass to the parser. All other fields should be considered metadata. The field containing the raw data is specified in the rawMessageStrategyConfig. Data held in the kafka key as well as the non-data fields in the JSON blob passed into the kafka value are considered metadata. The one exception is that any original_string field is inherited from the envelope data, so that the original string contains the envelope data. If you do not want this behavior, remove this field from the envelope data.
        • +
        +
      • +
      • rawMessageStrategyConfig : The configuration (a map) for the rawMessageStrategy. Available configurations are strategy dependent: +
          + +
        • DEFAULT +
            + +
          • metadataPrefix defines the key prefix for metadata (default is metron.metadata).
          • +
          +
        • +
        • ENVELOPE +
            + +
          • metadataPrefix defines the key prefix for metadata (default is metron.metadata)
          • +
          • messageField defines the field from the envelope to use as the data. All other fields are considered metadata.
          • +
          +
        • +
        +
      • +
      • readMetadata : This is a boolean indicating whether metadata will be read and made available to Field transformations (i.e. Stellar field transformations). The default is dependent upon the rawMessageStrategy: +
          + +
        • DEFAULT : default to false.
        • +
        • ENVELOPE : default to true.
        • +
        +
      • +
      • mergeMetadata : This is a boolean indicating whether metadata fields will be merged with the message automatically. That is to say, if this property is set to true then every metadata field will become part of the messages and, consequently, also available for use in field transformations. The default is dependent upon the rawMessageStrategy:
          -
        • readMetadata : This is a boolean indicating whether metadata will be read and made available to Field transformations (i.e. Stellar field transformations). The default is false.
        • -
        • mergeMetadata : This is a boolean indicating whether metadata fields will be merged with the message automatically.
          -That is to say, if this property is set to true then every metadata field will become part of the messages and, consequently, also available for use in field transformations.
        • +
        • DEFAULT : default to false.
        • +
        • ENVELOPE : default to true.
        • +
        +
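
      The strategy-dependent defaults listed above can be summarised in a short sketch. It is not Metron code: the function name is hypothetical, and it assumes only what the list states, namely that an explicit readMetadata or mergeMetadata value wins and that the fallback depends on rawMessageStrategy.

```python
# Defaults per raw message strategy, as listed above.
STRATEGY_DEFAULTS = {
    "DEFAULT": {"readMetadata": False, "mergeMetadata": False},
    "ENVELOPE": {"readMetadata": True, "mergeMetadata": True},
}


def effective_metadata_settings(parser_config):
    """Resolve the metadata flags for a parser config dict (hypothetical helper)."""
    # DEFAULT is the strategy used when none is configured.
    strategy = parser_config.get("rawMessageStrategy", "DEFAULT")
    defaults = STRATEGY_DEFAULTS[strategy]
    return {
        "rawMessageStrategy": strategy,
        # An explicitly configured value overrides the strategy default.
        "readMetadata": parser_config.get("readMetadata", defaults["readMetadata"]),
        "mergeMetadata": parser_config.get("mergeMetadata", defaults["mergeMetadata"]),
    }
```

      For example, a config that sets only rawMessageStrategy to ENVELOPE resolves to reading and merging metadata, while an empty config resolves to neither.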

      Field Naming

      -

      In order to avoid collisions from metadata fields, metadata fields will be prefixed with metron.metadata..
      -So, for instance the kafka topic would be in the field metron.metadata.topic.

      +

      In order to avoid collisions from metadata fields, metadata fields will be prefixed (the default is metron.metadata., but this is configurable in the rawMessageStrategyConfig). So, for instance the kafka topic would be in the field metron.metadata.topic.
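
      As a rough illustration of the prefixing, the sketch below splits an ENVELOPE-style kafka value into raw data and prefixed metadata. It is a simplification under stated assumptions, not the Metron implementation: the function name and the sample field names are hypothetical, and it ignores metadata from the kafka key and the original_string inheritance described earlier.

```python
import json


def split_envelope(kafka_value, message_field, metadata_prefix="metron.metadata"):
    """Split a JSON envelope into (raw data, prefixed metadata); hypothetical helper."""
    envelope = json.loads(kafka_value)
    # messageField names the envelope field that holds the data for the parser.
    raw_data = envelope[message_field]
    # Every other envelope field is treated as metadata and gets the prefix.
    metadata = {
        f"{metadata_prefix}.{key}": value
        for key, value in envelope.items()
        if key != message_field
    }
    return raw_data, metadata


# Illustrative envelope: "data" is an assumed messageField value.
raw, meta = split_envelope(
    json.dumps({"data": "raw log line", "customer_id": "cust0"}),
    message_field="data",
)
```

      Here raw holds the string passed to the parser and meta holds a single entry keyed metron.metadata.customer_id.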

    Specifying Custom Metadata

    Custom metadata is specified by sending a JSON Map in the key. If no key is sent, then, obviously, no metadata will be parsed. For instance, sending a metadata field called customer_id could be done by sending

    @@ -470,6 +521,36 @@ So, for instance the kafka topic would be in the field metron.metadata.topic ] }
    + +
      + +
    • REGEX_SELECT : This transformation lets users set an output field to one of a set of possibilities based on matching regexes. This transformation is useful when the number of conditions is large enough to make a stellar language match statement unwieldy.
    • +
    +

    The following config will set the field logical_source_type to one of the following, dependent upon the value of the pix_type field:

    +
      + +
    • cisco-6-302 if pix_type starts with either 6-302 or 06-302
    • +
    • cisco-5-304 if pix_type starts with 5-304
    • +
    + +
    +
    +
    {
    +...
    +  "fieldTransformations" : [
    +    {
    +     "transformation" : "REGEX_SELECT"
    +    ,"input" :  "pix_type"
    +    ,"output" :  "logical_source_type"
    +    ,"config" : {
    +      "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
    +     ,"cisco-5-304" : "^5-304.*"
    +                }
    +    }
    +                           ]
    +...  
    +}
    +
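
    The selection behaviour described above can be sketched outside Metron as follows. This is an approximation for illustration only: the function name is hypothetical, and returning the first matching key with a search-style match is an assumption rather than a statement about the actual transformation's matching rules.

```python
import re


def regex_select(value, config):
    """Return the first config key with a pattern matching value, else None (hypothetical helper)."""
    for output, patterns in config.items():
        # A key may map to a single pattern or to a list of patterns.
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(re.search(pattern, value) for pattern in patterns):
            return output
    return None


# Same mapping as the config above.
config = {
    "cisco-6-302": ["^6-302.*", "^06-302.*"],
    "cisco-5-304": "^5-304.*",
}
```

    With this mapping, a pix_type beginning with 6-302 or 06-302 selects cisco-6-302, one beginning with 5-304 selects cisco-5-304, and anything else selects nothing.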

    Assignment to null

    @@ -685,10 +766,12 @@ HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )"

    Notes on Performance Tuning

    A default Metron installation is not tuned for production deployment. There are a few knobs to tune to get the most out of your system.

    +

    When using aggregated parsers, it’s highly recommended to aggregate parsers with similar velocity and parser complexity together.

    Notes on Adding a New Sensor

    In order to allow for meta alerts to be queried alongside regular alerts in Elasticsearch 2.x, it is necessary to add an additional field to the templates and mapping for existing sensors.

    -

    Please see a description of the steps necessary to make this change in the metron-elasticsearch Using Metron with Elasticsearch 2.x

    +

    Please see a description of the steps necessary to make this change in the metron-elasticsearch Using Metron with Elasticsearch 2.x

    +

    If Solr is selected as the real-time store, it is also necessary to add additional fields. See the Solr section in metron-indexing for more details.

    Kafka Queue

    The kafka queue associated with your parser is a collection point for all of the data sent to your parser. As such, make sure that the number of partitions in the kafka topic is sufficient to handle the throughput that you expect from your parser topology.
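
    One rough way to reason about partition count is a back-of-the-envelope estimate like the one below. It is a sketch, not guidance from the Metron project: the function name is hypothetical, and both throughput figures are assumptions you would need to measure for your own sensors and cluster.

```python
import math


def partitions_needed(target_events_per_sec, events_per_sec_per_partition):
    """Estimate a minimum partition count for an expected throughput (hypothetical helper)."""
    if events_per_sec_per_partition <= 0:
        raise ValueError("per-partition throughput must be positive")
    # Round up, and always keep at least one partition.
    return max(1, math.ceil(target_events_per_sec / events_per_sec_per_partition))


# Illustrative numbers only: 10,000 events/s expected, 1,500 events/s sustained per partition.
estimate = partitions_needed(10_000, 1_500)
```

    The estimate is a lower bound; headroom for traffic spikes would be added on top of it.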

    http://git-wip-us.apache.org/repos/asf/metron/blob/a97e575f/site/current-book/metron-platform/metron-parsers/parser-testing.html ---------------------------------------------------------------------- diff --git a/site/current-book/metron-platform/metron-parsers/parser-testing.html b/site/current-book/metron-platform/metron-parsers/parser-testing.html index ff22013..945aebc 100644 --- a/site/current-book/metron-platform/metron-parsers/parser-testing.html +++ b/site/current-book/metron-platform/metron-parsers/parser-testing.html @@ -1,13 +1,13 @@ - + Metron – Parser Contribution and Testing @@ -32,8 +32,8 @@
  • Metron/
  • Documentation/
  • Parser Contribution and Testing
  • -
  • | Last Published: 2018-06-07
  • -
  • Version: 0.5.0
  • +
  • | Last Published: 2018-09-12
  • +
  • Version: 0.6.0