metron-commits mailing list archives

From mmiklav...@apache.org
Subject [metron] branch master updated: METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309
Date Thu, 20 Dec 2018 19:23:02 GMT
This is an automated email from the ASF dual-hosted git repository.

mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git


The following commit(s) were added to refs/heads/master by this push:
     new 9e026e3  METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309
9e026e3 is described below

commit 9e026e3e902769dae364b4c4acf64e00839d24f5
Author: mmiklavc <michael.miklavcic@gmail.com>
AuthorDate: Thu Dec 20 12:22:19 2018 -0700

    METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309
---
 metron-platform/metron-parsing/README.md           | 536 +++++++++++----------
 .../{metron-parsers-common => }/parser_arch.png    | Bin
 site-book/bin/generate-md.sh                       |   6 +-
 3 files changed, 276 insertions(+), 266 deletions(-)

diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index 76b6168..9a46532 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -21,127 +21,129 @@ limitations under the License.
 
 Parsers are pluggable components which are used to transform raw data
 (textual or raw bytes) into JSON messages suitable for downstream
-enrichment and indexing.  
+enrichment and indexing.
 
 There are two general types of parsers:
 * A parser written in Java which conforms to the `MessageParser` interface.  This kind of
parser is optimized for speed and performance and is built for use with higher velocity topologies.
 These parsers are not easily modifiable, and in order to make changes to them the entire topology
needs to be recompiled.
 * A general purpose parser.  This type of parser is primarily designed for lower-velocity
topologies or for quickly standing up a parser for a new telemetry before a permanent Java
parser can be written for it.  As of the time of this writing, we have:
-  * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries
of 
-    * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
-    * `patternLabel` : The pattern label to use from the grok statement
-    * `multiLine` : The raw data passed in should be handled as a long with multiple lines,
with each line to be parsed separately. This setting's valid values are 'true' or 'false'.
 The default if unset is 'false'. When set the parser will handle multiple lines with successfully
processed lines emitted normally, and lines with errors sent to the error topic.
-    * `timestampField` : The field to use for timestamp
-    * `timeFields` : A list of fields to be treated as time
-    * `dateFormat` : The date format to use to parse the time fields
-    * `timezone` : The timezone to use. `UTC` is default.
-    * The Grok parser supports either 1 line to parse per incoming message, or incoming messages
with multiple log lines, and will produce a json message per line
-  * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig` entries
of
-    * `timestampFormat` : The date format of the timestamp to use.  If unspecified, the parser
assumes the timestamp is ms since unix epoch.
-    * `columns` : A map of column names you wish to extract from the CSV to their offsets
(e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map for extracting the 2nd and
4th columns from a CSV)
-    * `separator` : The column separator, `,` by default.
-  * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with possible `parserConfig`
entries of
-    * `mapStrategy` : A strategy to indicate how to handle multi-dimensional Maps.  This
is one of
-      * `DROP` : Drop fields which contain maps
-      * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` would turn into `{"foo.bar"
: 1}`
-      * `ALLOW` : Allow multidimensional maps
-      * `ERROR` : Throw an error when a multidimensional map is encountered
-    * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the result of the
JSON Path query should be a list of messages. This is useful if you have a JSON document which
contains a list or array of messages embedded in it, and you do not have another means of
splitting the message.
-    * `wrapInEntityArray` : `"true" or "false"`. If `jsonQuery` is present and this flag
is present and set to `"true"`, the incoming message will be wrapped in a JSON  entity and
array.
-       for example:
-       `{"name":"value"},{"name2","value2}` will be wrapped as `{"message" : [{"name":"value"},{"name2","value2}]}`.
-       This is using the default value for `wrapEntityName` if that property is not set.
-    * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`.
 The `jsonpQuery` should reference this name.
-    * A field called `timestamp` is expected to exist and, if it does not, then current time
is inserted.  
-  * Regular Expressions Parser
-      * `recordTypeRegex` : A regular expression to uniquely identify a record type.
-      * `messageHeaderRegex` : A regular expression used to extract fields from a message
part which is common across all the messages.
-      * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will
automatically convert all the camel case property names to underscore seperated. 
-          For example, following convertions will automatically happen:
-
-          ```
-          ipSrcAddr -> ip_src_addr
-          ipDstAddr -> ip_dst_addr
-          ipSrcPort -> ip_src_port
-          ```
-          Note this property may be necessary, because java does not support underscores
in the named group names. So in case your property naming conventions requires underscores
in property names, use this property.
-          
-      * `fields` : A json list of maps contaning a record type to regular expression mapping.
-      
-      A complete configuration example would look like:
-      
-      ```json
-      "convertCamelCaseToUnderScore": true, 
-      "recordTypeRegex": "kernel|syslog",
-      "messageHeaderRegex": "(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z]
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
-      "fields": [
-        {
-          "recordType": "kernel",
-          "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
-        },
-        {
-          "recordType": "syslog",
-          "regex": ".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
       (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
-        }
-      ]
-      ```
-      **Note**: messageHeaderRegex and regex (withing fields) could be specified as lists
also e.g.
-      ```json
-          "messageHeaderRegex": [
+    * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries
of
+        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+        * `patternLabel` : The pattern label to use from the grok statement
+        * `multiLine` : Whether the raw data passed in should be handled as a single input containing multiple
lines, with each line parsed separately. This setting's valid values are 'true' or 'false';
 the default if unset is 'false'. When set, the parser will handle multiple lines, emitting successfully
processed lines normally and sending lines with errors to the error topic.
+        * `timestampField` : The field to use for timestamp
+        * `timeFields` : A list of fields to be treated as time
+        * `dateFormat` : The date format to use to parse the time fields
+        * `timezone` : The timezone to use. `UTC` is default.
+        * The Grok parser supports either one line to parse per incoming message, or incoming
messages with multiple log lines, and will produce a JSON message per line
+    * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig`
entries of
+        * `timestampFormat` : The date format of the timestamp to use.  If unspecified, the
parser assumes the timestamp is ms since unix epoch.
+        * `columns` : A map of column names you wish to extract from the CSV to their offsets
(e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map for extracting the 2nd and
4th columns from a CSV)
+        * `separator` : The column separator, `,` by default.
+    * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with possible `parserConfig`
entries of
+        * `mapStrategy` : A strategy to indicate how to handle multi-dimensional Maps.  This
is one of
+            * `DROP` : Drop fields which contain maps
+            * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` would turn into
`{"foo.bar" : 1}`
+            * `ALLOW` : Allow multidimensional maps
+            * `ERROR` : Throw an error when a multidimensional map is encountered
+        * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the result of
the JSON Path query should be a list of messages. This is useful if you have a JSON document
which contains a list or array of messages embedded in it, and you do not have another means
of splitting the message.
+        * `wrapInEntityArray` : `"true" or "false"`. If `jsonpQuery` is present and this flag
is present and set to `"true"`, the incoming message will be wrapped in a JSON entity and
array.
+           For example:
+           `{"name":"value"},{"name2":"value2"}` will be wrapped as `{"message" : [{"name":"value"},{"name2":"value2"}]}`.
+           This uses the default value for `wrapEntityName` if that property is not set.
+        * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`.
 The `jsonpQuery` should reference this name.
+        * A field called `timestamp` is expected to exist and, if it does not, then the current
time is inserted.
+    * Regular Expressions Parser
+        * `recordTypeRegex` : A regular expression to uniquely identify a record type.
+        * `messageHeaderRegex` : A regular expression used to extract fields from a message
part which is common across all the messages.
+        * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will
automatically convert all the camel case property names to underscore separated. For example,
the following conversions will happen automatically:
+
+            ```
+            ipSrcAddr -> ip_src_addr
+            ipDstAddr -> ip_dst_addr
+            ipSrcPort -> ip_src_port
+            ```
+
+            Note that this property may be necessary because Java does not support underscores
in named group names. If your property naming convention requires underscores in property
names, use this property.
+
+        * `fields` : A JSON list of maps containing a record type to regular expression mapping.
+
+        A complete configuration example would look like:
+
+        ```json
+        "convertCamelCaseToUnderScore": true,
+        "recordTypeRegex": "kernel|syslog",
+        "messageHeaderRegex": "(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z]
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
+        "fields": [
+          {
+            "recordType": "kernel",
+            "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
+          },
+          {
+            "recordType": "syslog",
+            "regex": ".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
       (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
+          }
+        ]
+        ```
+
+        **Note**: messageHeaderRegex and regex (within fields) can also be specified as lists,
e.g.
+
+        ```json
+        "messageHeaderRegex": [
           "regular expression 1",
           "regular expression 2"
-          ]
-      ```
-      Where **regular expression 1** are valid regular expressions and may have named
-      groups, which would be extracted into fields. This list will be evaluated in order
until a
-      matching regular expression is found.
-      
-      **messageHeaderRegex** is run on all the messages.
-      Yes, all the messages are expected to contain the fields which are being extracted
using the **messageHeaderRegex**.
-      **messageHeaderRegex** is a sort of HCF (highest common factor) in all messages.
-      
-      **recordTypeRegex** can be a more advanced regular expression containing named goups.
For example
-  
-      "recordTypeRegex": "(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
-      
-      Here all the named goups (process in above example) will be extracted as fields.
-
-      Though having named group in recordType is completely optional, still one could want
extract named groups in recordType for following reasons:
-
-      1. Since **recordType** regular expression is already getting matched and we are paying
the price for a regular expression match already,
-      we can extract certain fields as a by product of this match.
-      2. Most likely the **recordType** field is common across all the messages. Hence having
it extracted in the recordType (or messageHeaderRegex) would
-      reduce the overall complexity of regular expressions in the regex field.
-      
-      **regex** within a field could be a list of regular expressions also. In this case
all regular expressions in the list will be attempted to match until a match is found. Once
a full match is found remaining regular expressions are ignored.
-  
-      ```json
-          "regex":  [ "record type specific regular expression 1",
-                      "record type specific regular expression 2"]
-
-      ```
-
-      **timesamp**
-
-      Since this parser is a general purpose parser, it will populate the timestamp field
with current UTC timestamp. Actual timestamp value can be overridden later using stellar.
-      For example in case of syslog timestamps, one could use following stellar construct
to override the timestamp value.
-      Let us say you parsed actual timestamp from the raw log:
-
-      <38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod from 55.55.55.55
port 66666 ssh2
-
-      syslogTimestamp="Jun 20 15:01:17"
-
-      Then something like below could be used to override the timestamp.
-
-      ```
-      "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
-      ```
-
-      OR, if you want to factor in the timezone
-
-      ```
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, timezone_name )"
-      ```
+        ]
+        ```
+
+        Where **regular expression 1** and **regular expression 2** are valid regular expressions
+        that may have named groups, which will be extracted into fields. This list is evaluated in order
until a matching regular expression is found.
+
+        **messageHeaderRegex** is run on all the messages, and all messages are expected to
contain the fields that it extracts.
+        **messageHeaderRegex** is, in effect, the highest common factor (HCF) across all messages.
+
+        **recordTypeRegex** can be a more advanced regular expression containing named groups.
For example:
+
+        "recordTypeRegex": "(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
+
+        Here, all the named groups (`process` in the above example) will be extracted as fields.
+
+        Although having named groups in recordType is completely optional, you may still want
to extract named groups in recordType for the following reasons:
+
+        1. Since the **recordType** regular expression is already being matched, and we are
already paying the price for that regular expression match,
+        we can extract certain fields as a by-product of it.
+        2. The **recordType** field is most likely common across all the messages. Hence,
extracting it in the recordType (or messageHeaderRegex)
+        reduces the overall complexity of the regular expressions in the regex field.
+
+        **regex** within a field may also be a list of regular expressions. In that case,
each regular expression in the list is attempted until a match is found. Once
a full match is found, the remaining regular expressions are ignored.
+
+        ```json
+        "regex":  [ "record type specific regular expression 1",
+                    "record type specific regular expression 2"]
+        ```
+
+        **timestamp**
+
+        Since this parser is a general purpose parser, it will populate the timestamp field
with the current UTC timestamp. The actual timestamp value can be overridden later using Stellar.
+        For example, in the case of syslog timestamps, one could use the following Stellar construct
to override the timestamp value.
+        Let us say you parsed the actual timestamp from the raw log:
+
+        `<38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod from
55.55.55.55 port 66666 ssh2`
+
+        syslogTimestamp="Jun 20 15:01:17"
+
+        Then something like the following could be used to override the timestamp.
+
+        ```
+        "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
+        ```
+
+        Or, if you want to factor in the timezone:
+
+        ```
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, timezone_name )"
+        ```
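
For illustration only (not part of the committed diff), a minimal sensor configuration for the Grok parser described above might look like the following sketch; the topic, HDFS path, and pattern label are placeholders modeled on the familiar squid example:

```json
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp",
    "timezone": "UTC"
  }
}
```

Similarly, a hedged sketch of a JSON Map Parser configuration combining `jsonpQuery` with `wrapInEntityArray` (the topic and query are hypothetical); with this config an incoming `{"a":1},{"a":2}` would first be wrapped as `{"message" : [{"a":1},{"a":2}]}` and the query would then emit one JSON message per array element:

```json
{
  "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic": "example_json",
  "parserConfig": {
    "mapStrategy": "UNFOLD",
    "wrapInEntityArray": "true",
    "wrapEntityName": "message",
    "jsonpQuery": "$.message"
  }
}
```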
 
 ## Parser Error Routing
 
@@ -204,15 +206,15 @@ So putting it all together a typical Metron message with all 5-tuple
fields pres
 
 ```json
 {
-"message": 
-{"ip_src_addr": xxxx, 
-"ip_dst_addr": xxxx, 
-"ip_src_port": xxxx, 
-"ip_dst_port": xxxx, 
-"protocol": xxxx, 
-"original_string": xxx,
-"additional-field 1": xxx,
-}
+  "message": {
+    "ip_src_addr": xxxx,
+    "ip_dst_addr": xxxx,
+    "ip_src_port": xxxx,
+    "ip_dst_port": xxxx,
+    "protocol": xxxx,
+    "original_string": xxx,
+    "additional-field 1": xxx
+  }
 }
 ```
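
As a purely illustrative instance of the message above (all values are made up):

```json
{
  "message": {
    "ip_src_addr": "10.0.2.15",
    "ip_dst_addr": "192.168.1.1",
    "ip_src_port": 49158,
    "ip_dst_port": 443,
    "protocol": "TCP",
    "original_string": "<raw log line>",
    "additional-field 1": "example value"
  }
}
```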
 
@@ -246,16 +248,19 @@ The document is structured in the following way
 
 * `parserClassName` : The fully qualified classname for the parser to be used.
 * `filterClassName` : The filter to use.  This may be a fully qualified classname of a Class
that implements the `org.apache.metron.parsers.interfaces.MessageFilter<JSONObject>`
interface.  Message Filters are intended to allow the user to ignore a set of messages via
custom logic.  The existing implementations are:
-  * `STELLAR` : Allows you to apply a stellar statement which returns a boolean, which will
pass every message for which the statement returns `true`.  The Stellar statement that is
to be applied is specified by the `filter.query` property in the `parserConfig`.
-Example Stellar Filter which includes messages which contain a the `field1` field:
-```
-   {
-    "filterClassName" : "STELLAR"
-   ,"parserConfig" : {
-    "filter.query" : "exists(field1)"
-    }
-   }
-```
+    * `STELLAR` : Allows you to apply a stellar statement which returns a boolean, which
will pass every message for which the statement returns `true`.  The Stellar statement that
is to be applied is specified by the `filter.query` property in the `parserConfig`.
+
+        Example Stellar Filter which includes messages that contain the `field1` field:
+
+        ```
+        {
+          "filterClassName" : "STELLAR",
+          "parserConfig" : {
+            "filter.query" : "exists(field1)"
+          }
+        }
+        ```
+
 * `sensorTopic` : The kafka topic to send the parsed messages to.  If the topic is prefixed
and suffixed by `/` 
 then it is assumed to be a regex and will match any topic matching the pattern (e.g. `/bro.*/`
would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
 * `readMetadata` : Boolean indicating whether to read metadata or not (The default is raw
message strategy dependent).  See below for a discussion about metadata.
@@ -263,26 +268,27 @@ then it is assumed to be a regex and will match any topic matching the
pattern (
 * `rawMessageStrategy` : The strategy to use when reading the raw data and metadata.  See
below for a discussion about message reading strategies.
 * `rawMessageStrategyConfig` : The raw message strategy configuration map.  See below for
a discussion about message reading strategies.
 * `parserConfig` : A JSON Map representing the parser implementation specific configuration.
Also include batch sizing and timeout for writer configuration here.
-  * `batchSize` : Integer indicating number of records to batch together before sending to
the writer. (default to `15`)
-  * `batchTimeout` : The timeout after which a batch will be flushed even if batchSize has
not been met.  Optional.
-    If unspecified, or set to `0`, it defaults to a system-determined duration which is a
fraction of the Storm
-    parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, since this disables
batching.
-  * The kafka writer can be configured within the parser config as well.  (This is all configured
a priori, but this is convenient for overriding the settings).  See [here](../../metron-writer/README.md#kafka-writer)
+    * `batchSize` : Integer indicating the number of records to batch together before sending
to the writer. (defaults to `15`)
+    * `batchTimeout` : The timeout after which a batch will be flushed even if batchSize
has not been met.  Optional.
+      If unspecified, or set to `0`, it defaults to a system-determined duration which is
a fraction of the Storm
+      parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, since this
disables batching.
+    * The kafka writer can be configured within the parser config as well.  (This is all
configured a priori, but this is convenient for overriding the settings).  See [here](../../metron-writer/README.md#kafka-writer)
 * `fieldTransformations` : An array of complex objects representing the transformations to
be done on the message generated from the parser before writing out to the kafka topic.
 * `securityProtocol` : The security protocol to use for reading from kafka (this is a string).
 This can be overridden on the command line and also specified in the spout config via the
`security.protocol` key.  If both are specified, then they are merged and the CLI will take
precedence. If multiple sensors are used, any non "PLAINTEXT" value will be used.
 * `cacheConfig` : Cache config for stellar field transformations.   This configures a least
frequently used cache.  This is a map with the following keys.  If not explicitly configured
(the default), then no cache will be used.
-  * `stellar.cache.maxSize` - The maximum number of elements in the cache. Default is to
not use a cache.
-  * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is kept in the
cache (in minutes). Default is to not use a cache.
+    * `stellar.cache.maxSize` - The maximum number of elements in the cache. Default is to
not use a cache.
+    * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is kept in the
cache (in minutes). Default is to not use a cache.
 
-  Example of a cache config to contain at max `20000` stellar expressions for at most `20`
minutes.:
-```
-{
-  "cacheConfig" : {
-    "stellar.cache.maxSize" : 20000,
-    "stellar.cache.maxTimeRetain" : 20
-  }
-}
-```
+        Example of a cache config to hold at most `20000` Stellar expressions for at most
`20` minutes:
+
+        ```
+        {
+          "cacheConfig" : {
+            "stellar.cache.maxSize" : 20000,
+            "stellar.cache.maxTimeRetain" : 20
+          }
+        }
+        ```
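
Tying the keys above together, a hedged end-to-end sketch (the sensor name, topic, and column map are hypothetical) that combines the parser, batching, and cache settings described in this list:

```json
{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "sensorTopic": "example_sensor",
  "readMetadata": true,
  "mergeMetadata": false,
  "parserConfig": {
    "columns": { "name": 0, "profession": 1 },
    "separator": ",",
    "batchSize": 100,
    "batchTimeout": 2
  },
  "cacheConfig": {
    "stellar.cache.maxSize": 20000,
    "stellar.cache.maxTimeRetain": 20
  }
}
```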
 
 The `fieldTransformations` is a complex object which defines a
 transformation which can be done to a message.  This transformation can 
@@ -298,36 +304,34 @@ For platform specific configs, see the README of the appropriate project.
This w
 Metadata is a useful thing to send to Metron and use during enrichment or threat intelligence.
 
 Consider the following scenarios:
 * You have multiple telemetry sources of the same type that you want to 
-  * ensure downstream analysts can differentiate
-  * ensure profiles consider independently as they have different seasonality or some other
fundamental characteristic
+    * ensure downstream analysts can differentiate
+    * ensure profiles consider them independently, as they have different seasonality or some other
fundamental characteristic
 
 As such, there are two types of metadata that we seek to support in Metron:
 * Environmental metadata : Metadata about the system at large
-   * Consider the possibility that you have multiple kafka topics being processed by one
parser and you want to tag the messages with the kafka topic
-   * At the moment, only the kafka topic is kept as the field name.
+    * Consider the possibility that you have multiple kafka topics being processed by one
parser and you want to tag the messages with the kafka topic
+    * At the moment, only the kafka topic is kept as the field name.
 * Custom metadata: Custom metadata from an individual telemetry source that one might want
to use within Metron. 
 
 Metadata is controlled by the following parser configs:
-* `rawMessageStrategy` : This is a strategy which indicates how to read
-  data and metadata.  The strategies supported are:
-  * `DEFAULT` : Data is read directly from the kafka record value and metadata, if any, is
read from the kafka record key.  This strategy defaults to not reading metadata and not merging
metadata.  This is the default strategy.
-  * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. One of
-    these fields must contain the raw data to pass to the parser.  All other fields should
be considered metadata.  The field containing the raw data is specified in the `rawMessageStrategyConfig`.
 Data held in the kafka key as well as the non-data fields in the JSON blob passed into the
kafka value are considered metadata. Note that the exception to this is that any `original_string`
field is inherited from the envelope data so that the original string contains the envelope
data.  If y [...]
+* `rawMessageStrategy` : This is a strategy which indicates how to read data and metadata.
 The strategies supported are:
+    * `DEFAULT` : Data is read directly from the kafka record value and metadata, if any,
is read from the kafka record key.  This strategy defaults to not reading metadata and not
merging metadata.  This is the default strategy.
+    * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. One of
+      these fields must contain the raw data to pass to the parser.  All other fields should
be considered metadata.  The field containing the raw data is specified in the `rawMessageStrategyConfig`.
 Data held in the kafka key as well as the non-data fields in the JSON blob passed into the
kafka value are considered metadata. Note that the exception to this is that any `original_string`
field is inherited from the envelope data so that the original string contains the envelope
data.  If [...]
 * `rawMessageStrategyConfig` : The configuration (a map) for the `rawMessageStrategy`.  Available
configurations are strategy dependent:
-  * `DEFAULT` 
-    * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`).
-  * `ENVELOPE` 
-    * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`)

-    * `messageField` defines the field from the envelope to use as the data.  All other fields
are considered metadata.
+    * `DEFAULT`
+        * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`).
+    * `ENVELOPE`
+        * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`)
+        * `messageField` defines the field from the envelope to use as the data.  All other
fields are considered metadata.
 * `readMetadata` : This is a boolean indicating whether metadata will be read and made available
to Field 
 transformations (i.e. Stellar field transformations).  The default is
 dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
+    * `DEFAULT` : default to `false`.
+    * `ENVELOPE` : default to `true`.
 * `mergeMetadata` : This is a boolean indicating whether metadata fields will be merged with
the message automatically.  That is to say, if this property is set to `true` then every metadata
field will become part of the messages and, consequently, also available for use in field
transformations.  The default is dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
-
+    * `DEFAULT` : default to `false`.
+    * `ENVELOPE` : default to `true`.
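
As an illustrative sketch (the `data` field name is hypothetical), an `ENVELOPE` configuration along these lines would read the raw message from the `data` field of the incoming JSON blob and expose the remaining fields as metadata under the `metron.metadata` prefix:

```json
{
  "rawMessageStrategy": "ENVELOPE",
  "rawMessageStrategyConfig": {
    "messageField": "data",
    "metadataPrefix": "metron.metadata"
  },
  "readMetadata": true,
  "mergeMetadata": true
}
```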
 
 #### Field Naming
 
@@ -359,119 +363,125 @@ The format of a `fieldTransformation` is as follows:
 The currently implemented fieldTransformations are:
 * `REMOVE` : This transformation removes the specified input fields.  If you want a conditional
removal, you can pass a Metron Query Language statement to define the conditions under which
you want to remove the fields. 
 
-Consider the following simple configuration which will remove `field1`
-unconditionally:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          }
-                      ]
-}
-```
+    Consider the following simple configuration which will remove `field1`
+    unconditionally:
 
-Consider the following simple sensor parser configuration which will remove `field1`
-whenever `field2` exists and whose corresponding equal to 'foo':
-```
-{
-...
-  "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          , "config" : {
-              "condition" : "exists(field2) and field2 == 'foo'"
-                       }
-          }
-                      ]
-}
-```
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              }
+                          ]
+    }
+    ```
+
+    Consider the following simple sensor parser configuration which will remove `field1`
+    whenever `field2` exists and its corresponding value is equal to 'foo':
+
+    ```
+    {
+    ...
+      "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              , "config" : {
+                  "condition" : "exists(field2) and field2 == 'foo'"
+                           }
+              }
+                          ]
+    }
+    ```
 
 * `SELECT`: This transformation filters the fields in the message to include only the configured
output fields, and drops any not explicitly included. 
 
-For example: 
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "output" : ["field1", "field2" ] 
-          , "transformation" : "SELECT"
-          }
-                      ]
-}
-```
+    For example:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "output" : ["field1", "field2" ]
+              , "transformation" : "SELECT"
+              }
+                          ]
+    }
+    ```
 
-when applied to a message containing keys field1, field2 and field3, will only output the
first two. It is also worth noting that two standard fields - timestamp and original_source
- will always be passed along whether they are listed in output or not, since they are considered
core required fields.
+    When applied to a message containing keys field1, field2 and field3, the configuration above
will only output the first two. It is also worth noting that two standard fields - timestamp and original_source
- will always be passed along whether they are listed in output or not, since they are considered
core required fields.
 
 * `IP_PROTOCOL` : This transformation maps IANA protocol numbers to consistent string representations.
 
-Consider the following sensor parser config to map the `protocol` field
-to a textual representation of the protocol:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "protocol"
-          , "transformation" : "IP_PROTOCOL"
-          }
-                      ]
-}
-```
+    Consider the following sensor parser config to map the `protocol` field
+    to a textual representation of the protocol:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "protocol"
+              , "transformation" : "IP_PROTOCOL"
+              }
+                          ]
+    }
+    ```
 
-This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }` 
-into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
+    This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }`
+    into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
 
-* `STELLAR` : This transformation executes a set of transformations
-  expressed as [Stellar Language](../../metron-common) statements.
+* `STELLAR` : This transformation executes a set of transformations expressed as [Stellar
Language](../../metron-common) statements.
 
 * `RENAME` : This transformation allows users to rename a set of fields.  Specifically,
 the config is presumed to be the mapping.  The keys to the config are the existing field
names
 and the values of the config map are the associated new field names.
 
-The following config will rename the fields `old_field` and `different_old_field` to
-`new_field` and `different_new_field` respectively:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "transformation" : "RENAME",
-          , "config" : {
-            "old_field" : "new_field",
-            "different_old_field" : "different_new_field"
-                       }
-          }
-                      ]
-}
-```
+    The following config will rename the fields `old_field` and `different_old_field` to
+    `new_field` and `different_new_field` respectively:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "transformation" : "RENAME",
+              , "config" : {
+                "old_field" : "new_field",
+                "different_old_field" : "different_new_field"
+                           }
+              }
+                          ]
+    }
+    ```
+
 * `REGEX_SELECT` : This transformation lets users set an output field to one of a set of
possibilities based on matching regexes. This transformation is useful when the number of
conditions is large enough to make a stellar language match statement unwieldy.
  
-The following config will set the field `logical_source_type` to one of the
-following, dependent upon the value of the `pix_type` field:
-* `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
-* `cisco-5-304` if `pix_type` starts with `5-304`
-```
-{
-...
-  "fieldTransformations" : [
+    The following config will set the field `logical_source_type` to one of the
+    following, dependent upon the value of the `pix_type` field:
+    * `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
+    * `cisco-5-304` if `pix_type` starts with `5-304`
+
+    ```
     {
-     "transformation" : "REGEX_ROUTING"
-    ,"input" :  "pix_type"
-    ,"output" :  "logical_source_type"
-    ,"config" : {
-      "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
-      "cisco-5-304" : "^5-304.*"
-                }
+    ...
+      "fieldTransformations" : [
+        {
+         "transformation" : "REGEX_ROUTING"
+        ,"input" :  "pix_type"
+        ,"output" :  "logical_source_type"
+        ,"config" : {
+          "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
+          "cisco-5-304" : "^5-304.*"
+                    }
+        }
+                               ]
+    ...
     }
-                           ]
-...  
-}
-```
+    ```
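
As a further illustration (not part of the committed diff), a `STELLAR` field transformation in its commonly documented form might look like the sketch below; the `hostname` input field is hypothetical and the Stellar functions used are assumed to be available:

```json
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "hostname_normalized" ],
      "config": {
        "hostname_normalized": "TO_LOWER(TRIM(hostname))"
      }
    }
  ]
}
```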
 
 
 ### Assignment to `null`
diff --git a/metron-platform/metron-parsing/metron-parsers-common/parser_arch.png b/metron-platform/metron-parsing/parser_arch.png
similarity index 100%
rename from metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
rename to metron-platform/metron-parsing/parser_arch.png
diff --git a/site-book/bin/generate-md.sh b/site-book/bin/generate-md.sh
index 60549f8..7ebb5f6 100755
--- a/site-book/bin/generate-md.sh
+++ b/site-book/bin/generate-md.sh
@@ -64,7 +64,7 @@ RESOURCE_LIST=(
     metron-deployment/readme-images/enable-kerberos-started.png
     metron-deployment/readme-images/enable-kerberos.png
     metron-platform/metron-job/metron-job_state_statechart_diagram.svg
-    metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
+    metron-platform/metron-parsing/parser_arch.png
     metron-platform/metron-indexing/indexing_arch.png
     metron-platform/metron-enrichment/enrichment_arch.png
     metron-analytics/metron-maas-service/maas_arch.png
@@ -96,8 +96,8 @@ HREF_REWRITE_LIST=(
     metron-platform/metron-enrichment/README.md 's#(enrichment_arch.png)#(../../images/enrichment_arch.png)#g'
     metron-platform/metron-indexing/README.md 's#(indexing_arch.png)#(../../images/indexing_arch.png)#g'
     metron-platform/metron-job/README.md 's#(metron-job_state_statechart_diagram.svg)#(../../images/metron-job_state_statechart_diagram.svg)#g'
-    metron-platform/metron-parsing/metron-parsers-common/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
-    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../images/message_routing_high_level.svg)#g'
+    metron-platform/metron-parsing/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
+    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../../images/message_routing_high_level.svg)#g'
     metron-analytics/metron-maas-service/README.md 's#(maas_arch.png)#(../../images/maas_arch.png)#g'
     metron-contrib/metron-performance/README.md 's#(performance_measurement.png)#(../../images/performance_measurement.png)#g'
     use-cases/forensic_clustering/README.md 's#(find_alerts.png)#(../../images/find_alerts.png)#g'

