Subject: [09/15] incubator-metron git commit: METRON-766: Release 0.3.1 closes apache/incubator-metron#477
From: cestella@apache.org
Date: Fri, 17 Mar 2017 14:51:30 -0000

http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-analytics/metron-statistics/index.html
Statistics and Mathematical Functions
A variety of non-trivial and advanced analytics make use of statistics and advanced mathematical functions. In particular, capturing statistical snapshots in a scalable way can open the door to more advanced analytics, such as outlier analysis. As such, this project aims to provide a robust set of statistical functions and statistics-based algorithms in the form of Stellar functions. These functions can be used anywhere Stellar is used.


Stellar Functions

Approximation Statistics

HLLP_ADD
  • Description: Add value to the HyperLogLogPlus estimator set. See HLLP README
  • Input:
    • hyperLogLogPlus - the hllp estimator to add a value to
    • value+ - value to add to the set. Takes a single item or a list.
  • Returns: The HyperLogLogPlus set with a new value added

HLLP_CARDINALITY
  • Description: Returns the HyperLogLogPlus-estimated cardinality for this set. See HLLP README
  • Input:
    • hyperLogLogPlus - the hllp set
  • Returns: Long value representing the cardinality for this set

HLLP_INIT
  • Description: Initializes the HyperLogLogPlus estimator set. p must be a value between 4 and sp, and sp must be greater than 4 and less than 32. See HLLP README
  • Input:
    • p - the precision value for the normal set
    • sp - the precision value for the sparse set. If p is set, but sp is 0 or not specified, the sparse set will be disabled.
  • Returns: A new HyperLogLogPlus set

HLLP_MERGE
  • Description: Merge hllp sets together. The resulting estimator is initialized with the p and sp precision values from the first provided hllp estimator set. See HLLP README
  • Input:
    • hllp - List of hllp estimators to merge. Takes a single hllp set or a list.
  • Returns: A new merged HyperLogLogPlus estimator set
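To make the estimator workflow concrete (init, add, merge, query cardinality), here is a minimal HyperLogLog sketch in Python. This is an illustrative toy, not Metron's implementation (Metron delegates to a HyperLogLogPlus library); the class name TinyHLL and the choice of SHA-1 as the hash are assumptions for the example.

```python
import hashlib
import math

class TinyHLL:
    """Toy HyperLogLog: the first p hash bits select a register, and each
    register keeps the maximum leading-zero rank seen for its substream."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)
        return self

    def merge(self, other):
        # Register-wise max merges two estimators built with the same p
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # small-range correction via linear counting
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)
```

Mirroring the Stellar functions above: HLLP_INIT roughly corresponds to TinyHLL(p), HLLP_ADD to add, HLLP_MERGE to merge, and HLLP_CARDINALITY to cardinality.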

Mathematical Functions

ABS
  • Description: Returns the absolute value of a number.
  • Input:
    • number - The number to take the absolute value of
  • Returns: The absolute value of the number passed in.

BIN
  • Description: Computes the bin that the value is in given a set of bounds.
  • Input:
    • value - The value to bin
    • bounds - A list of value bounds (excluding min and max) in sorted order.
  • Returns: Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0’th bound go in the 0’th bin, and values greater than the last bound go in the M’th bin.
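The BIN contract above, bound(N-1) < value <= bound(N), is exactly a left bisection over the sorted bounds. A short sketch (the helper name bin_of is an assumption for illustration, not a Metron API):

```python
import bisect

def bin_of(value, bounds):
    # Bin N such that bounds[N-1] < value <= bounds[N]; values at or below
    # the first bound land in bin 0, values above the last bound in bin M.
    return bisect.bisect_left(bounds, value)
```

For example, with bounds [10, 20, 30], the value 20 falls in bin 1 and the value 35 falls in bin 3.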

Distributional Statistics

STATS_ADD
  • Description: Adds one or more input values to those that are used to calculate the summary statistics.
  • Input:
    • stats - The Stellar statistics object. If null, then a new one is initialized.
    • value+ - One or more numbers to add
  • Returns: A Stellar statistics object

STATS_BIN
  • Description: Computes the bin that the value is in based on the statistical distribution.
  • Input:
    • stats - The Stellar statistics object
    • value - The value to bin
    • bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins. For convenience, QUARTILE, QUINTILE, and DECILE may be passed in as a string argument. If this argument is omitted, a quartile bin split is assumed.
  • Returns: Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0’th bound go in the 0’th bin, and values greater than the last bound go in the M’th bin.

STATS_COUNT
  • Description: Calculates the count of the values accumulated (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The count of the values in the window or NaN if the statistics object is null.

STATS_GEOMETRIC_MEAN
  • Description: Calculates the geometric mean of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The geometric mean of the values in the window or NaN if the statistics object is null.

STATS_INIT
  • Description: Initializes a statistics object
  • Input:
    • window_size - The number of input data values to maintain in a rolling window in memory. If window_size is equal to 0, then no rolling window is maintained. Using no rolling window is less memory intensive, but cannot calculate certain statistics like percentiles and kurtosis.
  • Returns: A Stellar statistics object

STATS_KURTOSIS
  • Description: Calculates the kurtosis of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The kurtosis of the values in the window or NaN if the statistics object is null.

STATS_MAX
  • Description: Calculates the maximum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The maximum of the accumulated values in the window or NaN if the statistics object is null.

STATS_MEAN
  • Description: Calculates the mean of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The mean of the values in the window or NaN if the statistics object is null.

STATS_MERGE
  • Description: Merges statistics objects.
  • Input:
    • statistics - A list of statistics objects
  • Returns: A Stellar statistics object

STATS_MIN
  • Description: Calculates the minimum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The minimum of the accumulated values in the window or NaN if the statistics object is null.

STATS_PERCENTILE
  • Description: Computes the p’th percentile of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
    • p - a double where 0 <= p < 1 representing the percentile
  • Returns: The p’th percentile of the data or NaN if the statistics object is null

STATS_POPULATION_VARIANCE
  • Description: Calculates the population variance of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The population variance of the values in the window or NaN if the statistics object is null.

STATS_QUADRATIC_MEAN
  • Description: Calculates the quadratic mean (root mean square) of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The quadratic mean of the values in the window or NaN if the statistics object is null.

STATS_SD
  • Description: Calculates the standard deviation of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The standard deviation of the values in the window or NaN if the statistics object is null.

STATS_SKEWNESS
  • Description: Calculates the skewness of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The skewness of the values in the window or NaN if the statistics object is null.

STATS_SUM
  • Description: Calculates the sum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the values in the window or NaN if the statistics object is null.

STATS_SUM_LOGS
  • Description: Calculates the sum of the natural logs of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the logs of the values in the window or NaN if the statistics object is null.

STATS_SUM_SQUARES
  • Description: Calculates the sum of the squares of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the squares of the values in the window or NaN if the statistics object is null.

STATS_VARIANCE
  • Description: Calculates the variance of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The variance of the values in the window or NaN if the statistics object is null.
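The STATS_* lifecycle is INIT (optionally with a rolling window), then ADD, then summary queries. A rough Python sketch of those semantics follows; RollingStats is a made-up illustration, not Metron's Commons Math-backed provider:

```python
import statistics
from collections import deque

class RollingStats:
    def __init__(self, window_size=0):
        # window_size == 0 means "no rolling window": keep everything
        self.values = deque(maxlen=window_size or None)

    def add(self, *vals):               # STATS_ADD
        self.values.extend(vals)
        return self

    def mean(self):                     # STATS_MEAN
        return statistics.fmean(self.values) if self.values else float("nan")

    def sd(self):                       # STATS_SD (sample standard deviation)
        return statistics.stdev(self.values) if len(self.values) > 1 else float("nan")

    def percentile(self, p):            # STATS_PERCENTILE, 0 <= p < 1
        vs = sorted(self.values)
        return vs[min(int(p * len(vs)), len(vs) - 1)] if vs else float("nan")
```

With a window of 3, adding 1, 2, 3, 4 evicts the 1, so the mean is computed over [2, 3, 4], matching the windowed behavior described for STATS_INIT.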

Statistical Outlier Detection

OUTLIER_MAD_STATE_MERGE
  • Description: Update the statistical state required to compute the Median Absolute Deviation.
  • Input:
    • [state] - A list of Median Absolute Deviation states to merge. Generally these are states across time.
    • currentState? - The current state (optional)
  • Returns: The Median Absolute Deviation state

OUTLIER_MAD_ADD
  • Description: Add a piece of data to the state.
  • Input:
    • state - The MAD state
    • value - The numeric value to add
  • Returns: The MAD state

OUTLIER_MAD_SCORE
  • Description: Get the modified z-score normalized by the Median Absolute Deviation.
  • Input:
    • state - The MAD state
    • value - The numeric value to score
  • Returns: The modified z-score

Outlier Analysis

A common desire is to find anomalies in numerical data. To that end, we provide some simple statistical anomaly detectors.

Median Absolute Deviation

Much has been written about this robust estimator. See the first page of http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/BetterThanMAD.pdf for good coverage of the strengths and weaknesses of MAD. The usage, however, is fairly straightforward:

  • Gather the statistical state required to compute the MAD:
    • The distribution of the values of a univariate random variable over time.
    • The distribution of the absolute deviations of the values from the median.
  • Use this statistical state to score unseen values. The higher the score, the more unlike the previously seen data the value is.

There are a couple of issues that make MAD a bit hard to compute. First, the statistical state requires the median, which can be expensive to compute exactly. To get around this, we use the OnlineStatisticalProvider to compute a sketch rather than the exact median. Second, the statistical state for seasonal data should be limited to a fixed, trailing window. We do this by ensuring that the MAD state is mergeable and able to be queried from within the Profiler.
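The two ingredients above combine into a modified z-score. Here is a compact sketch of the exact (non-sketch) computation; the helper mad_score and the 0.6745 consistency constant follow the standard MAD formulation rather than Metron's OnlineStatisticalProvider:

```python
import statistics

def mad_score(history, value, scale=0.6745):
    """Modified z-score: scale * |value - median| / MAD. With scale = 0.6745,
    the MAD is a consistent estimator of the standard deviation under
    normality, so the score reads as 'standard deviations away'."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        # degenerate history: everything equals the median
        return 0.0 if value == med else float("inf")
    return scale * abs(value - med) / mad
```

For history [1..9] (median 5, MAD 2), scoring the unseen value 15 yields 0.6745 * 10 / 2, comfortably above the 3.5 cutoff used later in this example.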

Example

We will create a dummy data stream of Gaussian noise to illustrate how to use the MAD functionality along with the Profiler to tag messages as outliers or not.

To do this, we will create:

  • a data generator
  • a parser
  • a profiler profile
  • an enrichment and threat triage configuration

Data Generator

+

We can create a simple Python script that generates a stream of Gaussian noise at a frequency of one message per second. Save the following at ~/rand_gen.py:


  #!/usr/bin/python
  import random
  import sys
  import time

  def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
      print str(random.gauss(mu, sigma))
      sys.stdout.flush()
      time.sleep(freq_s)

  if __name__ == '__main__':
    main()

This script will take the following as arguments:

  • The mean of the data generated
  • The standard deviation of the data generated
  • The frequency (in seconds) of the data generated

If, however, you’d like to test a longer-tailed distribution, like the Student’s t-distribution, and have numpy installed, you can use the following as ~/rand_gen.py:


  #!/usr/bin/python
  import sys
  import time
  import numpy as np

  def main():
    df = float(sys.argv[1])
    freq_s = int(sys.argv[2])
    while True:
      print str(np.random.standard_t(df))
      sys.stdout.flush()
      time.sleep(freq_s)

  if __name__ == '__main__':
    main()

This script will take the following as arguments:

  • The degrees of freedom for the distribution
  • The frequency (in seconds) of the data generated

The Parser

+

We will create a parser that takes each number in and, using the CSVParser, creates a message with a field called value.

+

Add the following file to $METRON_HOME/config/zookeeper/parsers/mad.json:


  {
    "parserClassName" : "org.apache.metron.parsers.csv.CSVParser",
    "sensorTopic" : "mad",
    "parserConfig" : {
      "columns" : {
        "value_str" : 0
      }
    },
    "fieldTransformations" : [
      {
        "transformation" : "STELLAR",
        "output" : [ "value" ],
        "config" : {
          "value" : "TO_DOUBLE(value_str)"
        }
      }
    ]
  }

Enrichment and Threat Intel

+

We will set a threat triage level of 10 if a message generates an outlier score of more than 3.5. This cutoff will depend on your data and should be adjusted based on the assumed underlying distribution. Note that, under the assumption of normality, MAD acts as a robust estimator of the standard deviation, so the cutoff can be read as the number of standard deviations away. For other distributions, there are other interpretations that make sense in the context of measuring the degree of difference. See http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/ for a brief discussion.

+

Create the following in $METRON_HOME/config/zookeeper/enrichments/mad.json:


  {
    "index": "mad",
    "batchSize": 1,
    "enrichment": {
      "fieldMap": {
        "stellar" : {
          "config" : {
            "parser_score" : "OUTLIER_MAD_SCORE(OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', PROFILE_FIXED(10, 'MINUTES'))), value)",
            "is_alert" : "if parser_score > 3.5 then true else is_alert"
          }
        }
      },
      "fieldToTypeMap": { }
    },
    "threatIntel": {
      "fieldMap": { },
      "fieldToTypeMap": { },
      "triageConfig" : {
        "riskLevelRules" : [
          {
            "rule" : "parser_score > 3.5",
            "score" : 10
          }
        ],
        "aggregator" : "MAX"
      }
    }
  }

The Profiler

+

We can set up the Profiler to track the statistical state required to compute the MAD. For the purposes of this demonstration, we will configure the Profiler to capture statistics on the minute mark. We will capture a global statistical state for the value field, and we will look back over a 5-minute window when computing the median.

+

Create the following file at $METRON_HOME/config/zookeeper/profiler.json:


  {
    "profiles": [
      {
        "profile": "sketchy_mad",
        "foreach": "'global'",
        "onlyif": "true",
        "init" : {
          "s": "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', PROFILE_FIXED(5, 'MINUTES')))"
        },
        "update": {
          "s": "OUTLIER_MAD_ADD(s, value)"
        },
        "result": "s"
      }
    ]
  }

Adjust $METRON_HOME/config/zookeeper/global.json to adjust the capture duration:


  "profiler.client.period.duration" : "1",
  "profiler.client.period.duration.units" : "MINUTES"

Adjust $METRON_HOME/config/profiler.properties to adjust the capture duration by changing profiler.period.duration=15 to profiler.period.duration=1

+
+

Execute the Flow

  1. Install the elasticsearch head plugin by executing: /usr/share/elasticsearch/bin/plugin install mobz/elasticsearch-head
  2. Stop all other parser topologies via monit.
  3. Create the mad kafka topic by executing: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic mad --partitions 1 --replication-factor 1
  4. Push the modified configs by executing: $METRON_HOME/bin/zk_load_configs.sh --mode PUSH -z node1:2181 -i $METRON_HOME/config/zookeeper/
  5. Start the profiler by executing: $METRON_HOME/bin/start_profiler_topology.sh
  6. Start the parser topology by executing: $METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s mad
  7. Ensure that the enrichment and indexing topologies are started. If not, start them via monit or by hand.
  8. Generate data into kafka by executing the following for at least 10 minutes: ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic mad (Note: if you chose to use the t-distribution script above, adjust the parameters to rand_gen.py accordingly.)
  9. Stop the above with ctrl-c and send an obvious outlier into kafka: echo "1000" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic mad

You should be able to find the outlier via the elasticsearch head plugin by searching for the messages where is_alert is true.
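Before involving kafka at all, the flow can be sanity-checked offline: ten minutes of standard Gaussian noise at one message per second, followed by the obvious outlier of 1000, should score far above the 3.5 triage cutoff. A rough simulation (using exact medians, whereas the Profiler uses a sketch, so scores will differ slightly):

```python
import random
import statistics

random.seed(42)  # deterministic for illustration
history = [random.gauss(0, 1) for _ in range(600)]  # ~10 minutes at 1 msg/s

med = statistics.median(history)
mad = statistics.median(abs(x - med) for x in history)
score = 0.6745 * abs(1000 - med) / mad  # the modified z-score from the enrichment

alert = score > 3.5  # the triage rule that sets is_alert and a score of 10
```

For standard Gaussian noise the MAD is roughly 0.67, so an input of 1000 scores in the hundreds, far beyond the cutoff.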

Copyright © 2017. All Rights Reserved.
http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-deployment/amazon-ec2/index.html

Apache Metron on Amazon EC2

+

This project fully automates the provisioning of Apache Metron on Amazon EC2 infrastructure. Starting with only your Amazon EC2 credentials, this project will create a fully-functioning, end-to-end, multi-node cluster running Apache Metron.

+

Warning: Amazon will charge for the use of their resources when running Apache Metron. The amount will vary based on the number and size of hosts, along with current Amazon pricing structure. Be sure to stop or terminate all of the hosts instantiated by Apache Metron when not in use to avoid unnecessary charges.

+
+

Getting Started

+
+

Prerequisites

+

The host used to deploy Apache Metron will need the following software tools installed. The following versions are known to work as of the time of this writing, but by no means are these the only working versions.

  • Ansible 2.0.0.2
  • Python 2.7.11
  • Maven 3.3.9

Any platform that supports these tools is suitable, but the following instructions cover only macOS. The easiest means of installing these tools on a Mac is to use the excellent Homebrew project.

  1. Install Homebrew by running the following command in a terminal. Refer to the Homebrew home page for the latest installation instructions.

       /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

  2. With Homebrew installed, run the following commands in a terminal to install all of the required tools.

       brew cask install java
       brew install maven git
       brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/ee1273bf919a5e4e50838513a9e55ea423e1d7ce/Formula/ansible.rb
       brew switch ansible 2.0.0.2

  3. Ensure that a public SSH key is located at ~/.ssh/id_rsa.pub.

       $ cat ~/.ssh/id_rsa.pub
       ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChv5GJxPjR39UJV7VY17ivbLVlxFrH7UHwh1Jsjem4d1eYiAtde5N2y65/HRNxWbhYli9ED8k0/MRP92ejewucEbrPNq5mytPqdC4IvZ98Ln2GbqTDwvlP3T7xa/wYFOpFsOmXXql8216wSrnrS4f3XK7ze34S6/VmY+lsBYnr3dzyj8sG/mexpJgFS/w83mWJV0e/ryf4Hd7P6DZ5fO+nmTXfKNK22ga4ctcnbZ+toYcPL+ODCh8598XCKVo97XjwF5OxN3vl1p1HHguo3cHB4H1OIaqX5mUt59gFIZcAXUME89PO6NUiZDd3RTstpf125nQVkQAHu2fvW96/f037 nick@localhost

     If this file does not exist, run the following command at a terminal and accept all defaults. Only the public key, not the private key, will be uploaded to Amazon and configured on each host to enable SSH connectivity. While it is possible to create and use an alternative key, those details will not be covered here.

       ssh-keygen -t rsa

Amazon Web Services

+

If you already have an Amazon Web Services account that you have used to deploy EC2 hosts, then you should be able to skip the next few steps.

  1. Head over to Amazon Web Services and create an account. As part of the account creation process you will need to provide a credit card to cover any charges that may apply.

  2. Create a set of user credentials through Amazon’s Identity and Access Management (IAM) dashboard. On the IAM dashboard menu click “Users” and then “Create New User”. Provide a name and ensure that “Generate an access key for each user” remains checked. Download the credentials and keep them for later use.

  3. While still in Amazon’s Identity and Access Management (IAM) dashboard, click on the user that was previously created. Click the “Permissions” tab and then the “Attach Policy” button. Attach the following policies to the user.

     • AmazonEC2FullAccess
     • AmazonVPCFullAccess

  4. Apache Metron uses the official, open source CentOS 6 Amazon Machine Image (AMI). If you have never used this AMI before then you will need to accept Amazon’s terms and conditions. Navigate to the web page for this AMI and click the “Continue” button. Choose the “Manual Launch” tab then click the “Accept Software Terms” button.

Having successfully created your Amazon Web Services account, hopefully you will find that the most difficult tasks are behind you.

Deploy Metron

  1. Use the Amazon access key by exporting its values via the shell’s environment. This allows Ansible to authenticate with Amazon EC2. For example:

       export AWS_ACCESS_KEY_ID="AKIAI6NRFEO27E5FFELQ"
       export AWS_SECRET_ACCESS_KEY="vTDydWJQnAer7OWauUS150i+9Np7hfCXrrVVP6ed"

     Notice: You must replace the access key values above with values from your own access key.

  2. Start the Apache Metron deployment process. When prompted, provide a unique name for your Metron environment or accept the default.

       $ ./run.sh
       Metron Environment [metron-test]: my-metron-env
       ...

     The process is likely to take between 70 and 90 minutes. Fortunately, everything is fully automated and you should feel free to grab a coffee.

Explore Metron

  1. After the deployment has completed successfully, a message like the following will be displayed. Navigate to the specified resources to explore your newly minted Apache Metron environment.

       TASK [debug] *******************************************************************
       ok: [localhost] => {
       "Success": [
           "Apache Metron deployed successfully",
           "   Metron  @  http://ec2-52-37-255-142.us-west-2.compute.amazonaws.com:5000",
           "   Ambari  @  http://ec2-52-37-225-202.us-west-2.compute.amazonaws.com:8080",
           "   Sensors @  ec2-52-37-225-202.us-west-2.compute.amazonaws.com on tap0",
           "For additional information, see https://metron.incubator.apache.org/'"
       ]
       }

  2. Each of the provisioned hosts will be accessible from the internet. Connecting to one over SSH as the user centos will not require a password, as it will authenticate with the pre-defined SSH key.

       ssh centos@ec2-52-91-215-174.compute-1.amazonaws.com

Advanced Usage

+
+

Multiple Environments

+

This process can support provisioning of multiple, isolated environments. Simply change the env settings in conf/defaults.yml. For example, you might provision separate development, test, and production environments.


  env: metron-test

Selective Provisioning

+

To provision only subsets of the entire Metron deployment, Ansible tags can be specified. For example, to only deploy the sensors on an Amazon EC2 environment, run the following command.


  ansible-playbook -i ec2.py playbook.yml --tags "ec2,sensors"

Custom SSH Key

+

By default, the playbook will attempt to register your public SSH key ~/.ssh/id_rsa.pub with each provisioned host. This enables Ansible to communicate with each host over SSH. If you would prefer to use another key, simply add the path to the public key file to the key_file property in conf/defaults.yml.

+

For example, generate a new SSH key for Metron that will be stored at ~/.ssh/my-metron-key.


  $ ssh-keygen -q -f ~/.ssh/my-metron-key
  Enter passphrase (empty for no passphrase):
  Enter same passphrase again:

Add the path to the newly created SSH public key to conf/defaults.yml.


  key_file: ~/.ssh/my-metron-key.pub

Common Errors

+
+

Error: [unsupported_operation_exception] custom format isn’t supported

+

This error might be seen within Metron’s default dashboard in Kibana 4. This occurs when the index templates do not exist for the Snort, Bro or YAF indices in Elasticsearch.

+

The dashboard expects fields to be of a certain type. If the index templates have not been loaded correctly, the data types for the fields in these indices will be incorrect and the dashboard will display this error.

+
+

Solution

+

If you see this error, please report your findings by creating a JIRA or dropping an email to the Metron Users mailing list. Follow these steps to work around the problem.

+

(1) Define which Elasticsearch host to interact with. Any Elasticsearch host should work.


  export ES_HOST="http://ec2-52-25-237-20.us-west-2.compute.amazonaws.com:9200"

(2) Confirm the index templates are in fact missing.


  curl -s -XGET $ES_HOST/_template

(3) Manually load the index templates.


  cd metron-deployment
  curl -s -XPOST $ES_HOST/_template/bro_index -d @roles/metron_elasticsearch_templates/files/es_templates/bro_index.template
  curl -s -XPOST $ES_HOST/_template/snort_index -d @roles/metron_elasticsearch_templates/files/es_templates/snort_index.template
  curl -s -XPOST $ES_HOST/_template/yaf_index -d @roles/metron_elasticsearch_templates/files/es_templates/yaf_index.template

(4) Delete the existing indexes. Only a new index will use the templates defined in the previous step.


  curl -s -XDELETE "$ES_HOST/yaf_index*"
  curl -s -XDELETE "$ES_HOST/bro_index*"
  curl -s -XDELETE "$ES_HOST/snort_index*"

(5) Open up Kibana and wait for the new indexes to be created. The dashboard should now work.

+
+

Error: ‘No handler was ready to authenticate…Check your credentials’


  TASK [Define keypair] **********************************************************
  failed: [localhost] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXbcb1AlWsEPP
    r9jEFrn0yun3PYNidJ/...david@hasselhoff.com) => {"failed": true, "item": "ssh-r
    sa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXbcb1AlWsEPPr9jEFr... david@hasselhoff.com",
    "msg": "No handler was ready to authenticate. 1 handlers were checked.
    ['HmacAuthV4Handler'] Check your credentials"}

Solution 1

+

This occurs when Ansible does not have the correct AWS access keys. The following commands must return a valid access key that is defined within Amazon’s Identity and Access Management console.


  $ echo $AWS_ACCESS_KEY_ID
  AKIAI6NRFEO27E5FFELQ

  $ echo $AWS_SECRET_ACCESS_KEY
  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

Solution 2

+

This error can occur if you have exported the correct AWS access key, but you are using sudo to run the Ansible playbook. Do not use the sudo command when running the Ansible playbook.

+
+

Error: ‘OptInRequired: … you need to accept terms and subscribe’


  TASK [metron-test: Instantiate 1 host(s) as sensors,ambari_master,metron,ec2] **
  fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
  "Instance creation failed => OptInRequired: In order to use this AWS Marketplace
  product you need to accept terms and subscribe. To do so please visit
  http://aws.amazon.com/marketplace/pp?sku=6x5jmcajty9edm3f211pqjfn2"}
  to retry, use: --limit @playbook.retry

Solution

+

Apache Metron uses the official CentOS 6 Amazon Machine Image when provisioning hosts. Amazon requires that you accept certain terms and conditions when using any Amazon Machine Image (AMI). Follow the link provided in the error message to accept the terms and conditions then re-run the playbook.

+
+

Error: ‘PendingVerification: Your account is currently being verified’


  TASK [metron-test: Instantiate 1 host(s) as sensors,ambari_master,metron,ec2] **
  fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
  "Instance creation failed => PendingVerification: Your account is currently
  being verified. Verification normally takes less than 2 hours. Until your
  account is verified, you may not be able to launch additional instances or
  create additional volumes. If you are still receiving this message after more
  than 2 hours, please let us know by writing to aws-verification@amazon.com. We
  appreciate your patience."}
  to retry, use: --limit @playbook.retry

Solution

+

This will occur if you are attempting to deploy Apache Metron using a newly created Amazon Web Services account. Follow the advice in the message and wait until Amazon’s verification process is complete. Amazon also offers the following guidance for this error:

Your account is pending verification. Until the verification process is complete, you may not be able to carry out requests with this account. If you have questions, contact AWS Support.
+

Error: ‘Instance creation failed => InstanceLimitExceeded’

TASK [metron-test: Instantiate 3 host(s) as search,metron,ec2] *****************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
"Instance creation failed => InstanceLimitExceeded: You have requested more
instances (11) than your current instance limit of 10 allows for the specified
instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to
request an adjustment to this limit."}
to retry, use: --limit @playbook.retry

Solution

+

This will occur if Apache Metron attempts to deploy more host instances than your account allows. The total number of instances required for Apache Metron can be reduced by editing deployment/amazon-ec2/playbook.yml. A better alternative may be to ask Amazon to increase this limit. Amazon also offers the following guidance for this error:

You’ve reached the limit on the number of instances you can run concurrently. The limit depends on the instance type. For more information, see How many instances can I run in Amazon EC2. If you need additional instances, complete the Amazon EC2 Instance Request Form.
+

Error: ‘SSH encountered an unknown error during the connection’

TASK [setup] *******************************************************************
fatal: [ec2-52-26-113-221.us-west-2.compute.amazonaws.com]: UNREACHABLE! => {
  "changed": false, "msg": "SSH encountered an unknown error during the
  connection. We recommend you re-run the command using -vvvv, which will enable
  SSH debugging output to help diagnose the issue", "unreachable": true}

Solution

+

This most often indicates that Ansible cannot connect to the host with the SSH key that it has access to. This can occur if hosts are provisioned with one SSH key but the playbook is subsequently executed with a different one. The issue can be addressed either by altering the key_file variable to point to the key that was used to provision the hosts, or by simply terminating all hosts and re-running the playbook.
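As a sketch, the two fixes look like the following (the playbook name, inventory path, and key path here are placeholders, not taken from the source):

```
# Re-run with verbose SSH debugging to see why the connection fails:
ansible-playbook -i ../inventory/project_name metron_install.yml -vvvv

# Or point Ansible at the key that provisioned the hosts, e.g. in group_vars:
#   key_file: ~/.ssh/provisioning-key.pem    # placeholder path
```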

+
+
+
Copyright © 2017. All Rights Reserved.
http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-deployment/index.html

Overview

+

+

This set of Ansible playbooks can be used to deploy an Ambari-managed Hadoop cluster, the Metron services, or both. These playbooks currently target only RHEL/CentOS 6.x operating systems.

+

In addition, an Ambari Management Pack can be built, which can be deployed in conjunction with the RPMs detailed in this README.

+
+

Prerequisites

+

The following tools are required to run these scripts:

+ + +

Currently Metron must be built from source. Before running these scripts perform the following steps:

  1. Clone the Metron git repository with git clone git@github.com:apache/incubator-metron.git
  2. Navigate to incubator-metron and run mvn clean package

These scripts depend on two files for configuration:

  • hosts - declares which Ansible roles will be run on which hosts
  • group_vars/all - various configuration settings needed to install Metron

Examples can be found in the incubator-metron/metron-deployment/inventory/metron_example directory and are a good starting point. Copy this directory into incubator-metron/metron-deployment/inventory/ and rename it to your project_name. More information about Ansible files and directory structure can be found at http://docs.ansible.com/ansible/playbooks_best_practices.html.
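The copy-and-rename step might look like the following (my_project is a placeholder project name; the mkdir only simulates the cloned repository layout so the sketch is self-contained):

```shell
# Simulate the cloned repo layout (in a real install, git clone creates this):
mkdir -p incubator-metron/metron-deployment/inventory/metron_example

# Copy the example inventory and rename it to your project name:
cp -r incubator-metron/metron-deployment/inventory/metron_example \
      incubator-metron/metron-deployment/inventory/my_project

ls incubator-metron/metron-deployment/inventory
```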

+
+

Ambari

+

The Ambari playbook will install a Hadoop cluster with all the services and configuration required by Metron. This section can be skipped if installing Metron on a pre-existing cluster.

+

Currently, this playbook supports building a local development cluster running on one node, but options for other types of clusters will be added in the future.

+
+

Setting up your inventory

+

Make sure to update the hosts file in incubator-metron/metron-deployment/inventory/project_name/hosts or provide an alternate inventory file when you launch the playbooks, including the ssh user(s) and ssh keyfile location(s). These playbooks expect two host groups:

  • ambari_master
  • ambari_slaves
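A minimal hosts file for this section might look like the following (the hostnames and connection settings are placeholders, not taken from the source):

```
[ambari_master]
node1.example.com

[ambari_slaves]
node1.example.com

[all:vars]
ansible_ssh_user=centos
ansible_ssh_private_key_file=~/.ssh/id_rsa
```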
+

Running the playbook

+

This playbook will install the Ambari server on the ambari_master, install the Ambari agents on the ambari_slaves, and create a cluster in Ambari with a blueprint for the required Metron components.

+

Navigate to incubator-metron/metron-deployment/playbooks and run: ansible-playbook -i ../inventory/project_name ambari_install.yml

+
+

Metron

+

The Metron playbook will gather the necessary cluster settings from Ambari and install the Metron services.

+
+

Setting up your inventory

+

Edit the hosts file at incubator-metron/metron-deployment/inventory/project_name/hosts. Declare which hosts the Metron services will be installed on by updating these groups:

  • enrichment - submits the topology code to Storm and requires a storm client
  • search - host where Elasticsearch will be run
  • web - host where the Metron UI and underlying services will run
  • sensors - host where network data will be collected and published to Kafka
+

The Metron topologies depend on Kafka topics and HBase tables being created beforehand. Declare a host that has Kafka and HBase clients installed by updating these groups:

  • metron_kafka_topics
  • metron_hbase_tables
+

If only installing Metron, these groups can be ignored:

  • ambari_master
  • ambari_slaves
+
+

Configuring group variables

+

The Metron Ansible scripts depend on a set of variables. These variables can be found in the file at incubator-metron/metron-deployment/inventory/project_name/group_vars/all. Edit the ambari* variables to match your Ambari instance and update the java_home variable to match the java path on your hosts.
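A sketch of the relevant entries is shown below. The variable names and values are illustrative placeholders; confirm the exact names against the metron_example copy of group_vars/all:

```
ambari_host: node1.example.com
ambari_port: 8080
java_home: /usr/jdk64/jdk1.8.0_77
```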

+
+

Running the playbook

+

Navigate to incubator-metron/metron-deployment/playbooks and run: ansible-playbook -i ../inventory/project_name metron_install.yml

+
+

Vagrant

+

A Vagrantfile is included and will install a working version of the entire Metron stack. The following is required to run it:

  • Vagrant
  • Hostmanager plugin for Vagrant - run vagrant plugin install vagrant-hostmanager on the machine where Vagrant is installed
+

Navigate to incubator-metron/metron-deployment/vagrant/full-dev-platform and run vagrant up. This also provides a good example of how to run a full end-to-end Metron install.
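Putting the steps together, the full flow looks like this (paths as given above; requires Vagrant and a supported hypervisor to be installed):

```
vagrant plugin install vagrant-hostmanager
cd incubator-metron/metron-deployment/vagrant/full-dev-platform
vagrant up
```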

+
+

Ambari Management Pack

+

An Ambari Management Pack can be built in order to make the Metron service available on top of an existing stack, rather than needing a direct stack update.

+

This will set up:

  • Metron Parsers
  • Enrichment
  • Indexing
  • GeoIP data
  • Optional Elasticsearch
  • Optional Kibana
+
+

Prerequisites

  • A cluster managed by Ambari 2.4
  • Metron RPMs available on the cluster in the /localrepo directory. See RPM for further information.
+
+

Building Management Pack

+

From metron-deployment, run:

mvn clean package

A tar.gz that can be used with Ambari can be found at metron-deployment/packaging/ambari/metron-mpack/target/

+
+

Installing Management Pack

+

Before installing the mpack, update Storm’s topology.classpath in Ambari to include ‘/etc/hbase/conf:/etc/hadoop/conf’, then restart the Storm service.
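If you prefer to script this change rather than use the Ambari UI, Ambari ships a helper script at /var/lib/ambari-server/resources/scripts/configs.sh that can set a single property. The host, cluster name, and credentials below are placeholders, and you should verify the script’s options against your Ambari version:

```
# Placeholder Ambari host, cluster name, and admin credentials:
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
    set node1.example.com metron_cluster storm-site \
    "topology.classpath" "/etc/hbase/conf:/etc/hadoop/conf"
```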

+

Place the mpack’s tar.gz onto the node running Ambari Server. From the command line on this node, run:

ambari-server install-mpack --mpack=&lt;mpack_location&gt; --verbose

This will make the services available in Ambari in the same manner as any services in a stack, e.g. through Add Services or during cluster install. The Indexing / Parsers / Enrichment masters should be colocated with a Kafka Broker (to create topics) and an HBase client (to create the enrichment and threatintel tables). This colocation is currently not enforced by Ambari; managing it through a Service or Stack Advisor is a planned enhancement.

+

Several configuration parameters will need to be filled in, and should be fairly self-explanatory (primarily a couple of Elasticsearch configs and the Storm REST URL). Examples are provided in the descriptions on Ambari. Notably, the URL for the GeoIP database that is preloaded (and is prefilled by default) can be set to use a file:/// location.

+

After installation, a custom action is available in Ambari (alongside the stop and start service actions) to install Elasticsearch templates. Similarly, a custom Kibana action to load templates is available.

+

Another custom action is available in Ambari to import Zeppelin dashboards. See the metron-indexing documentation for details.

+
+

Offline installation

+

Currently there is only one point at which an install reaches out to the internet: the URL for the GeoIP database information.

+

The RPMs DO NOT reach out to the internet (because there is currently no hosting for them). Instead, they look on the local filesystem in /localrepo.
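Staging locally built RPMs for an offline install might look like the following sketch (the source path assumes RPMs were built as described in the RPM section below, and the availability of the createrepo tool is an assumption):

```
mkdir -p /localrepo
cp metron-deployment/target/RPMS/noarch/*.rpm /localrepo/
createrepo /localrepo    # regenerate yum metadata; requires the createrepo package
```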

+
+

Current Limitations

+

There is a set of limitations that should be addressed to improve the current state of the mpack.

  • There is currently no hosting for RPMs remotely. They will have to be built locally.
  • Colocation of appropriate services should be enforced by Ambari. See Installing Management Pack for more details.
  • Storm’s topology.classpath is not updated with the Metron service install and needs to be updated separately.
  • Several configuration parameters used when installing the Metron service could (and should) be grabbed from Ambari. Install will require them to be manually entered.
  • Need to handle upgrading Metron
+
+

RPM

+

RPMs can be built to install the components in metron-platform. These RPMs are built in a Docker container and placed into target.

+

Components in the RPMs:

  • metron-common
  • metron-data-management
  • metron-elasticsearch
  • metron-enrichment
  • metron-parsers
  • metron-pcap
  • metron-solr
+
+

Prerequisites

  • Docker. The image detailed in metron-deployment/packaging/docker/rpm-docker/README.md will automatically be built (or rebuilt if necessary).
  • Artifacts for metron-platform have been produced, e.g. mvn clean package -DskipTests in metron-platform
+
+

Building RPMs

+

From metron-deployment, run:

mvn clean package -Pbuild-rpms

The output RPM files will land in target/RPMS/noarch. They can be installed with the standard:

rpm -i &lt;package&gt;
+

TODO

  • Support Ubuntu deployments
+
+
+