Subject: [09/15] incubator-metron git commit: METRON-766: Release 0.3.1 closes apache/incubator-metron#477
From: cestella@apache.org
Date: Fri, 17 Mar 2017 14:51:30 -0000

http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-analytics/metron-statistics/index.html
Statistics and Mathematical Functions
A variety of non-trivial and advanced analytics make use of statistics and advanced mathematical functions. In particular, capturing statistical snapshots in a scalable way can open the door to more advanced analytics, such as outlier analysis. As such, this project aims to provide a robust set of statistical functions and statistics-based algorithms in the form of Stellar functions. These functions can be used anywhere Stellar is used.


Stellar Functions

Approximation Statistics

HLLP_ADD
  • Description: Add value to the HyperLogLogPlus estimator set. See HLLP README
  • Input:
    • hyperLogLogPlus - the hllp estimator to add a value to
    • value+ - value to add to the set. Takes a single item or a list.
  • Returns: The HyperLogLogPlus set with a new value added

HLLP_CARDINALITY
  • Description: Returns the HyperLogLogPlus-estimated cardinality for this set. See HLLP README
  • Input:
    • hyperLogLogPlus - the hllp set
  • Returns: Long value representing the cardinality for this set

HLLP_INIT
  • Description: Initializes the HyperLogLogPlus estimator set. p must be a value between 4 and sp, and sp must be greater than 4 and less than 32. See HLLP README
  • Input:
    • p - the precision value for the normal set
    • sp - the precision value for the sparse set. If p is set, but sp is 0 or not specified, the sparse set will be disabled.
  • Returns: A new HyperLogLogPlus set

HLLP_MERGE
  • Description: Merge hllp sets together. The resulting estimator is initialized with the p and sp precision values from the first provided hllp estimator set. See HLLP README
  • Input:
    • hllp - List of hllp estimators to merge. Takes a single hllp set or a list.
  • Returns: A new merged HyperLogLogPlus estimator set
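To make the estimator workflow concrete (init, add, merge, query cardinality), here is a minimal HyperLogLog sketch in Python. This is an illustrative toy, not Metron's implementation (Metron delegates to a HyperLogLogPlus library); the class name TinyHLL and the choice of SHA-1 as the hash are assumptions for the example.

```python
import hashlib
import math

class TinyHLL:
    """Toy HyperLogLog: the first p hash bits select a register, and each
    register keeps the maximum leading-zero rank seen for its substream."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)
        return self

    def merge(self, other):
        # Register-wise max merges two estimators built with the same p
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # small-range correction via linear counting
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)
```

Mirroring the Stellar functions above: HLLP_INIT roughly corresponds to TinyHLL(p), HLLP_ADD to add, HLLP_MERGE to merge, and HLLP_CARDINALITY to cardinality.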

Mathematical Functions

ABS
  • Description: Returns the absolute value of a number.
  • Input:
    • number - The number to take the absolute value of
  • Returns: The absolute value of the number passed in.

BIN
  • Description: Computes the bin that the value is in given a set of bounds.
  • Input:
    • value - The value to bin
    • bounds - A list of value bounds (excluding min and max) in sorted order.
  • Returns: Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0’th bound go in the 0’th bin, and values greater than the last bound go in the M’th bin.
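The BIN contract above, bound(N-1) < value <= bound(N), is exactly a left bisection over the sorted bounds. A short sketch (the helper name bin_of is an assumption for illustration, not a Metron API):

```python
import bisect

def bin_of(value, bounds):
    # Bin N such that bounds[N-1] < value <= bounds[N]; values at or below
    # the first bound land in bin 0, values above the last bound in bin M.
    return bisect.bisect_left(bounds, value)
```

For example, with bounds [10, 20, 30], the value 20 falls in bin 1 and the value 35 falls in bin 3.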

Distributional Statistics

STATS_ADD
  • Description: Adds one or more input values to those that are used to calculate the summary statistics.
  • Input:
    • stats - The Stellar statistics object. If null, then a new one is initialized.
    • value+ - One or more numbers to add
  • Returns: A Stellar statistics object

STATS_BIN
  • Description: Computes the bin that the value is in based on the statistical distribution.
  • Input:
    • stats - The Stellar statistics object
    • value - The value to bin
    • bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins. For convenience, QUARTILE, QUINTILE, and DECILE may be passed in as a string argument. If this argument is omitted, a quartile bin split is assumed.
  • Returns: Which bin N the value falls in such that bound(N-1) < value <= bound(N). No min and max bounds are provided, so values smaller than the 0’th bound go in the 0’th bin, and values greater than the last bound go in the M’th bin.

STATS_COUNT
  • Description: Calculates the count of the values accumulated (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The count of the values in the window or NaN if the statistics object is null.

STATS_GEOMETRIC_MEAN
  • Description: Calculates the geometric mean of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The geometric mean of the values in the window or NaN if the statistics object is null.

STATS_INIT
  • Description: Initializes a statistics object
  • Input:
    • window_size - The number of input data values to maintain in a rolling window in memory. If window_size is equal to 0, then no rolling window is maintained. Using no rolling window is less memory intensive, but cannot calculate certain statistics like percentiles and kurtosis.
  • Returns: A Stellar statistics object

STATS_KURTOSIS
  • Description: Calculates the kurtosis of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The kurtosis of the values in the window or NaN if the statistics object is null.

STATS_MAX
  • Description: Calculates the maximum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The maximum of the accumulated values in the window or NaN if the statistics object is null.

STATS_MEAN
  • Description: Calculates the mean of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The mean of the values in the window or NaN if the statistics object is null.

STATS_MERGE
  • Description: Merges statistics objects.
  • Input:
    • statistics - A list of statistics objects
  • Returns: A Stellar statistics object

STATS_MIN
  • Description: Calculates the minimum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The minimum of the accumulated values in the window or NaN if the statistics object is null.

STATS_PERCENTILE
  • Description: Computes the p’th percentile of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
    • p - a double where 0 <= p < 1 representing the percentile
  • Returns: The p’th percentile of the data or NaN if the statistics object is null

STATS_POPULATION_VARIANCE
  • Description: Calculates the population variance of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The population variance of the values in the window or NaN if the statistics object is null.

STATS_QUADRATIC_MEAN
  • Description: Calculates the quadratic mean (root mean square) of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The quadratic mean of the values in the window or NaN if the statistics object is null.

STATS_SD
  • Description: Calculates the standard deviation of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The standard deviation of the values in the window or NaN if the statistics object is null.

STATS_SKEWNESS
  • Description: Calculates the skewness of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The skewness of the values in the window or NaN if the statistics object is null.

STATS_SUM
  • Description: Calculates the sum of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the values in the window or NaN if the statistics object is null.

STATS_SUM_LOGS
  • Description: Calculates the sum of the natural logs of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the logs of the values in the window or NaN if the statistics object is null.

STATS_SUM_SQUARES
  • Description: Calculates the sum of the squares of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The sum of the squares of the values in the window or NaN if the statistics object is null.

STATS_VARIANCE
  • Description: Calculates the variance of the accumulated values (or in the window if a window is used).
  • Input:
    • stats - The Stellar statistics object
  • Returns: The variance of the values in the window or NaN if the statistics object is null.
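The STATS_* lifecycle is INIT (optionally with a rolling window), then ADD, then summary queries. A rough Python sketch of those semantics follows; RollingStats is a made-up illustration, not Metron's Commons Math-backed provider:

```python
import statistics
from collections import deque

class RollingStats:
    def __init__(self, window_size=0):
        # window_size == 0 means "no rolling window": keep everything
        self.values = deque(maxlen=window_size or None)

    def add(self, *vals):               # STATS_ADD
        self.values.extend(vals)
        return self

    def mean(self):                     # STATS_MEAN
        return statistics.fmean(self.values) if self.values else float("nan")

    def sd(self):                       # STATS_SD (sample standard deviation)
        return statistics.stdev(self.values) if len(self.values) > 1 else float("nan")

    def percentile(self, p):            # STATS_PERCENTILE, 0 <= p < 1
        vs = sorted(self.values)
        return vs[min(int(p * len(vs)), len(vs) - 1)] if vs else float("nan")
```

With a window of 3, adding 1, 2, 3, 4 evicts the 1, so the mean is computed over [2, 3, 4], matching the windowed behavior described for STATS_INIT.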

Statistical Outlier Detection

OUTLIER_MAD_STATE_MERGE
  • Description: Update the statistical state required to compute the Median Absolute Deviation.
  • Input:
    • [state] - A list of Median Absolute Deviation states to merge. Generally these are states across time.
    • currentState? - The current state (optional)
  • Returns: The Median Absolute Deviation state

OUTLIER_MAD_ADD
  • Description: Add a piece of data to the state.
  • Input:
    • state - The MAD state
    • value - The numeric value to add
  • Returns: The MAD state

OUTLIER_MAD_SCORE
  • Description: Get the modified z-score normalized by the Median Absolute Deviation.
  • Input:
    • state - The MAD state
    • value - The numeric value to score
  • Returns: The modified z-score

Outlier Analysis

A common desire is to find anomalies in numerical data. To that end, we provide some simple statistical anomaly detectors.

Median Absolute Deviation

Much has been written about this robust estimator. See the first page of http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/BetterThanMAD.pdf for good coverage of the strengths and weaknesses of MAD. The usage, however, is fairly straightforward:

  • Gather the statistical state required to compute the MAD:
    • The distribution of the values of a univariate random variable over time.
    • The distribution of the absolute deviations of the values from the median.
  • Use this statistical state to score unseen values. The higher the score, the more unlike the previously seen data the value is.

There are a couple of issues that make MAD a bit hard to compute. First, the statistical state requires the median, which can be expensive to compute exactly. To get around this, we use the OnlineStatisticalProvider to compute a sketch rather than the exact median. Second, the statistical state for seasonal data should be limited to a fixed, trailing window. We do this by ensuring that the MAD state is mergeable and able to be queried from within the Profiler.
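The two ingredients above combine into a modified z-score. Here is a compact sketch of the exact (non-sketch) computation; the helper mad_score and the 0.6745 consistency constant follow the standard MAD formulation rather than Metron's OnlineStatisticalProvider:

```python
import statistics

def mad_score(history, value, scale=0.6745):
    """Modified z-score: scale * |value - median| / MAD. With scale = 0.6745,
    the MAD is a consistent estimator of the standard deviation under
    normality, so the score reads as 'standard deviations away'."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        # degenerate history: everything equals the median
        return 0.0 if value == med else float("inf")
    return scale * abs(value - med) / mad
```

For history [1..9] (median 5, MAD 2), scoring the unseen value 15 yields 0.6745 * 10 / 2, comfortably above the 3.5 cutoff used later in this example.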

Example

We will create a dummy data stream of Gaussian noise to illustrate how to use the MAD functionality along with the Profiler to tag messages as outliers or not.

To do this, we will create:

  • a data generator
  • a parser
  • a profiler profile
  • an enrichment and threat triage configuration

Data Generator

+

We can create a simple Python script that generates a stream of Gaussian noise at a frequency of one message per second. Save the following at ~/rand_gen.py:


  #!/usr/bin/python
  import random
  import sys
  import time

  def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
      print str(random.gauss(mu, sigma))
      sys.stdout.flush()
      time.sleep(freq_s)

  if __name__ == '__main__':
    main()

This script will take the following as arguments:

  • The mean of the data generated
  • The standard deviation of the data generated
  • The frequency (in seconds) of the data generated

If, however, you’d like to test a longer-tailed distribution, like the Student’s t-distribution, and have numpy installed, you can use the following as ~/rand_gen.py:


  #!/usr/bin/python
  import sys
  import time
  import numpy as np

  def main():
    df = float(sys.argv[1])
    freq_s = int(sys.argv[2])
    while True:
      print str(np.random.standard_t(df))
      sys.stdout.flush()
      time.sleep(freq_s)

  if __name__ == '__main__':
    main()

This script will take the following as arguments:

  • The degrees of freedom for the distribution
  • The frequency (in seconds) of the data generated

The Parser

+

We will create a parser that takes each number in and, using the CSVParser, creates a message with a field called value.

+

Add the following file to $METRON_HOME/config/zookeeper/parsers/mad.json:


  {
    "parserClassName" : "org.apache.metron.parsers.csv.CSVParser",
    "sensorTopic" : "mad",
    "parserConfig" : {
      "columns" : {
        "value_str" : 0
      }
    },
    "fieldTransformations" : [
      {
        "transformation" : "STELLAR",
        "output" : [ "value" ],
        "config" : {
          "value" : "TO_DOUBLE(value_str)"
        }
      }
    ]
  }

Enrichment and Threat Intel

+

We will set a threat triage level of 10 if a message generates an outlier score of more than 3.5. This cutoff will depend on your data and should be adjusted based on the assumed underlying distribution. Note that, under the assumption of normality, MAD acts as a robust estimator of the standard deviation, so the cutoff can be read as the number of standard deviations away. For other distributions, there are other interpretations that make sense in the context of measuring the degree of difference. See http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/ for a brief discussion.

+

Create the following in $METRON_HOME/config/zookeeper/enrichments/mad.json:


  {
    "index": "mad",
    "batchSize": 1,
    "enrichment": {
      "fieldMap": {
        "stellar" : {
          "config" : {
            "parser_score" : "OUTLIER_MAD_SCORE(OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', PROFILE_FIXED(10, 'MINUTES'))), value)",
            "is_alert" : "if parser_score > 3.5 then true else is_alert"
          }
        }
      },
      "fieldToTypeMap": { }
    },
    "threatIntel": {
      "fieldMap": { },
      "fieldToTypeMap": { },
      "triageConfig" : {
        "riskLevelRules" : [
          {
            "rule" : "parser_score > 3.5",
            "score" : 10
          }
        ],
        "aggregator" : "MAX"
      }
    }
  }

The Profiler

+

We can set up the Profiler to track the statistical state required to compute the MAD. For the purposes of this demonstration, we will configure the Profiler to capture statistics on the minute mark. We will capture a global statistical state for the value field, and we will look back over a 5-minute window when computing the median.

+

Create the following file at $METRON_HOME/config/zookeeper/profiler.json:


  {
    "profiles": [
      {
        "profile": "sketchy_mad",
        "foreach": "'global'",
        "onlyif": "true",
        "init" : {
          "s": "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', PROFILE_FIXED(5, 'MINUTES')))"
        },
        "update": {
          "s": "OUTLIER_MAD_ADD(s, value)"
        },
        "result": "s"
      }
    ]
  }

Adjust $METRON_HOME/config/zookeeper/global.json to adjust the capture duration:


  "profiler.client.period.duration" : "1",
  "profiler.client.period.duration.units" : "MINUTES"

Adjust $METRON_HOME/config/profiler.properties to adjust the capture duration by changing profiler.period.duration=15 to profiler.period.duration=1

+
+

Execute the Flow

  1. Install the elasticsearch head plugin by executing: /usr/share/elasticsearch/bin/plugin install mobz/elasticsearch-head
  2. Stop all other parser topologies via monit.
  3. Create the mad kafka topic by executing: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic mad --partitions 1 --replication-factor 1
  4. Push the modified configs by executing: $METRON_HOME/bin/zk_load_configs.sh --mode PUSH -z node1:2181 -i $METRON_HOME/config/zookeeper/
  5. Start the profiler by executing: $METRON_HOME/bin/start_profiler_topology.sh
  6. Start the parser topology by executing: $METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s mad
  7. Ensure that the enrichment and indexing topologies are started. If not, start them via monit or by hand.
  8. Generate data into kafka by executing the following for at least 10 minutes: ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic mad (Note: if you chose to use the t-distribution script above, adjust the parameters to rand_gen.py accordingly.)
  9. Stop the above with ctrl-c and send an obvious outlier into kafka: echo "1000" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic mad

You should be able to find the outlier via the elasticsearch head plugin by searching for the messages where is_alert is true.
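Before involving kafka at all, the flow can be sanity-checked offline: ten minutes of standard Gaussian noise at one message per second, followed by the obvious outlier of 1000, should score far above the 3.5 triage cutoff. A rough simulation (using exact medians, whereas the Profiler uses a sketch, so scores will differ slightly):

```python
import random
import statistics

random.seed(42)  # deterministic for illustration
history = [random.gauss(0, 1) for _ in range(600)]  # ~10 minutes at 1 msg/s

med = statistics.median(history)
mad = statistics.median(abs(x - med) for x in history)
score = 0.6745 * abs(1000 - med) / mad  # the modified z-score from the enrichment

alert = score > 3.5  # the triage rule that sets is_alert and a score of 10
```

For standard Gaussian noise the MAD is roughly 0.67, so an input of 1000 scores in the hundreds, far beyond the cutoff.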

Copyright © 2017. All Rights Reserved.
http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-deployment/amazon-ec2/index.html

Apache Metron on Amazon EC2

+

This project fully automates the provisioning of Apache Metron on Amazon EC2 infrastructure. Starting with only your Amazon EC2 credentials, this project will create a fully-functioning, end-to-end, multi-node cluster running Apache Metron.

+

Warning: Amazon will charge for the use of their resources when running Apache Metron. The amount will vary based on the number and size of hosts, along with current Amazon pricing structure. Be sure to stop or terminate all of the hosts instantiated by Apache Metron when not in use to avoid unnecessary charges.

+
+

Getting Started

+
+

Prerequisites

+

The host used to deploy Apache Metron will need the following software tools installed. The following versions are known to work as of the time of this writing, but by no means are these the only working versions.

  • Ansible 2.0.0.2
  • Python 2.7.11
  • Maven 3.3.9

Any platform that supports these tools is suitable, but the following instructions cover only macOS. The easiest means of installing these tools on a Mac is to use the excellent Homebrew project.

  1. Install Homebrew by running the following command in a terminal. Refer to the Homebrew home page for the latest installation instructions.

       /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

  2. With Homebrew installed, run the following commands in a terminal to install all of the required tools.

       brew cask install java
       brew install maven git
       brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/ee1273bf919a5e4e50838513a9e55ea423e1d7ce/Formula/ansible.rb
       brew switch ansible 2.0.0.2

  3. Ensure that a public SSH key is located at ~/.ssh/id_rsa.pub.

       $ cat ~/.ssh/id_rsa.pub
       ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChv5GJxPjR39UJV7VY17ivbLVlxFrH7UHwh1Jsjem4d1eYiAtde5N2y65/HRNxWbhYli9ED8k0/MRP92ejewucEbrPNq5mytPqdC4IvZ98Ln2GbqTDwvlP3T7xa/wYFOpFsOmXXql8216wSrnrS4f3XK7ze34S6/VmY+lsBYnr3dzyj8sG/mexpJgFS/w83mWJV0e/ryf4Hd7P6DZ5fO+nmTXfKNK22ga4ctcnbZ+toYcPL+ODCh8598XCKVo97XjwF5OxN3vl1p1HHguo3cHB4H1OIaqX5mUt59gFIZcAXUME89PO6NUiZDd3RTstpf125nQVkQAHu2fvW96/f037 nick@localhost

     If this file does not exist, run the following command at a terminal and accept all defaults. Only the public key, not the private key, will be uploaded to Amazon and configured on each host to enable SSH connectivity. While it is possible to create and use an alternative key, those details will not be covered here.

       ssh-keygen -t rsa

Amazon Web Services

+

If you already have an Amazon Web Services account that you have used to deploy EC2 hosts, then you should be able to skip the next few steps.

  1. Head over to Amazon Web Services and create an account. As part of the account creation process you will need to provide a credit card to cover any charges that may apply.

  2. Create a set of user credentials through Amazon’s Identity and Access Management (IAM) dashboard. On the IAM dashboard menu click “Users” and then “Create New User”. Provide a name and ensure that “Generate an access key for each user” remains checked. Download the credentials and keep them for later use.

  3. While still in Amazon’s Identity and Access Management (IAM) dashboard, click on the user that was previously created. Click the “Permissions” tab and then the “Attach Policy” button. Attach the following policies to the user.

     • AmazonEC2FullAccess
     • AmazonVPCFullAccess

  4. Apache Metron uses the official, open source CentOS 6 Amazon Machine Image (AMI). If you have never used this AMI before then you will need to accept Amazon’s terms and conditions. Navigate to the web page for this AMI and click the “Continue” button. Choose the “Manual Launch” tab then click the “Accept Software Terms” button.

Having successfully created your Amazon Web Services account, hopefully you will find that the most difficult tasks are behind you.

Deploy Metron

  1. Use the Amazon access key by exporting its values via the shell’s environment. This allows Ansible to authenticate with Amazon EC2. For example:

       export AWS_ACCESS_KEY_ID="AKIAI6NRFEO27E5FFELQ"
       export AWS_SECRET_ACCESS_KEY="vTDydWJQnAer7OWauUS150i+9Np7hfCXrrVVP6ed"

     Notice: You must replace the access key values above with values from your own access key.

  2. Start the Apache Metron deployment process. When prompted, provide a unique name for your Metron environment or accept the default.

       $ ./run.sh
       Metron Environment [metron-test]: my-metron-env
       ...

     The process is likely to take between 70 and 90 minutes. Fortunately, everything is fully automated and you should feel free to grab a coffee.

Explore Metron

  1. After the deployment has completed successfully, a message like the following will be displayed. Navigate to the specified resources to explore your newly minted Apache Metron environment.

       TASK [debug] *******************************************************************
       ok: [localhost] => {
       "Success": [
           "Apache Metron deployed successfully",
           "   Metron  @  http://ec2-52-37-255-142.us-west-2.compute.amazonaws.com:5000",
           "   Ambari  @  http://ec2-52-37-225-202.us-west-2.compute.amazonaws.com:8080",
           "   Sensors @  ec2-52-37-225-202.us-west-2.compute.amazonaws.com on tap0",
           "For additional information, see https://metron.incubator.apache.org/'"
       ]
       }

  2. Each of the provisioned hosts will be accessible from the internet. Connecting to one over SSH as the user centos will not require a password, as it will authenticate with the pre-defined SSH key.

       ssh centos@ec2-52-91-215-174.compute-1.amazonaws.com

Advanced Usage

+
+

Multiple Environments

+

This process can support provisioning of multiple, isolated environments. Simply change the env settings in conf/defaults.yml. For example, you might provision separate development, test, and production environments.


  env: metron-test

Selective Provisioning

+

To provision only subsets of the entire Metron deployment, Ansible tags can be specified. For example, to only deploy the sensors on an Amazon EC2 environment, run the following command.


  ansible-playbook -i ec2.py playbook.yml --tags "ec2,sensors"

Custom SSH Key

+

By default, the playbook will attempt to register your public SSH key ~/.ssh/id_rsa.pub with each provisioned host. This enables Ansible to communicate with each host over SSH. If you would prefer to use another key, simply add the path to the public key file to the key_file property in conf/defaults.yml.

+

For example, generate a new SSH key for Metron that will be stored at ~/.ssh/my-metron-key.


  $ ssh-keygen -q -f ~/.ssh/my-metron-key
  Enter passphrase (empty for no passphrase):
  Enter same passphrase again:

Add the path to the newly created SSH public key to conf/defaults.yml.


  key_file: ~/.ssh/my-metron-key.pub

Common Errors

+
+

Error: [unsupported_operation_exception] custom format isn’t supported

+

This error might be seen within Metron’s default dashboard in Kibana 4. This occurs when the index templates do not exist for the Snort, Bro or YAF indices in Elasticsearch.

+

The dashboard expects fields to be of a certain type. If the index templates have not been loaded correctly, the data types for the fields in these indices will be incorrect and the dashboard will display this error.

+
+

Solution

+

If you see this error, please report your findings by creating a JIRA or dropping an email to the Metron Users mailing list. Follow these steps to work around the problem.

+

(1) Define which Elasticsearch host to interact with. Any Elasticsearch host should work.


  export ES_HOST="http://ec2-52-25-237-20.us-west-2.compute.amazonaws.com:9200"

(2) Confirm the index templates are in fact missing.


  curl -s -XGET $ES_HOST/_template

(3) Manually load the index templates.


  cd metron-deployment
  curl -s -XPOST $ES_HOST/_template/bro_index -d @roles/metron_elasticsearch_templates/files/es_templates/bro_index.template
  curl -s -XPOST $ES_HOST/_template/snort_index -d @roles/metron_elasticsearch_templates/files/es_templates/snort_index.template
  curl -s -XPOST $ES_HOST/_template/yaf_index -d @roles/metron_elasticsearch_templates/files/es_templates/yaf_index.template

(4) Delete the existing indexes. Only a new index will use the templates defined in the previous step.


  curl -s -XDELETE "$ES_HOST/yaf_index*"
  curl -s -XDELETE "$ES_HOST/bro_index*"
  curl -s -XDELETE "$ES_HOST/snort_index*"

(5) Open up Kibana and wait for the new indexes to be created. The dashboard should now work.

+
+

Error: ‘No handler was ready to authenticate…Check your credentials’


  TASK [Define keypair] **********************************************************
  failed: [localhost] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXbcb1AlWsEPP
    r9jEFrn0yun3PYNidJ/...david@hasselhoff.com) => {"failed": true, "item": "ssh-r
    sa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXbcb1AlWsEPPr9jEFr... david@hasselhoff.com",
    "msg": "No handler was ready to authenticate. 1 handlers were checked.
    ['HmacAuthV4Handler'] Check your credentials"}

Solution 1

+

This occurs when Ansible does not have the correct AWS access keys. The following commands must return a valid access key that is defined within Amazon’s Identity and Access Management console.


  $ echo $AWS_ACCESS_KEY_ID
  AKIAI6NRFEO27E5FFELQ

  $ echo $AWS_SECRET_ACCESS_KEY
  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

Solution 2

+

This error can occur if you have exported the correct AWS access key, but you are using sudo to run the Ansible playbook. Do not use the sudo command when running the Ansible playbook.

+
+

Error: ‘OptInRequired: … you need to accept terms and subscribe’


  TASK [metron-test: Instantiate 1 host(s) as sensors,ambari_master,metron,ec2] **
  fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
  "Instance creation failed => OptInRequired: In order to use this AWS Marketplace
  product you need to accept terms and subscribe. To do so please visit
  http://aws.amazon.com/marketplace/pp?sku=6x5jmcajty9edm3f211pqjfn2"}
  to retry, use: --limit @playbook.retry

Solution

+

Apache Metron uses the official CentOS 6 Amazon Machine Image when provisioning hosts. Amazon requires that you accept certain terms and conditions when using any Amazon Machine Image (AMI). Follow the link provided in the error message to accept the terms and conditions then re-run the playbook.

+
+

Error: ‘PendingVerification: Your account is currently being verified’


  TASK [metron-test: Instantiate 1 host(s) as sensors,ambari_master,metron,ec2] **
  fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
  "Instance creation failed => PendingVerification: Your account is currently
  being verified. Verification normally takes less than 2 hours. Until your
  account is verified, you may not be able to launch additional instances or
  create additional volumes. If you are still receiving this message after more
  than 2 hours, please let us know by writing to aws-verification@amazon.com. We
  appreciate your patience."}
  to retry, use: --limit @playbook.retry

Solution

+

This will occur if you are attempting to deploy Apache Metron using a newly created Amazon Web Services account. Follow the advice in the message and wait until Amazon’s verification process is complete. Amazon also offers the following guidance for this error:

Your account is pending verification. Until the verification process is complete, you may not be able to carry out requests with this account. If you have questions, contact AWS Support.
+

Error: ‘Instance creation failed => InstanceLimitExceeded’

TASK [metron-test: Instantiate 3 host(s) as search,metron,ec2] *****************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg":
"Instance creation failed => InstanceLimitExceeded: You have requested more
instances (11) than your current instance limit of 10 allows for the specified
instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to
request an adjustment to this limit."}
to retry, use: --limit @playbook.retry

Solution

+

This will occur if Apache Metron attempts to deploy more host instances than your account allows. The total number of instances required for Apache Metron can be reduced by editing deployment/amazon-ec2/playbook.yml. A better alternative may be to ask Amazon to increase this limit. Amazon also offers the following guidance for this error:

You’ve reached the limit on the number of instances you can run concurrently. The limit depends on the instance type. For more information, see How many instances can I run in Amazon EC2. If you need additional instances, complete the Amazon EC2 Instance Request Form.
+

Error: ‘SSH encountered an unknown error during the connection’

TASK [setup] *******************************************************************
fatal: [ec2-52-26-113-221.us-west-2.compute.amazonaws.com]: UNREACHABLE! => {
  "changed": false, "msg": "SSH encountered an unknown error during the
  connection. We recommend you re-run the command using -vvvv, which will enable
  SSH debugging output to help diagnose the issue", "unreachable": true}

Solution

+

This most often indicates that Ansible cannot connect to the host with the SSH key that it has access to. This can occur if hosts are provisioned with one SSH key but the playbook is subsequently executed with a different one. The issue can be addressed either by altering the key_file variable to point to the key that was used to provision the hosts, or by simply terminating all hosts and re-running the playbook.
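As a sketch, the two fixes look like the following (the playbook name, inventory path, and key path here are placeholders, not taken from the source):

```
# Re-run with verbose SSH debugging to see why the connection fails:
ansible-playbook -i ../inventory/project_name metron_install.yml -vvvv

# Or point Ansible at the key that provisioned the hosts, e.g. in group_vars:
#   key_file: ~/.ssh/provisioning-key.pem    # placeholder path
```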

+
+
+
Copyright © 2017. All Rights Reserved.
http://git-wip-us.apache.org/repos/asf/incubator-metron/blob/a055de44/current-book/metron-deployment/index.html

Overview

+

+

This set of Ansible playbooks can be used to deploy an Ambari-managed Hadoop cluster, the Metron services, or both. These playbooks currently target only RHEL/CentOS 6.x operating systems.

+

In addition, an Ambari Management Pack can be built, which can be deployed in conjunction with the RPMs detailed in this README.

+
+

Prerequisites

+

The following tools are required to run these scripts:

+ + +

Currently Metron must be built from source. Before running these scripts perform the following steps:

  1. Clone the Metron git repository with git clone git@github.com:apache/incubator-metron.git
  2. Navigate to incubator-metron and run mvn clean package

These scripts depend on two files for configuration:

  • hosts - declares which Ansible roles will be run on which hosts
  • group_vars/all - various configuration settings needed to install Metron

Examples can be found in the incubator-metron/metron-deployment/inventory/metron_example directory and are a good starting point. Copy this directory into incubator-metron/metron-deployment/inventory/ and rename it to your project_name. More information about Ansible files and directory structure can be found at http://docs.ansible.com/ansible/playbooks_best_practices.html.
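The copy-and-rename step might look like the following (my_project is a placeholder project name; the mkdir only simulates the cloned repository layout so the sketch is self-contained):

```shell
# Simulate the cloned repo layout (in a real install, git clone creates this):
mkdir -p incubator-metron/metron-deployment/inventory/metron_example

# Copy the example inventory and rename it to your project name:
cp -r incubator-metron/metron-deployment/inventory/metron_example \
      incubator-metron/metron-deployment/inventory/my_project

ls incubator-metron/metron-deployment/inventory
```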

+
+

Ambari

+

The Ambari playbook will install a Hadoop cluster with all the services and configuration required by Metron. This section can be skipped if installing Metron on a pre-existing cluster.

+

Currently, this playbook supports building a local development cluster running on one node, but options for other types of clusters will be added in the future.

+
+

Setting up your inventory

+

Make sure to update the hosts file in incubator-metron/metron-deployment/inventory/project_name/hosts or provide an alternate inventory file when you launch the playbooks, including the ssh user(s) and ssh keyfile location(s). These playbooks expect two host groups:

  • ambari_master
  • ambari_slaves
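A minimal hosts file for this section might look like the following (the hostnames and connection settings are placeholders, not taken from the source):

```
[ambari_master]
node1.example.com

[ambari_slaves]
node1.example.com

[all:vars]
ansible_ssh_user=centos
ansible_ssh_private_key_file=~/.ssh/id_rsa
```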
+

Running the playbook

+

This playbook will install the Ambari server on the ambari_master, install the Ambari agents on the ambari_slaves, and create a cluster in Ambari with a blueprint for the required Metron components.

+

Navigate to incubator-metron/metron-deployment/playbooks and run: ansible-playbook -i ../inventory/project_name ambari_install.yml

+
+

Metron

+

The Metron playbook will gather the necessary cluster settings from Ambari and install the Metron services.

+
+

Setting up your inventory

+

Edit the hosts file at incubator-metron/metron-deployment/inventory/project_name/hosts. Declare which hosts the Metron services will be installed on by updating these groups:

  • enrichment - submits the topology code to Storm and requires a storm client
  • search - host where Elasticsearch will be run
  • web - host where the Metron UI and underlying services will run
  • sensors - host where network data will be collected and published to Kafka
+

The Metron topologies depend on Kafka topics and HBase tables being created beforehand. Declare a host that has Kafka and HBase clients installed by updating these groups:

  • metron_kafka_topics
  • metron_hbase_tables
+

If only installing Metron, these groups can be ignored:

  • ambari_master
  • ambari_slaves
+
+

Configuring group variables

+

The Metron Ansible scripts depend on a set of variables. These variables can be found in the file at incubator-metron/metron-deployment/inventory/project_name/group_vars/all. Edit the ambari* variables to match your Ambari instance and update the java_home variable to match the java path on your hosts.
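A sketch of the relevant entries is shown below. The variable names and values are illustrative placeholders; confirm the exact names against the metron_example copy of group_vars/all:

```
ambari_host: node1.example.com
ambari_port: 8080
java_home: /usr/jdk64/jdk1.8.0_77
```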

+
+

Running the playbook

+

Navigate to incubator-metron/metron-deployment/playbooks and run: ansible-playbook -i ../inventory/project_name metron_install.yml

+
+

Vagrant

+

A Vagrantfile is included and will install a working version of the entire Metron stack. The following is required to run it:

  • Vagrant
  • Hostmanager plugin for Vagrant - run vagrant plugin install vagrant-hostmanager on the machine where Vagrant is installed
+

Navigate to incubator-metron/metron-deployment/vagrant/full-dev-platform and run vagrant up. This also provides a good example of how to run a full end-to-end Metron install.
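Putting the steps together, the full flow looks like this (paths as given above; requires Vagrant and a supported hypervisor to be installed):

```
vagrant plugin install vagrant-hostmanager
cd incubator-metron/metron-deployment/vagrant/full-dev-platform
vagrant up
```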

+
+

Ambari Management Pack

+

An Ambari Management Pack can be built in order to make the Metron service available on top of an existing stack, rather than needing a direct stack update.

+

This will set up:

  • Metron Parsers
  • Enrichment
  • Indexing
  • GeoIP data
  • Optional Elasticsearch
  • Optional Kibana
+
+

Prerequisites

  • A cluster managed by Ambari 2.4
  • Metron RPMs available on the cluster in the /localrepo directory. See RPM for further information.
+
+

Building Management Pack

+

From metron-deployment, run:

mvn clean package

A tar.gz that can be used with Ambari can be found at metron-deployment/packaging/ambari/metron-mpack/target/

+
+

Installing Management Pack

+

Before installing the mpack, update Storm’s topology.classpath in Ambari to include ‘/etc/hbase/conf:/etc/hadoop/conf’, then restart the Storm service.
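If you prefer to script this change rather than use the Ambari UI, Ambari ships a helper script at /var/lib/ambari-server/resources/scripts/configs.sh that can set a single property. The host, cluster name, and credentials below are placeholders, and you should verify the script’s options against your Ambari version:

```
# Placeholder Ambari host, cluster name, and admin credentials:
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
    set node1.example.com metron_cluster storm-site \
    "topology.classpath" "/etc/hbase/conf:/etc/hadoop/conf"
```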

+

Place the mpack’s tar.gz onto the node running Ambari Server. From the command line on this node, run:

ambari-server install-mpack --mpack=&lt;mpack_location&gt; --verbose

This will make the services available in Ambari in the same manner as any services in a stack, e.g. through Add Services or during cluster install. The Indexing / Parsers / Enrichment masters should be colocated with a Kafka Broker (to create topics) and an HBase client (to create the enrichment and threatintel tables). This colocation is currently not enforced by Ambari; managing it through a Service or Stack Advisor is a planned enhancement.

+

Several configuration parameters will need to be filled in, and should be fairly self-explanatory (primarily a couple of Elasticsearch configs and the Storm REST URL). Examples are provided in the descriptions on Ambari. Notably, the URL for the GeoIP database that is preloaded (and is prefilled by default) can be set to use a file:/// location.

+

After installation, a custom action is available in Ambari (alongside the stop and start service actions) to install Elasticsearch templates. Similarly, a custom Kibana action to load templates is available.

+

Another custom action is available in Ambari to import Zeppelin dashboards. See the metron-indexing documentation for details.

+
+

Offline installation

+

Currently there is only one point at which an install reaches out to the internet: the URL for the GeoIP database information.

+

The RPMs DO NOT reach out to the internet (because there is currently no hosting for them). Instead, they look on the local filesystem in /localrepo.
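Staging locally built RPMs for an offline install might look like the following sketch (the source path assumes RPMs were built as described in the RPM section below, and the availability of the createrepo tool is an assumption):

```
mkdir -p /localrepo
cp metron-deployment/target/RPMS/noarch/*.rpm /localrepo/
createrepo /localrepo    # regenerate yum metadata; requires the createrepo package
```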

+
+

Current Limitations

+

There is a set of limitations that should be addressed to improve the current state of the mpack.

  • There is currently no hosting for RPMs remotely. They will have to be built locally.
  • Colocation of appropriate services should be enforced by Ambari. See Installing Management Pack for more details.
  • Storm’s topology.classpath is not updated with the Metron service install and needs to be updated separately.
  • Several configuration parameters used when installing the Metron service could (and should) be grabbed from Ambari. Install will require them to be manually entered.
  • Need to handle upgrading Metron
+
+

RPM

+

RPMs can be built to install the components in metron-platform. These RPMs are built in a Docker container and placed into target.

+

Components in the RPMs:

  • metron-common
  • metron-data-management
  • metron-elasticsearch
  • metron-enrichment
  • metron-parsers
  • metron-pcap
  • metron-solr
+
+

Prerequisites

  • Docker. The image detailed in metron-deployment/packaging/docker/rpm-docker/README.md will automatically be built (or rebuilt if necessary).
  • Artifacts for metron-platform have been produced, e.g. mvn clean package -DskipTests in metron-platform
+
+

Building RPMs

+

From metron-deployment, run:

mvn clean package -Pbuild-rpms

The output RPM files will land in target/RPMS/noarch. They can be installed with the standard:

rpm -i &lt;package&gt;
+

TODO

  • Support Ubuntu deployments
+
+
+