From: nickallen@apache.org
To: commits@metron.apache.org
Date: Fri, 31 Aug 2018 19:20:34 -0000
Message-Id: <15c2f38f1a7c4cf6ac63deeeb89886b6@git.apache.org>
Subject: [38/50] [abbrv] metron git commit: METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164

METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164

Project: http://git-wip-us.apache.org/repos/asf/metron/repo
Commit: http://git-wip-us.apache.org/repos/asf/metron/commit/6b70571d
Tree: http://git-wip-us.apache.org/repos/asf/metron/tree/6b70571d
Diff: http://git-wip-us.apache.org/repos/asf/metron/diff/6b70571d

Branch: refs/remotes/apache/feature/METRON-1699-create-batch-profiler
Commit: 6b70571d6de3951c98269bbf5b38e8b69deddfab
Parents: d9e1f38
Author: merrimanr
Authored: Wed Aug 15 16:00:13 2018 -0600
Committer: Michael Miklavcic
Committed: Wed Aug 15 16:00:13 2018 -0600

----------------------------------------------------------------------
 metron-interface/metron-rest/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/metron/blob/6b70571d/metron-interface/metron-rest/README.md
----------------------------------------------------------------------
diff --git a/metron-interface/metron-rest/README.md b/metron-interface/metron-rest/README.md
index 080422d..2c216d1 100644
--- a/metron-interface/metron-rest/README.md
+++ b/metron-interface/metron-rest/README.md
@@ -222,6 +222,17 @@ Out of the box it is a simple wrapper around the tshark command to transform raw
 REST will supply the script with raw pcap data through standard in and expects PDML data serialized as XML.
 Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
+It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources. More information about setting up YARN queues can be found [here](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Setting_up_queues).
+
+Pcap query results are stored in HDFS. The location of query results when run through the REST app is determined by a couple factors. The root of Pcap query results defaults to `/apps/metron/pcap/output` but can be changed with the
+Spring property `pcap.final.output.path`.
Assuming the default Pcap query output directory, the path to a result page will follow this pattern:
+```
+/apps/metron/pcap/output/{username}/MAP_REDUCE/{job id}/page-{page number}.pcap
+```
+Over time Pcap query results will accumulate in HDFS. Currently these results are not cleaned up automatically so cluster administrators should be aware of this and monitor them. It is highly recommended that a process be put in place to
+periodically delete files and directories under the Pcap query results root.
+
+Users should also be mindful of date ranges used in queries so they don't produce result sets that are too large. Currently there are no limits enforced on date ranges.
 Queries can also be configured on a global level for setting the number of results per page via a Spring property `pcap.page.size`. By default, this value is set to 10 pcaps per page, but you may choose to set this value higher based on observing frequenetly-run query result sizes. This setting works in conjunction with the property for setting finalizer threadpool size when optimizing query performance.
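
Editorial note, not part of the commit: the README text added above recommends a periodic cleanup process but does not prescribe one. The following is a minimal sketch of how such a process might be planned, assuming the default results root and the path pattern quoted in the diff. The helper names (`result_page_path`, `expired_job_dirs`, `delete_commands`), the retention period, and the idea of feeding in `(path, modification time)` pairs from a separate HDFS listing are all hypothetical and are not part of Metron. The sketch only builds `hdfs dfs -rm -r` command lines; it does not execute them.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Default results root taken from the README text; adjust it if the Spring
# property `pcap.final.output.path` has been changed.
DEFAULT_ROOT = "/apps/metron/pcap/output"


def result_page_path(username: str, job_id: str, page: int,
                     root: str = DEFAULT_ROOT) -> str:
    """Build the HDFS path of one result page, following the README pattern."""
    return f"{root.rstrip('/')}/{username}/MAP_REDUCE/{job_id}/page-{page}.pcap"


def expired_job_dirs(job_dirs: Iterable[Tuple[str, datetime]],
                     now: datetime, max_age_days: int) -> List[str]:
    """Return the job directories last modified before the retention cutoff.

    `job_dirs` is a hypothetical input: (path, modification time) pairs that
    an administrator would gather from an HDFS listing by some other means.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(path for path, modified in job_dirs if modified < cutoff)


def delete_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build, but do not run, one recursive HDFS delete command per path."""
    return [["hdfs", "dfs", "-rm", "-r", path] for path in paths]


if __name__ == "__main__":
    # Illustrative values only; the usernames and job ids are made up.
    now = datetime(2018, 8, 31)
    listing = [
        (f"{DEFAULT_ROOT}/alice/MAP_REDUCE/job_0001", datetime(2018, 8, 1)),
        (f"{DEFAULT_ROOT}/bob/MAP_REDUCE/job_0002", datetime(2018, 8, 30)),
    ]
    for command in delete_commands(expired_job_dirs(listing, now, max_age_days=7)):
        print(" ".join(command))
```

Keeping the selection logic separate from command execution makes it possible to review the list of directories before anything is deleted, which seems prudent given that the results root is shared by all users.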