metron-commits mailing list archives

Subject [38/50] [abbrv] metron git commit: METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164
Date Fri, 31 Aug 2018 19:20:34 GMT
METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164


Branch: refs/remotes/apache/feature/METRON-1699-create-batch-profiler
Commit: 6b70571d6de3951c98269bbf5b38e8b69deddfab
Parents: d9e1f38
Author: merrimanr <>
Authored: Wed Aug 15 16:00:13 2018 -0600
Committer: Michael Miklavcic <>
Committed: Wed Aug 15 16:00:13 2018 -0600

 metron-interface/metron-rest/ | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/metron-interface/metron-rest/ b/metron-interface/metron-rest/
index 080422d..2c216d1 100644
--- a/metron-interface/metron-rest/
+++ b/metron-interface/metron-rest/
@@ -222,6 +222,17 @@ Out of the box it is a simple wrapper around the tshark command to transform
 REST will supply the script with raw pcap data through standard in and expects PDML data serialized as XML.
 Pcap query jobs can be configured for submission to a YARN queue.  This setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
+It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources.  More information about setting up YARN queues can be found [here](
+Pcap query results are stored in HDFS.  The location of query results when run through the REST app is determined by a couple factors.  The root of Pcap query results defaults to `/apps/metron/pcap/output` but can be changed with the
+Spring property ``.  Assuming the default Pcap query output directory, the path to a result page will follow this pattern:
+/apps/metron/pcap/output/{username}/MAP_REDUCE/{job id}/page-{page number}.pcap
+Over time Pcap query results will accumulate in HDFS.  Currently these results are not cleaned up automatically so cluster administrators should be aware of this and monitor them.  It is highly recommended that a process be put in place to
+periodically delete files and directories under the Pcap query results root.
+Users should also be mindful of date ranges used in queries so they don't produce result sets that are too large.  Currently there are no limits enforced on date ranges.
 Queries can also be configured on a global level for setting the number of results per page via a Spring property ``. By default, this value is set to 10 pcaps per page, but you may choose to set this value higher
 based on observing frequently-run query result sizes. This setting works in conjunction with the property for setting finalizer threadpool size when optimizing query performance.
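
The result path pattern and the cleanup recommendation in the added documentation can be illustrated with a minimal Python sketch. It is not part of this commit or of Metron. The helper names `result_page_path` and `stale_job_dirs` and the 7-day retention window are assumptions made for illustration. The sketch assumes the default output root and that the caller has already obtained job directory paths and modification times from HDFS by some other means.

```python
from datetime import datetime, timedelta, timezone

# Default root taken from the documentation in the diff above.
# Pass a different root if the Spring property has been changed.
DEFAULT_PCAP_OUTPUT_ROOT = "/apps/metron/pcap/output"


def result_page_path(username, job_id, page_number, root=DEFAULT_PCAP_OUTPUT_ROOT):
    """Build the HDFS path of one result page using the documented pattern."""
    return f"{root.rstrip('/')}/{username}/MAP_REDUCE/{job_id}/page-{page_number}.pcap"


def stale_job_dirs(job_dirs, now, max_age=timedelta(days=7)):
    """Return, sorted, the job directories last modified before now - max_age.

    job_dirs is an iterable of (path, modified) pairs. The 7-day default is an
    illustrative assumption, not a Metron setting.
    """
    cutoff = now - max_age
    return sorted(path for path, modified in job_dirs if modified < cutoff)


if __name__ == "__main__":
    print(result_page_path("metron", "job_1", 1))
    listing = [
        ("/apps/metron/pcap/output/metron/MAP_REDUCE/job_old",
         datetime(2018, 8, 1, tzinfo=timezone.utc)),
        ("/apps/metron/pcap/output/metron/MAP_REDUCE/job_new",
         datetime(2018, 8, 30, tzinfo=timezone.utc)),
    ]
    print(stale_job_dirs(listing, now=datetime(2018, 8, 31, tzinfo=timezone.utc)))
```

The functions only compute paths and select candidates. Actual deletion is left to whatever process the administrator prefers, for example a scheduled job that runs `hdfs dfs -rm -r` on each selected directory.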
