From: zhz@apache.org
To: common-commits@hadoop.apache.org
Date: Fri, 30 Jan 2015 21:43:49 -0000
Reply-To: common-dev@hadoop.apache.org
Subject: [5/9] hadoop git commit: MAPREDUCE-6150. Update document of Rumen (Masatake Iwasaki via aw)

MAPREDUCE-6150. Update document of Rumen (Masatake Iwasaki via aw)

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d0f21bd9
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d0f21bd9
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d0f21bd9

Branch: refs/heads/HDFS-EC
Commit: d0f21bd9e8ea14e55bef911d2b677d9d1486a752
Parents: afac550
Author: Allen Wittenauer
Authored: Thu Jan 29 14:17:44 2015 -0800
Committer: Zhe Zhang
Committed: Fri Jan 30 13:42:05 2015 -0800

----------------------------------------------------------------------
 hadoop-mapreduce-project/CHANGES.txt            |   2 +
 hadoop-project/src/site/site.xml                |   1 +
 .../hadoop-rumen/src/site/markdown/Rumen.md.vm  | 135 ++++++++++++-------
 3 files changed, 91 insertions(+), 47 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/d0f21bd9/hadoop-mapreduce-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index 39ff8cc..496913f 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -264,6 +264,8 @@ Release 2.7.0 - UNRELEASED
 
     MAPREDUCE-6141. History server leveldb recovery store (jlowe)
 
+    MAPREDUCE-6150. Update document of Rumen (Masatake Iwasaki via aw)
+
   OPTIMIZATIONS
 
     MAPREDUCE-6169.
MergeQueue should release reference to the current item


http://git-wip-us.apache.org/repos/asf/hadoop/blob/d0f21bd9/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 6fa6648..113cb13 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -105,6 +105,7 @@
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/d0f21bd9/hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm b/hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm
index e25f3a7..bee976a 100644
--- a/hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm
+++ b/hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm
@@ -29,9 +29,7 @@ Rumen
   - [Components](#Components)
 - [How to use Rumen?](#How_to_use_Rumen)
   - [Trace Builder](#Trace_Builder)
-    - [Example](#Example)
   - [Folder](#Folder)
-    - [Examples](#Examples)
 - [Appendix](#Appendix)
   - [Resources](#Resources)
   - [Dependencies](#Dependencies)
@@ -128,18 +126,21 @@ can use the `Folder` utility to fold the current trace to the desired
 length. The remaining part of this section explains these utilities in
 detail.
 
-> Examples in this section assumes that certain libraries are present
-> in the java CLASSPATH. See Section-3.2 for more details.
+Examples in this section assumes that certain libraries are present
+in the java CLASSPATH. See [Dependencies](#Dependencies) for more details.
 
 $H3 Trace Builder
 
-`Command:`
+$H4 Command
 
-    java org.apache.hadoop.tools.rumen.TraceBuilder [options]
+```
+java org.apache.hadoop.tools.rumen.TraceBuilder [options]
+```
 
-This command invokes the `TraceBuilder` utility of
-*Rumen*. It converts the JobHistory files into a series of JSON
+This command invokes the `TraceBuilder` utility of *Rumen*.
+
+TraceBuilder converts the JobHistory files into a series of JSON
 objects and writes them into the `<jobtrace-output>` file. It also
 extracts the cluster layout (topology) and writes it in the
 `<topology-output>` file.
@@ -169,7 +170,7 @@ Cluster topology is used as follows :
 
 * To extrapolate splits information for tasks with missing splits
   details or synthetically generated tasks.
 
-`Options :`
+$H4 Options
 
@@ -204,33 +205,45 @@ Cluster topology is used as follows :
 
 $H4 Example
 
-    java org.apache.hadoop.tools.rumen.TraceBuilder file:///home/user/job-trace.json file:///home/user/topology.output file:///home/user/logs/history/done
+*Rumen* expects certain library *JARs* to be present in the *CLASSPATH*.
+One simple way to run Rumen is to use
+`$HADOOP_HOME/bin/hadoop jar` command to run it as example below.
 
-This will analyze all the jobs in
+```
+java org.apache.hadoop.tools.rumen.TraceBuilder \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-topology.json \
+    hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
+```
 
-`/home/user/logs/history/done` stored on the
-`local` FileSystem and output the jobtraces in
-`/home/user/job-trace.json` along with topology
-information in `/home/user/topology.output`.
+This will analyze all the jobs in
+`/tmp/hadoop-yarn/staging/history/done_intermediate/testuser`
+stored on the `HDFS` FileSystem
+and output the jobtraces in `/tmp/job-trace.json`
+along with topology information in `/tmp/job-topology.json`
+stored on the `local` FileSystem.
 
 $H3 Folder
 
-`Command`:
+$H4 Command
 
-    java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
-
-> Input and output to `Folder` is expected to be a fully
-> qualified FileSystem path. So use file:// to specify
-> files on the `local` FileSystem and hdfs:// to
-> specify files on HDFS.
+```
+java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
+```
 
 This command invokes the `Folder` utility of *Rumen*.
 Folding essentially means that the output duration of the resulting
 trace is fixed and job timelines are adjusted
 to respect the final output duration.
 
-`Options :`
+> Input and output to `Folder` is expected to be a fully
+> qualified FileSystem path. So use `file://` to specify
+> files on the `local` FileSystem and `hdfs://` to
+> specify files on HDFS.
+
+
+$H4 Options
@@ -335,14 +348,28 @@ to respect the final output duration.
 
 $H4 Examples
 
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime
-
-    java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+
+```
+java org.apache.hadoop.tools.rumen.Folder \
+    -output-duration 1h \
+    -input-cycle 20m \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-trace-1hr.json
+```
 
 If the folded jobs are out of order then the command will bail out.
 
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime and tolerate some skewness
 
-    java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -allow-missorting -skew-buffer-length 100 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+```
+java org.apache.hadoop.tools.rumen.Folder \
+    -output-duration 1h \
+    -input-cycle 20m \
+    -allow-missorting \
+    -skew-buffer-length 100 \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-trace-1hr.json
+```
 
 If the folded jobs are out of order, then atmost
 100 jobs will be de-skewed.
 If the 101st job is
 
@@ -350,23 +377,37 @@ If the folded jobs are out of order, then atmost
 
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime in debug mode
 
-    java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -debug -temp-directory file:///tmp/debug file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+```
+java org.apache.hadoop.tools.rumen.Folder \
+    -output-duration 1h \
+    -input-cycle 20m \
+    -debug -temp-directory file:///tmp/debug \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-trace-1hr.json
+```
 
 This will fold the 10hr job-trace file
-`file:///home/user/job-trace.json` to finish within 1hr
+`file:///tmp/job-trace.json` to finish within 1hr
 and use `file:///tmp/debug` as the temporary directory.
 The intermediate files in the temporary directory will not be
 cleaned up.
 
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime with custom concentration.
 
-    java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -concentration 2 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+```
+java org.apache.hadoop.tools.rumen.Folder \
+    -output-duration 1h \
+    -input-cycle 20m \
+    -concentration 2 \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-trace-1hr.json
+```
 
 This will fold the 10hr job-trace file
-`file:///home/user/job-trace.json` to finish within 1hr
-with concentration of 2. `Example-2.3.2` will retain 10%
-of the jobs. With *concentration* as 2, 20% of the total input
-jobs will be retained.
+`file:///tmp/job-trace.json` to finish within 1hr
+with concentration of 2.
+If the 10h job-trace is folded to 1h, it retains 10% of the jobs by default.
+With *concentration* as 2, 20% of the total input jobs will be retained.
 
 
 Appendix
 
@@ -377,21 +418,21 @@ $H3 Resources
 
 MAPREDUCE-751 is the main JIRA that introduced *Rumen* to *MapReduce*.
 Look at the MapReduce
-
-rumen-componentfor further details.
+rumen-component
+for further details.
 
 $H3 Dependencies
 
-*Rumen* expects certain library *JARs* to be present in
-the *CLASSPATH*. The required libraries are
-
-* `Hadoop MapReduce Tools` (`hadoop-mapred-tools-{hadoop-version}.jar`)
-* `Hadoop Common` (`hadoop-common-{hadoop-version}.jar`)
-* `Apache Commons Logging` (`commons-logging-1.1.1.jar`)
-* `Apache Commons CLI` (`commons-cli-1.2.jar`)
-* `Jackson Mapper` (`jackson-mapper-asl-1.4.2.jar`)
-* `Jackson Core` (`jackson-core-asl-1.4.2.jar`)
-
-> One simple way to run Rumen is to use '$HADOOP_HOME/bin/hadoop jar'
-> option to run it.
+*Rumen* expects certain library *JARs* to be present in the *CLASSPATH*.
+One simple way to run Rumen is to use
+`hadoop jar` command to run it as example below.
+
+```
+$HADOOP_HOME/bin/hadoop jar \
+    $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.1.jar \
+    org.apache.hadoop.tools.rumen.TraceBuilder \
+    file:///tmp/job-trace.json \
+    file:///tmp/job-topology.json \
+    hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
+```
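[Editor's note] For readers who want to try the `hadoop jar` invocation that this patch documents, the sketch below assembles the command line in a shell script and dry-runs it with `echo` instead of executing it, so it can be checked without a Hadoop install. This is not part of the commit: the `/opt/hadoop` default for `HADOOP_HOME`, the rumen JAR path, and the `2.5.1` version are assumptions taken from the example in the patch and should be adjusted to the actual installation.

```shell
#!/bin/sh
# Dry-run sketch of the TraceBuilder invocation from the patched Rumen docs.
# HADOOP_HOME default and JAR version are assumptions; nothing is executed.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
RUMEN_JAR="$HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.1.jar"

# Assemble the full command line (backslash-newline inside double quotes
# is a continuation, so CMD ends up as one long line).
CMD="$HADOOP_HOME/bin/hadoop jar $RUMEN_JAR \
org.apache.hadoop.tools.rumen.TraceBuilder \
file:///tmp/job-trace.json \
file:///tmp/job-topology.json \
hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser"

# Echo instead of executing so the command can be reviewed first.
echo "$CMD"
```

Replacing the final `echo "$CMD"` with `eval "$CMD"` would actually run the trace build once the paths point at a real cluster.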