tez-commits mailing list archives

From hit...@apache.org
Subject svn commit: r1546845 - /incubator/tez/site/install.html
Date Sun, 01 Dec 2013 20:12:45 GMT
Author: hitesh
Date: Sun Dec  1 20:12:44 2013
New Revision: 1546845

URL: http://svn.apache.org/r1546845
Log:
Update install guide

Modified:
    incubator/tez/site/install.html

Modified: incubator/tez/site/install.html
URL: http://svn.apache.org/viewvc/incubator/tez/site/install.html?rev=1546845&r1=1546844&r2=1546845&view=diff
==============================================================================
--- incubator/tez/site/install.html (original)
+++ incubator/tez/site/install.html Sun Dec  1 20:12:44 2013
@@ -298,11 +298,19 @@
             <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!--
contributor license agreements.  See the NOTICE file distributed with --><!-- this work
for additional information regarding copyright ownership. --><!-- The ASF licenses this
file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not
use this file except in compliance with --><!-- the License.  You may obtain a copy
of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0
--><!--  --><!-- Unless required by applicable law or agreed to in writing, software
--><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!--
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See
the License for the specific language governing permissions and --><!-- limitations
under the License. --><!--  --><div class="section">
 <h2>Install/Deploy Instructions<a name="InstallDeploy_Instructions"></a></h2>
 <ol style="list-style-type: lower-roman">
-<li>Deploy Apache Hadoop using either the 2.1.0-beta release or build the 3.0.0-SNAPSHOT
from trunk.
+<li>Deploy Apache Hadoop using either the 2.2.0 release or build the 3.0.0-SNAPSHOT
from trunk.
 <ul>
 <li>One thing to note though when compiling Tez is that you will need to change the
value of the hadoop.version property in the top-level pom.xml to match the version of the
hadoop branch being used.</li></ul></li>
-<li>Copy the tez jars and their dependencies into HDFS.</li>
-<li>Configure tez-site.xml to set tez.lib.uris to point to the paths in HDFS containing
the jars. Please note that the paths are not searched recursively so for <i>basedir</i>
and <i>basedir</i>/lib/, you will need to configure the 2 paths as a comma-separated
list.</li>
+<li>Build tez using &quot;mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true&quot;
+<ul>
+<li>If you prefer to run the unit tests, remove skipTests from the command above.</li>
+<li>If you would like to create a tarball of the release, use &quot;mvn clean package
-Dtar -DskipTests=true -Dmaven.javadoc.skip=true&quot;</li></ul></li>
+<li>Copy the tez jars and their dependencies into HDFS.
+<ul>
+<li>The tez jars and dependencies will be found in tez-dist/target/tez-0.2.0-SNAPSHOT/tez-0.2.0-SNAPSHOT
if you run the initial command mentioned in step 2.</li>
+<li>Assuming that the tez jars are put in /apps/ on HDFS, the command would be &quot;hadoop
dfs -put tez-dist/target/tez-0.2.0-SNAPSHOT/tez-0.2.0-SNAPSHOT /apps/&quot;</li>
+<li>Please do not upload the tarball to HDFS; upload only the jars.</li></ul></li>
+<li>Configure tez-site.xml to set tez.lib.uris to point to the paths in HDFS containing
the jars. Please note that the paths are not searched recursively so for <i>basedir</i>
and <i>basedir</i>/lib/, you will need to configure the 2 paths as a comma-separated
list. Assuming you followed step 3, the value would be: &quot;${fs.default.name}/apps/tez-0.2.0-SNAPSHOT,${fs.default.name}/apps/tez-0.2.0-SNAPSHOT/lib/&quot;</li>
 <li>Modify mapred-site.xml to change &quot;mapreduce.framework.name&quot; property
from its default value of &quot;yarn&quot; to &quot;yarn-tez&quot;</li>
 <li>Set HADOOP_CLASSPATH to have the following paths in it:
 <ul>
@@ -310,12 +318,27 @@
 <li>TEZ_JARS and TEZ_JARS/libs - location of the tez jars and dependencies.</li></ul></li>
 <li>Submit a MR job as you normally would using something like:
 <div class="source">
-<pre>$HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-VERSION-tests.jar
sleep -mt 1 -rt 1 -m 1 -r 1</pre></div>
+<pre>
+$HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar
sleep -mt 1 -rt 1 -m 1 -r 1
+</pre></div>
 <p>This will use the TEZ DAG ApplicationMaster to run the MR job. This can be verified
by looking at the AM's logs from the YARN ResourceManager UI.</p></li>
 <li>There is a basic example of using an MRR job in the tez-mapreduce-examples.jar.
Refer to OrderedWordCount.java in the source code. To run this example:
 <div class="source">
-<pre>$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount &lt;input&gt;
&lt;output&gt;</pre></div>
-<p>This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This
job is similar to the word count example except that it also orders all words based on the
frequency of occurrence.</p></li></ol></div>
+<pre>
+$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount &lt;input&gt;
&lt;output&gt;
+</pre></div>
+<p>This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This
job is similar to the word count example except that it also orders all words based on the
frequency of occurrence.</p>
+<p>There are multiple variations to run orderedwordcount. You can use it to run multiple
DAGs serially on different inputs/outputs. These DAGs could be run separately as different
applications or serially within a single TEZ session.</p>
+<div class="source">
+<pre>
+$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount &lt;input1&gt;
&lt;output1&gt; &lt;input2&gt; &lt;output2&gt; &lt;input3&gt;
&lt;output3&gt; ...
+</pre></div>
+<p>The above will run multiple DAGs for each input-output pair.</p>
+<p>To use TEZ sessions, set -DUSE_TEZ_SESSION=true</p>
+<div class="source">
+<pre>
+$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount -DUSE_TEZ_SESSION=true
&lt;input1&gt; &lt;output1&gt; &lt;input2&gt; &lt;output2&gt;
+</pre></div></li></ol></div>
                   </div>
             </div>
           </div>
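
For reference, the tez-site.xml setting described in step 4 of the updated guide might look roughly like the fragment below. This is only a sketch: it assumes the jars were uploaded to /apps/ as in step 3, and the enclosing <configuration> element is assumed from the usual layout of Hadoop site files rather than taken from this commit. Adjust the paths to match your cluster.

```xml
<!-- tez-site.xml: illustrative sketch only; paths assume the /apps/ upload from step 3 -->
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <!-- Paths are not searched recursively, so the base directory and its lib/ directory are both listed -->
    <value>${fs.default.name}/apps/tez-0.2.0-SNAPSHOT,${fs.default.name}/apps/tez-0.2.0-SNAPSHOT/lib/</value>
  </property>
</configuration>
```

Both entries are needed because, as the guide notes, the paths are not searched recursively.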


