tez-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From hit...@apache.org
Subject git commit: TEZ-610. Update INSTALL.txt. (hitesh)
Date Tue, 12 Nov 2013 12:22:45 GMT
Updated Branches:
  refs/heads/master 8aac5ba45 -> d4bb730ea


TEZ-610. Update INSTALL.txt. (hitesh)


Project: http://git-wip-us.apache.org/repos/asf/incubator-tez/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tez/commit/d4bb730e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tez/tree/d4bb730e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tez/diff/d4bb730e

Branch: refs/heads/master
Commit: d4bb730eaa4b8b793b2c7f8b08e1faa50ed566e8
Parents: 8aac5ba
Author: Hitesh Shah <hitesh@apache.org>
Authored: Tue Nov 12 04:22:20 2013 -0800
Committer: Hitesh Shah <hitesh@apache.org>
Committed: Tue Nov 12 04:22:20 2013 -0800

----------------------------------------------------------------------
 INSTALL.txt | 53 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tez/blob/d4bb730e/INSTALL.txt
----------------------------------------------------------------------
diff --git a/INSTALL.txt b/INSTALL.txt
index e3082ca..f60a487 100644
--- a/INSTALL.txt
+++ b/INSTALL.txt
@@ -1,39 +1,43 @@
-How to run MR on TEZ
+How to use TEZ
 =======================
 
-Tez provides an ApplicationMaster that can run MR or MRR jobs. There is a
-translation layer implemented ( may have bugs so please file JIRAs if you
-come across any issues ) that allows a user to run an MR job against the TEZ
-DAG ApplicationMaster.
-
-NOTE: Due to some quirks, all jobs will show an IOException and a failure at
-the end. Please look at the YARN ResourceManager UI to check the correct result
-of the job.
+Tez provides an ApplicationMaster that can run any arbritary DAG of tasks. It also
+provides a translation layer to run MR or MRR jobs using the MR APIs. This translation
+layer is not fully feature compatible so if you do see any issues with running your
+existing MR jobs on TEZ, please file jiras.
 
 Install/Deploy Instructions
 ===========================
 
-1) Deploy Apache Hadoop using the 3.0.0-SNAPSHOT from trunk.
-   - You can also use the 2.1.0 branch. For this, the only change will be to build
-     tez with the -Dhadoop.version set to the correct version matching the hadoop
-     branch being used.
-2) Copy the tez jars and their dependencies into HDFS.
-3) Configure tez-site.xml to set tez.lib.uris to point to the paths in HDFS containing
+1) Deploy Apache Hadoop either using the 2.2.0 release or use 3.0.0-SNAPSHOT from trunk.
+2) Build tez using "mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true"
+   - If you prefer to run the unit tests, remove skipTests from the command above.
+   - If you would like to create a tarball of the release, use
+     "mvn clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true"
+3) Copy the tez jars and their dependencies into HDFS.
+   - The tez jars and dependencies will be found in tez-dist/target/tez-0.2.0-SNAPSHOT/tez-0.2.0-SNAPSHOT
+     if you run the intial command mentioned in step 2.
+   - Assuming that the tez jars are put in /apps/ on HDFS, the command would be
+     "hadoop dfs -put tez-dist/target/tez-0.2.0-SNAPSHOT/tez-0.2.0-SNAPSHOT /apps/"
+   - Please do not upload the tarball to HDFS, upload only the jars.
+4) Configure tez-site.xml to set tez.lib.uris to point to the paths in HDFS containing
    the jars. Please note that the paths are not searched recursively so for <basedir>
    and <basedir>/lib/, you will need to configure the 2 paths as a comma-separated
list.
-3) Modify mapred-site.xml to change "mapreduce.framework.name" property from its
+   - Assuming you followed step 3, the value would be:
+      "${fs.default.name}/apps/tez-0.2.0-SNAPSHOT,${fs.default.name}/apps/tez-0.2.0-SNAPSHOT/lib/"
+5) Modify mapred-site.xml to change "mapreduce.framework.name" property from its
    default value of "yarn" to "yarn-tez"
-4) set HADOOP_CLASSPATH to have the following paths in it:
+6) set HADOOP_CLASSPATH to have the following paths in it:
    - TEZ_CONF_DIR - location of tez-site.xml
    - TEZ_JARS and TEZ_JARS/libs - location of the tez jars and dependencies.
-5) Submit a MR job as you normally would using something like:
+7) Submit a MR job as you normally would using something like:
 
 $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar
sleep -mt 1 -rt 1 -m 1 -r 1
 
 This will use the TEZ DAG ApplicationMaster to run the MR job. This can be
 verified by looking at the AM's logs from the YARN ResourceManager UI.
 
-6) There is a basic example of using an MRR job in the tez-mapreduce-examples.jar. Refer
to OrderedWordCount.java
+8) There is a basic example of using an MRR job in the tez-mapreduce-examples.jar. Refer
to OrderedWordCount.java
 in the source code. To run this example:
 
 $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input> <output>
@@ -41,3 +45,14 @@ $HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount
<input
 This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This job is
similar
 to the word count example except that it also orders all words based on the frequency of
 occurrence.
+
+There are multiple variations to run orderedwordcount. You can use it to run multiple
+DAGs serially on different inputs/outputs. These DAGs could be run separately as
+different applications or serially within a single TEZ session.
+
+$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount <input1>
<output1> <input2> <output2> <input3> <output3> ...
+
+The above will run multiple DAGs for each input-output pair. To use TEZ sessions,
+set -DUSE_TEZ_SESSION=true
+
+$HADOOP_PREFIX/bin/hadoop jar tez-mapreduce-examples.jar orderedwordcount -DUSE_TEZ_SESSION=true
<input1> <output1> <input2> <output2>


Mime
View raw message