hadoop-common-commits mailing list archives

From: cdoug...@apache.org
Subject: svn commit: r722391 [1/2] - in /hadoop/core/trunk: ./ src/benchmarks/gridmix2/ src/benchmarks/gridmix2/src/ src/benchmarks/gridmix2/src/java/ src/benchmarks/gridmix2/src/java/org/ src/benchmarks/gridmix2/src/java/org/apache/ src/benchmarks/gridmix2/src...
Date: Tue, 02 Dec 2008 07:03:09 GMT
Author: cdouglas
Date: Mon Dec  1 23:03:09 2008
New Revision: 722391

URL: http://svn.apache.org/viewvc?rev=722391&view=rev
Log:
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. Contributed by Runping Qi.

Added:
    hadoop/core/trunk/src/benchmarks/gridmix2/
    hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2
    hadoop/core/trunk/src/benchmarks/gridmix2/build.xml
    hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh
    hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2
    hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml
    hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2
    hadoop/core/trunk/src/benchmarks/gridmix2/src/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixRunner.java
    hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/SortJobCreator.java
Modified:
    hadoop/core/trunk/CHANGES.txt
    hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java

Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=722391&r1=722390&r2=722391&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Mon Dec  1 23:03:09 2008
@@ -149,6 +149,9 @@
 
     HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
 
+    HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
+    Qi via cdouglas)
+
   OPTIMIZATIONS
 
     HADOOP-3293. Fixes FileInputFormat to provide locations for splits

Added: hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/README.gridmix2 Mon Dec  1 23:03:09 2008
@@ -0,0 +1,136 @@
+### "Gridmix" Benchmark ###
+
+Contents:
+
+0 Overview
+1 Getting Started
+  1.0 Build
+  1.1 Configure environment variables
+  1.2 Configure the job mixture
+  1.3 Generate test data
+2 Running
+
+
+* 0 Overview
+
+The scripts in this package model a cluster workload. The workload is
+simulated by generating random data and submitting map/reduce jobs that
+mimic observed data-access patterns in user jobs. The full benchmark
+generates approximately 2.5TB of (often compressed) input data operated on
+by the following simulated jobs:
+
+1) Three stage map/reduce job
+     Input:      500GB compressed (2TB uncompressed) SequenceFile
+                 (k,v) = (5 words, 100 words)
+                 hadoop-env: FIXCOMPSEQ
+     Compute1:   keep 10% map, 40% reduce
+     Compute2:   keep 100% map, 77% reduce
+                 Input from Compute1
+     Compute3:   keep 116% map, 91% reduce
+                 Input from Compute2
+     Motivation: Many user workloads are implemented as pipelined map/reduce
+                 jobs, including Pig workloads
+
+2) Large sort of variable key/value size
+     Input:      500GB compressed (2TB uncompressed) SequenceFile
+                 (k,v) = (5-10 words, 100-10000 words)
+                 hadoop-env: VARCOMPSEQ
+     Compute:    keep 100% map, 100% reduce
+     Motivation: Processing large, compressed datasets is common.
+
+3) Reference select
+     Input:      500GB compressed (2TB uncompressed) SequenceFile
+                 (k,v) = (5-10 words, 100-10000 words)
+                 hadoop-env: VARCOMPSEQ
+     Compute:    keep 0.2% map, 5% reduce
+                 1 Reducer
+     Motivation: Sampling from a large, reference dataset is common.
+
+4) API text sort (java, streaming)
+     Input:      500GB uncompressed Text
+                 (k,v) = (1-10 words, 0-200 words)
+                 hadoop-env: VARINFLTEXT
+     Compute:    keep 100% map, 100% reduce
+     Motivation: This benchmark should exercise each of the APIs to
+                 map/reduce
+
+5) Jobs with combiner (word count jobs)
+
+A benchmark load is a mix of different numbers of small, medium, and large
+jobs of the above types. The exact mix is specified in an xml file
+(gridmix_config.xml). We have a Java program that constructs those jobs based
+on the xml file and puts them under the control of a JobControl object. The
+JobControl object then submits the jobs to the cluster and monitors their
+progress until all jobs complete.
+
+
+Notes (jobs 1-3): Because the input data are compressed, each mapper outputs
+many more bytes than it reads in, typically causing map output spills.
+
+
+
+* 1 Getting Started
+
+1.0 Build
+
+In the src/benchmarks/gridmix2 dir, type "ant".
+gridmix.jar will be created in the build subdir.
+Copy gridmix.jar to the gridmix2 dir.
+
+1.1 Configure environment variables
+
+One must modify gridmix-env-2 to set the following variables:
+
+HADOOP_HOME     The hadoop install location
+HADOOP_VERSION  The exact hadoop version to be used. e.g. hadoop-0.18.2-dev
+HADOOP_CONF_DIR The dir containing the hadoop-site.xml for the cluster to be used.
+USE_REAL_DATASET A large data-set will be created and used by the benchmark if it is set to true.
+
+
+1.2 Configure the job mixture
+
+A default gridmix_config.xml file is provided.
+One may make appropriate changes as necessary on the number of jobs of various types
+and sizes. One can also change the number of reducers of each job, and specify whether
+to compress the output data of a map/reduce job.
+Note that one can specify multiple values in the
+numOfJobs field and numOfReduces field, like:
+<property>
+  <name>javaSort.smallJobs.numOfJobs</name>
+  <value>8,2</value>
+  <description></description>
+</property>
+
+
+<property>
+  <name>javaSort.smallJobs.numOfReduces</name>
+  <value>15,70</value>
+  <description></description>
+</property>
+
+The above spec means that we will have 8 small java sort jobs with 15 reducers and 2 small
+java sort jobs with 70 reducers.
+
+1.3 Generate test data
+
+Test data is generated using the generateGridmix2data.sh script.
+        ./generateGridmix2data.sh
+One may modify the structure and size of the data generated here.
+
+It is sufficient to run the script without modification, though it may
+require up to 4TB of free space in the default filesystem. Changing the size
+of the input data (COMPRESSED_DATA_BYTES, UNCOMPRESSED_DATA_BYTES,
+INDIRECT_DATA_BYTES) is safe. A 4x compression ratio for generated, block
+compressed data is typical.
+
+* 2 Running
+
+You need to set HADOOP_CONF_DIR to the right directory where hadoop-site.xml exists.
+Then you just need to type
+	./rungridmix_2
+It will create start.out to record the start time, and at the end, it will create
+end.out to record the end time.
+
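For readers unfamiliar with the JobControl pattern the README refers to, here is a
minimal sketch of how a mix of JobConfs could be submitted and monitored. It is an
illustration, not code from this commit: the group name, poll interval, and the
absence of inter-job dependencies are assumptions (gridmix's pipelined jobs do
declare dependencies).

    import java.util.ArrayList;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class JobControlSketch {
      public static void runMix(ArrayList<JobConf> mix) throws Exception {
        JobControl control = new JobControl("gridmix");  // illustrative group name
        for (JobConf conf : mix) {
          control.addJob(new Job(conf, null));           // no inter-job dependencies here
        }
        new Thread(control).start();                     // JobControl implements Runnable
        while (!control.allFinished()) {
          Thread.sleep(1000);                            // poll until every job completes
        }
        control.stop();
      }
    }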

Added: hadoop/core/trunk/src/benchmarks/gridmix2/build.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/build.xml?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/build.xml (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/build.xml Mon Dec  1 23:03:09 2008
@@ -0,0 +1,67 @@
+<?xml version="1.0" ?>
+<project default="main" basedir=".">
+    <property name="Name" value="gridmix"/>
+    <property name="version" value="0.1"/>
+    <property name="final.name" value="${name}-${version}"/>
+    <property name="year" value="2008"/>	
+	<property name="hadoop.dir" value="${basedir}/../../../"/>
+    <property name="lib.dir" value="${hadoop.dir}/lib"/>
+    <property name="src.dir" value="${basedir}/src"/>
+    <property name="conf.dir" value="${basedir}/conf"/>
+    <property name="docs.dir" value="${basedir}/docs"/>
+    <property name="build.dir" value="${basedir}/build"/>
+    <property name="dist.dir" value="${basedir}/dist"/>
+    <property name="build.classes" value="${build.dir}/classes"/>
+	
+    <target name="init">
+        <mkdir dir="${build.dir}"/>
+        <mkdir dir="${dist.dir}"/>
+    </target>
+
+    <target name="main" depends="init, compile, compress" description="Main target">
+        <echo>
+            Building the .jar files.
+        </echo>
+    </target>
+  
+    <target name="compile" depends="init" description="Compilation target">
+        <javac srcdir="src/java/" destdir="${build.dir}">
+        	<classpath refid="classpath" />
+        </javac>
+    </target>
+	
+
+    <target name="compress" depends="compile" description="Compression target">
+        <jar jarfile="${build.dir}/gridmix.jar" basedir="${build.dir}" includes="**/*.class" />
+
+        <copy todir="." includeEmptyDirs="false">
+            <fileset dir="${build.dir}">
+	        <include name="**/*.jar" />
+            </fileset>
+        </copy>
+    </target>
+
+  
+    <!-- ================================================================== -->
+    <!-- Clean.  Delete the build files, and their directories              -->
+    <!-- ================================================================== -->
+    <target name="clean" description="Clean.  Delete the build files, and their directories">
+      <delete dir="${build.dir}"/>
+      <delete dir="${dist.dir}"/>
+    </target>
+
+    <!-- the normal classpath -->
+    <path id="classpath">
+	    <pathelement location="${build.classes}"/>
+	    <fileset dir="${lib.dir}">
+	       <include name="*.jar" />
+	       <exclude name="**/excluded/" />
+	    </fileset>
+	    <fileset dir="${hadoop.dir}/build">
+	       <include name="**.jar" />
+           <include name="contrib/streaming/*.jar" />
+	    </fileset>
+    </path>
+</project>

Added: hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/generateGridmix2data.sh Mon Dec  1 23:03:09 2008
@@ -0,0 +1,94 @@
+#!/usr/bin/env bash
+ 
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/gridmix-env-2
+
+# Smaller data set is used by default.
+COMPRESSED_DATA_BYTES=2147483648       # 2GB
+UNCOMPRESSED_DATA_BYTES=536870912      # 512MB
+
+# Number of partitions for output data
+NUM_MAPS=100
+
+# If the env var USE_REAL_DATASET is set, then use the params to generate the bigger (real) dataset.
+if [ ! -z ${USE_REAL_DATASET} ] ; then
+  echo "Using real dataset"
+  NUM_MAPS=492
+  # 2TB data compressing to approx 500GB
+  COMPRESSED_DATA_BYTES=2147483648000
+  # 500GB
+  UNCOMPRESSED_DATA_BYTES=536870912000
+fi
+
+## Data sources
+export GRID_MIX_DATA=/gridmix/data
+# Variable length key, value compressed SequenceFile
+export VARCOMPSEQ=${GRID_MIX_DATA}/WebSimulationBlockCompressed
+# Fixed length key, value compressed SequenceFile
+export FIXCOMPSEQ=${GRID_MIX_DATA}/MonsterQueryBlockCompressed
+# Variable length key, value uncompressed Text File
+export VARINFLTEXT=${GRID_MIX_DATA}/SortUncompressed
+# Fixed length key, value compressed Text File
+export FIXCOMPTEXT=${GRID_MIX_DATA}/EntropySimulationCompressed
+
+${HADOOP_HOME}/bin/hadoop jar \
+  ${EXAMPLE_JAR} randomtextwriter \
+  -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+  -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+  -D test.randomtextwrite.min_words_key=5 \
+  -D test.randomtextwrite.max_words_key=10 \
+  -D test.randomtextwrite.min_words_value=100 \
+  -D test.randomtextwrite.max_words_value=10000 \
+  -D mapred.output.compress=true \
+  -D mapred.output.compression.type=BLOCK \
+  -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+  ${VARCOMPSEQ} &
+
+
+${HADOOP_HOME}/bin/hadoop jar \
+  ${EXAMPLE_JAR} randomtextwriter \
+  -D test.randomtextwrite.total_bytes=${COMPRESSED_DATA_BYTES} \
+  -D test.randomtextwrite.bytes_per_map=$((${COMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+  -D test.randomtextwrite.min_words_key=5 \
+  -D test.randomtextwrite.max_words_key=5 \
+  -D test.randomtextwrite.min_words_value=100 \
+  -D test.randomtextwrite.max_words_value=100 \
+  -D mapred.output.compress=true \
+  -D mapred.output.compression.type=BLOCK \
+  -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
+  ${FIXCOMPSEQ} &
+
+
+${HADOOP_HOME}/bin/hadoop jar \
+  ${EXAMPLE_JAR} randomtextwriter \
+  -D test.randomtextwrite.total_bytes=${UNCOMPRESSED_DATA_BYTES} \
+  -D test.randomtextwrite.bytes_per_map=$((${UNCOMPRESSED_DATA_BYTES} / ${NUM_MAPS})) \
+  -D test.randomtextwrite.min_words_key=1 \
+  -D test.randomtextwrite.max_words_key=10 \
+  -D test.randomtextwrite.min_words_value=0 \
+  -D test.randomtextwrite.max_words_value=200 \
+  -D mapred.output.compress=false \
+  -outFormat org.apache.hadoop.mapred.TextOutputFormat \
+  ${VARINFLTEXT} &
+
+
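Note that the three randomtextwriter jobs above are launched in the background
(trailing &) and the script never waits for them; it exits while the jobs are still
running, so completion should be checked on the cluster (for example via the
JobTracker UI) rather than inferred from the shell prompt.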

Added: hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/gridmix-env-2 Mon Dec  1 23:03:09 2008
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+
+## Environment configuration
+# Hadoop installation
+export HADOOP_VERSION=hadoop-0.18.2-dev
+export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
+export HADOOP_CONF_DIR=
+export USE_REAL_DATASET=TRUE
+
+export APP_JAR=${HADOOP_HOME}/${HADOOP_VERSION}-test.jar
+export EXAMPLE_JAR=${HADOOP_HOME}/${HADOOP_VERSION}-examples.jar
+export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/${HADOOP_VERSION}-streaming.jar
+
+
+

Added: hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/gridmix_config.xml Mon Dec  1 23:03:09 2008
@@ -0,0 +1,550 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
+
+<!-- Put site-specific property overrides in this file. -->
+
+<configuration>
+
+
+<property>
+  <name>GRID_MIX_DATA</name>
+  <value>/gridmix/data</value>
+  <description></description>
+</property>
+
+<property>
+  <name>FIXCOMPTEXT</name>
+  <value>${GRID_MIX_DATA}/EntropySimulationCompressed</value>
+  <description></description>
+</property>
+
+<property>
+  <name>VARINFLTEXT</name>
+  <value>${GRID_MIX_DATA}/SortUncompressed</value>
+  <description></description>
+</property>
+
+<property>
+  <name>FIXCOMPSEQ</name>
+  <value>${GRID_MIX_DATA}/MonsterQueryBlockCompressed</value>
+  <description></description>
+</property>
+
+<property>
+  <name>VARCOMPSEQ</name>
+  <value>${GRID_MIX_DATA}/WebSimulationBlockCompressed</value>
+  <description></description>
+</property>
+
+
+<property>
+  <name>streamSort.smallJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+
+<property>
+  <name>streamSort.smallJobs.numOfJobs</name>
+  <value>40</value>
+  <description></description>
+</property>
+
+<property>
+  <name>streamSort.smallJobs.numOfReduces</name>
+  <value>15</value>
+  <description></description>
+</property>
+
+<property>
+  <name>streamSort.smallJobs.numOfMapoutputCompressed</name>
+  <value>40</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>streamSort.smallJobs.numOfOutputCompressed</name>
+  <value>20</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>streamSort.mediumJobs.numOfJobs</name>
+  <value>16</value>
+  <description></description>
+</property>
+<property>
+  <name>streamSort.mediumJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>streamSort.mediumJobs.numOfReduces</name>
+  <value>170</value>
+  <description></description>
+</property>
+
+<property>
+  <name>streamSort.mediumJobs.numOfMapoutputCompressed</name>
+  <value>16</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>streamSort.mediumJobs.numOfOutputCompressed</name>
+  <value>12</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>streamSort.largeJobs.numOfJobs</name>
+  <value>5</value>
+  <description></description>
+</property>
+<property>
+  <name>streamSort.largeJobs.inputFiles</name>
+  <value>${VARINFLTEXT}</value>
+  <description></description>
+</property>
+<property>
+  <name>streamSort.largeJobs.numOfReduces</name>
+  <value>370</value>
+  <description></description>
+</property>
+
+<property>
+  <name>streamSort.largeJobs.numOfMapoutputCompressed</name>
+  <value>5</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>streamSort.largeJobs.numOfOutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>javaSort.smallJobs.numOfJobs</name>
+  <value>8,2</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.smallJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.smallJobs.numOfReduces</name>
+  <value>15,70</value>
+  <description></description>
+</property>
+
+<property>
+  <name>javaSort.smallJobs.numOfMapoutputCompressed</name>
+  <value>10</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>javaSort.smallJobs.numOfOutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>javaSort.mediumJobs.numOfJobs</name>
+  <value>4,2</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.mediumJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.mediumJobs.numOfReduces</name>
+  <value>170,70</value>
+  <description></description>
+</property>
+
+<property>
+  <name>javaSort.mediumJobs.numOfMapoutputCompressed</name>
+  <value>6</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>javaSort.mediumJobs.numOfOutputCompressed</name>
+  <value>4</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>javaSort.largeJobs.numOfJobs</name>
+  <value>3</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.largeJobs.inputFiles</name>
+  <value>${VARINFLTEXT}</value>
+  <description></description>
+</property>
+<property>
+  <name>javaSort.largeJobs.numOfReduces</name>
+  <value>370</value>
+  <description></description>
+</property>
+
+<property>
+  <name>javaSort.largeJobs.numOfMapoutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>javaSort.largeJobs.numOfOutputCompressed</name>
+  <value>2</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>combiner.smallJobs.numOfJobs</name>
+  <value>11,4</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.smallJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.smallJobs.numOfReduces</name>
+  <value>10,1</value>
+  <description></description>
+</property>
+
+<property>
+  <name>combiner.smallJobs.numOfMapoutputCompressed</name>
+  <value>15</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>combiner.smallJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>combiner.mediumJobs.numOfJobs</name>
+  <value>8</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.mediumJobs.inputFiles</name>
+  <value>${VARINFLTEXT}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.mediumJobs.numOfReduces</name>
+  <value>100</value>
+  <description></description>
+</property>
+
+<property>
+  <name>combiner.mediumJobs.numOfMapoutputCompressed</name>
+  <value>8</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>combiner.mediumJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>combiner.largeJobs.numOfJobs</name>
+  <value>4</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.largeJobs.inputFiles</name>
+  <value>${VARINFLTEXT}</value>
+  <description></description>
+</property>
+<property>
+  <name>combiner.largeJobs.numOfReduces</name>
+  <value>360</value>
+  <description></description>
+</property>
+
+<property>
+  <name>combiner.largeJobs.numOfMapoutputCompressed</name>
+  <value>4</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>combiner.largeJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>monsterQuery.smallJobs.numOfJobs</name>
+  <value>7</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.smallJobs.inputFiles</name>
+  <value>${FIXCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.smallJobs.numOfReduces</name>
+  <value>5</value>
+  <description></description>
+</property>
+
+<property>
+  <name>monsterQuery.smallJobs.numOfMapoutputCompressed</name>
+  <value>7</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>monsterQuery.smallJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>monsterQuery.mediumJobs.numOfJobs</name>
+  <value>5</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.mediumJobs.inputFiles</name>
+  <value>${FIXCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.mediumJobs.numOfReduces</name>
+  <value>100</value>
+  <description></description>
+</property>
+
+<property>
+  <name>monsterQuery.mediumJobs.numOfMapoutputCompressed</name>
+  <value>5</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>monsterQuery.mediumJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>monsterQuery.largeJobs.numOfJobs</name>
+  <value>3</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.largeJobs.inputFiles</name>
+  <value>${FIXCOMPSEQ}</value>
+  <description></description>
+</property>
+<property>
+  <name>monsterQuery.largeJobs.numOfReduces</name>
+  <value>370</value>
+  <description></description>
+</property>
+
+<property>
+  <name>monsterQuery.largeJobs.numOfMapoutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>monsterQuery.largeJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>webdataScan.smallJobs.numOfJobs</name>
+  <value>24</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataScan.smallJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataScan.smallJobs.numOfMapoutputCompressed</name>
+  <value>24</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataScan.smallJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataScan.mediumJobs.numOfJobs</name>
+  <value>12</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataScan.mediumJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataScan.mediumJobs.numOfMapoutputCompressed</name>
+  <value>12</value>
+  <description> </description>
+</property>
+<property>
+  <name>webdataScan.mediumJobs.numOfReduces</name>
+  <value>7</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataScan.mediumJobs.numOfOutputCompressed</name>
+  <value>0</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataScan.largeJobs.numOfJobs</name>
+  <value>2</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataScan.largeJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataScan.largeJobs.numOfMapoutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+<property>
+  <name>webdataScan.largeJobs.numOfReduces</name>
+  <value>70</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataScan.largeJobs.numOfOutputCompressed</name>
+  <value>3</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>webdataSort.smallJobs.numOfJobs</name>
+  <value>7</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.smallJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}/{part-00000,part-00001,part-00002}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.smallJobs.numOfReduces</name>
+  <value>15</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataSort.smallJobs.numOfMapoutputCompressed</name>
+  <value>7</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataSort.smallJobs.numOfOutputCompressed</name>
+  <value>7</value>
+  <description> </description>
+</property>
+
+
+<property>
+  <name>webdataSort.mediumJobs.numOfJobs</name>
+  <value>4</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.mediumJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}/{part-000*0,part-000*1,part-000*2}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.mediumJobs.numOfReduces</name>
+  <value>170</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataSort.mediumJobs.numOfMapoutputCompressed</name>
+  <value>4</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataSort.mediumJobs.numOfOutputCompressed</name>
+  <value>4</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataSort.largeJobs.numOfJobs</name>
+  <value>1</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.largeJobs.inputFiles</name>
+  <value>${VARCOMPSEQ}</value>
+  <description></description>
+</property>
+<property>
+  <name>webdataSort.largeJobs.numOfReduces</name>
+  <value>800</value>
+  <description></description>
+</property>
+
+<property>
+  <name>webdataSort.largeJobs.numOfMapoutputCompressed</name>
+  <value>1</value>
+  <description> </description>
+</property>
+
+<property>
+  <name>webdataSort.largeJobs.numOfOutputCompressed</name>
+  <value>1</value>
+  <description> </description>
+</property>
+
+</configuration>

Added: hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2 (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/rungridmix_2 Mon Dec  1 23:03:09 2008
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+
+##############################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#####################################################################
+
+## Environment configuration
+
+GRID_DIR=`dirname "$0"`
+GRID_DIR=`cd "$GRID_DIR"; pwd`
+source $GRID_DIR/gridmix-env-2
+
+Date=`date +%F-%H-%M-%S-%N`
+echo $Date >  $1_start.out
+
+export HADOOP_CLASSPATH=${APP_JAR}:${EXAMPLE_JAR}:${STREAMING_JAR}
+export LIBJARS=${APP_JAR},${EXAMPLE_JAR},${STREAMING_JAR}
+${HADOOP_HOME}/bin/hadoop jar  -libjars ${LIBJARS} ./gridmix.jar org.apache.hadoop.mapred.GridMixRunner
+
+Date=`date +%F-%H-%M-%S-%N`
+echo $Date >  $1_end.out
+
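Note that the timestamp files are named from the script's first argument: invoked as
"./rungridmix_2 run1" it writes run1_start.out and run1_end.out, while invoked with
no argument (as the README shows) the files are _start.out and _end.out.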

Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/CombinerJobCreator.java Mon Dec  1 23:03:09 2008
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import org.apache.hadoop.examples.WordCount;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.Text;
+
+public class CombinerJobCreator extends WordCount {
+
+  public JobConf createJob(String[] args) throws Exception {
+    JobConf conf = new JobConf(WordCount.class);
+    conf.setJobName("GridmixCombinerJob");
+
+    // the keys are words (strings)
+    conf.setOutputKeyClass(Text.class);
+    // the values are counts (ints)
+    conf.setOutputValueClass(IntWritable.class);
+
+    conf.setMapperClass(MapClass.class);
+    conf.setCombinerClass(Reduce.class);
+    conf.setReducerClass(Reduce.class);
+    boolean mapoutputCompressed = false;
+    boolean outputCompressed = false;
+    // List<String> other_args = new ArrayList<String>();
+    for (int i = 0; i < args.length; ++i) {
+      try {
+        if ("-r".equals(args[i])) {
+          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
+        } else if ("-indir".equals(args[i])) {
+          FileInputFormat.setInputPaths(conf, args[++i]);
+        } else if ("-outdir".equals(args[i])) {
+          FileOutputFormat.setOutputPath(conf, new Path(args[++i]));
+
+        } else if ("-mapoutputCompressed".equals(args[i])) {
+          mapoutputCompressed = Boolean.valueOf(args[++i]).booleanValue();
+        } else if ("-outputCompressed".equals(args[i])) {
+          outputCompressed = Boolean.valueOf(args[++i]).booleanValue();
+        }
+      } catch (NumberFormatException except) {
+        System.out.println("ERROR: Integer expected instead of " + args[i]);
+        return null;
+      } catch (ArrayIndexOutOfBoundsException except) {
+        System.out.println("ERROR: Required parameter missing from "
+            + args[i - 1]);
+        return null;
+      }
+    }
+    conf.setCompressMapOutput(mapoutputCompressed);
+    conf.setBoolean("mapred.output.compress", outputCompressed);
+    return conf;
+  }
+}
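As a usage illustration (not part of this commit), a word-count JobConf could be
built from this creator with flags like the following; the class name, paths, and
counts are placeholders.

    import org.apache.hadoop.mapred.CombinerJobCreator;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CombinerJobSketch {
      public static void main(String[] unused) throws Exception {
        String[] args = {
            "-r", "10",                                  // ten reduce tasks
            "-indir", "/gridmix/data/SortUncompressed",  // input path (placeholder)
            "-outdir", "/gridmix/output/wc-small",       // output path (placeholder)
            "-mapoutputCompressed", "true",
            "-outputCompressed", "false"
        };
        JobConf conf = new CombinerJobCreator().createJob(args);
        if (conf != null) {              // null signals a bad argument
          JobClient.runJob(conf);        // or hand the conf to a JobControl Job
        }
      }
    }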

Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GenericMRLoadJobCreator.java Mon Dec  1 23:03:09 2008
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import java.util.Random;
+import java.util.Stack;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.LongWritable;
+import org.apache.hadoop.io.SequenceFile;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.GenericMRLoadGenerator;
+import org.apache.hadoop.mapred.lib.NullOutputFormat;
+import org.apache.hadoop.mapred.JobConf;
+
+public class GenericMRLoadJobCreator extends GenericMRLoadGenerator {
+
+  public JobConf createJob(String[] argv, boolean mapoutputCompressed,
+      boolean outputCompressed) throws Exception {
+
+    JobConf job = new JobConf();
+    job.setJarByClass(GenericMRLoadGenerator.class);
+    job.setMapperClass(SampleMapper.class);
+    job.setReducerClass(SampleReducer.class);
+    if (!parseArgs(argv, job)) {
+      return null;
+    }
+
+    if (null == FileOutputFormat.getOutputPath(job)) {
+      // No output dir? No writes
+      job.setOutputFormat(NullOutputFormat.class);
+    }
+
+    if (0 == FileInputFormat.getInputPaths(job).length) {
+      // No input dir? Generate random data
+      System.err.println("No input path; ignoring InputFormat");
+      confRandom(job);
+    } else if (null != job.getClass("mapred.indirect.input.format", null)) {
+      // specified IndirectInputFormat? Build src list
+      JobClient jClient = new JobClient(job);
+      Path sysdir = jClient.getSystemDir();
+      Random r = new Random();
+      Path indirInputFile = new Path(sysdir, Integer.toString(r
+          .nextInt(Integer.MAX_VALUE), 36)
+          + "_files");
+      job.set("mapred.indirect.input.file", indirInputFile.toString());
+      SequenceFile.Writer writer = SequenceFile.createWriter(sysdir
+          .getFileSystem(job), job, indirInputFile, LongWritable.class,
+          Text.class, SequenceFile.CompressionType.NONE);
+      try {
+        for (Path p : FileInputFormat.getInputPaths(job)) {
+          FileSystem fs = p.getFileSystem(job);
+          Stack<Path> pathstack = new Stack<Path>();
+          pathstack.push(p);
+          while (!pathstack.empty()) {
+            for (FileStatus stat : fs.listStatus(pathstack.pop())) {
+              if (stat.isDir()) {
+                if (!stat.getPath().getName().startsWith("_")) {
+                  pathstack.push(stat.getPath());
+                }
+              } else {
+                writer.sync();
+                writer.append(new LongWritable(stat.getLen()), new Text(stat
+                    .getPath().toUri().toString()));
+              }
+            }
+          }
+        }
+      } finally {
+        writer.close();
+      }
+    }
+
+    job.setCompressMapOutput(mapoutputCompressed);
+    job.setBoolean("mapred.output.compress", outputCompressed);
+    return job;
+
+  }
+
+}
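In outline, createJob above handles three cases: with no output path it swaps in
NullOutputFormat so nothing is written; with no input paths it configures the job to
generate random data in the tasks (confRandom); and when mapred.indirect.input.format
is set it walks every input directory (skipping paths whose names start with "_") and
writes one (file length, file URI) record per file into a SequenceFile under the
JobTracker's system directory, which the configured indirect input format can then
split among the maps.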

Added: hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java?rev=722391&view=auto
==============================================================================
--- hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java (added)
+++ hadoop/core/trunk/src/benchmarks/gridmix2/src/java/org/apache/hadoop/mapred/GridMixConfig.java Mon Dec  1 23:03:09 2008
@@ -0,0 +1,34 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.mapred;
+
+import org.apache.hadoop.conf.Configuration;
+
+public class GridMixConfig extends Configuration {
+
+  public int[] getInts(String name, int defaultValue) {
+    String[] valuesInString = getStrings(name, String.valueOf(defaultValue));
+    int[] results = new int[valuesInString.length];
+    for (int i = 0; i < valuesInString.length; i++) {
+      results[i] = Integer.parseInt(valuesInString[i]);
+    }
+    return results;
+  }
+}
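getInts is what lets a single property carry a comma-separated list, as in the
javaSort.smallJobs.numOfJobs value "8,2" above. A sketch of how a runner might
consume it (the class name and loop body are illustrative; it assumes
gridmix_config.xml is on the classpath):

    import org.apache.hadoop.mapred.GridMixConfig;

    public class MixReader {
      public static void main(String[] args) {
        GridMixConfig config = new GridMixConfig();
        config.addResource("gridmix_config.xml");  // the mix definition shown above
        int[] numJobs = config.getInts("javaSort.smallJobs.numOfJobs", 1);       // {8, 2}
        int[] numReduces = config.getInts("javaSort.smallJobs.numOfReduces", 1); // {15, 70}
        for (int i = 0; i < numJobs.length; i++) {
          System.out.println(numJobs[i] + " jobs with " + numReduces[i] + " reduces");
        }
      }
    }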


