hadoop-common-user mailing list archives

From Kelly Burkhart <kelly.burkh...@gmail.com>
Subject Map reduce streaming unable to partition
Date Thu, 10 Feb 2011 17:45:37 GMT
Hi,

I'm trying to get partitioning working from a streaming map/reduce
job.  I'm using hadoop r0.20.2.

Consider the following files, both in the same hdfs directory:

f1:
01:01:01<TAB>a,a,a,a,a,1
01:01:02<TAB>a,a,a,a,a,2
01:02:01<TAB>a,a,a,a,a,3
01:02:02<TAB>a,a,a,a,a,4
02:01:01<TAB>a,a,a,a,a,5
02:01:02<TAB>a,a,a,a,a,6
02:02:01<TAB>a,a,a,a,a,7
02:02:02<TAB>a,a,a,a,a,8

f2:
01:01:01<TAB>b,b,b,b,b,1
01:01:02<TAB>b,b,b,b,b,2
01:02:01<TAB>b,b,b,b,b,3
01:02:02<TAB>b,b,b,b,b,4
02:01:01<TAB>b,b,b,b,b,5
02:01:02<TAB>b,b,b,b,b,6
02:02:01<TAB>b,b,b,b,b,7
02:02:02<TAB>b,b,b,b,b,8

I execute the following command:

hadoop jar /opt/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D stream.map.output.field.separator=: \
  -D stream.num.map.output.key.fields=3 \
  -D map.output.key.field.separator=: \
  -D mapred.text.key.partitioner.options=-k1,1 \
  -input /tmp/krb/part \
  -output /tmp/krb/mp \
  -mapper /bin/cat \
  -reducer /bin/cat \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

(actually I've executed about a zillion permutations of various -D arguments...)

I end up with a single file sorted by the entire key, which is exactly
what I would expect if no partitioning at all were going on.  What I'm
hoping to end up with is two output files, where all records in a file
share the first component of the key.  For example, the first file
would contain:

01:01:01<TAB>a,a,a,a,a,1
01:01:01<TAB>b,b,b,b,b,1
01:01:02<TAB>a,a,a,a,a,2
01:01:02<TAB>b,b,b,b,b,2
01:02:01<TAB>a,a,a,a,a,3
01:02:01<TAB>b,b,b,b,b,3
01:02:02<TAB>a,a,a,a,a,4
01:02:02<TAB>b,b,b,b,b,4
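
To make the desired grouping concrete, here is a minimal local sketch in
Python. It assumes the partition is decided solely by the first
`:`-separated component of the key and that records within a partition are
sorted by the whole line. It is plain Python for illustration only, not
Hadoop's KeyFieldBasedPartitioner, and `partition_by_first_field` is a
hypothetical helper name. The sample records are a subset of f1 and f2.

```python
from collections import defaultdict


def partition_by_first_field(lines, sep=":"):
    """Group tab-separated records by the first sep-delimited key field.

    Hypothetical helper: it mimics the desired outcome locally and does
    not reproduce how Hadoop assigns keys to reduce tasks.
    """
    partitions = defaultdict(list)
    for line in lines:
        # The key is everything before the first tab.
        key, _, _value = line.partition("\t")
        # Only the first component of the key picks the partition.
        partitions[key.split(sep)[0]].append(line)
    # Sort each partition by the whole record, as in the sample output.
    return {field: sorted(recs) for field, recs in partitions.items()}


# A subset of the records from f1 and f2.
f1 = ["01:01:01\ta,a,a,a,a,1", "01:01:02\ta,a,a,a,a,2", "02:01:01\ta,a,a,a,a,5"]
f2 = ["01:01:01\tb,b,b,b,b,1", "01:01:02\tb,b,b,b,b,2", "02:01:01\tb,b,b,b,b,5"]

parts = partition_by_first_field(f1 + f2)
for field in sorted(parts):
    print("partition", field)
    for rec in parts[field]:
        print("  " + rec)
```

Each entry in `parts` corresponds to one of the output files I am hoping
for: one for keys starting with 01 and one for keys starting with 02.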

Can anyone suggest a command that may partition files as I describe?

Also, it seems that the API has changed considerably between my version,
0.20.x, and the latest version, r0.21.  Is 0.20 expected to work?  Or
were there fatal issues that forced the major work resulting in release
0.21?

Thanks,

-Kelly
