hadoop-user mailing list archives

From Aleksandr Elbakyan <ramal...@yahoo.com>
Subject Issue with partitioning using streaming
Date Tue, 15 Jan 2013 00:23:57 GMT
Hello All,

I am trying to partition and sort data with Hadoop streaming.

Most of the time the data is partitioned and sorted correctly, but when I run the same job
multiple times, some records occasionally end up in a different partition.

The data looks like this:

asdas 0 ada
asdas 1 asd
12123 1 ccc
12123 0 xxx
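
A local sketch of the ordering the job options below appear to aim for: first field as text, then second field numerically. This is only an approximation using sort(1) and assumes whitespace-separated fields and byte-wise (C locale) comparison; it is not how Hadoop itself compares keys. The function name expected_order is hypothetical.

```shell
# Hypothetical helper: sort stdin by field 1 (text), then field 2 (numeric).
# LC_ALL=C forces byte-wise comparison so the result does not depend on the locale.
expected_order() {
    LC_ALL=C sort -k1,1 -k2,2n
}

# Feed the four sample records through the helper.
printf '%s\n' 'asdas 0 ada' 'asdas 1 asd' '12123 1 ccc' '12123 0 xxx' | expected_order
```

With the sample above, both 12123 records should come first (0 before 1), followed by both asdas records (0 before 1).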



  hadoop jar ${HADOOP_HOME}/contrib/streaming/hadoop-*streaming.jar \
        -D mapred.task.timeout=3600000 \
        -D mapred.map.tasks=${GD_NUM_MAP_TASKS} \
        -D mapred.reduce.tasks=${GD_NUM_REDUCE_TASKS} \
        -D stream.non.zero.exit.is.failure=true \
        -D stream.num.map.output.key.fields=2 \
        -D mapred.text.key.partitioner.options="-k1,1" \
        -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
        -D mapred.text.key.comparator.options=-k1,2n \
        -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
        -input input \
        -output output \
        -mapper  "  cat" \
        -reducer " cat" \
        -verbose


In my reducer code I have some logic that depends on correct partitioning and sorting.
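
To pin down which keys end up split across partitions in a given run, something like the following could be run on a local copy of the job output. It is a rough sketch, assuming the part-* files have been copied locally (for example with hadoop fs -get output ./output) and that the first whitespace-separated field is the partition key. The name check_partitions is hypothetical.

```shell
# Hypothetical helper: report first-field keys found in more than one part-* file.
# Usage: check_partitions ./output
check_partitions() {
    awk '
        # Count each (key, file) pair once, so count[key] is the number
        # of part files that contain the key. Blank lines are skipped.
        NF > 0 && !(($1, FILENAME) in seen) {
            seen[$1, FILENAME] = 1
            count[$1]++
        }
        END {
            bad = 0
            for (k in count) {
                if (count[k] > 1) {
                    print "key in multiple partitions: " k
                    bad = 1
                }
            }
            # Exit status 1 if any key was split across files, 0 otherwise.
            exit bad
        }
    ' "$1"/part-*
}
```

The function prints nothing and returns 0 when every key lives in exactly one part file, so it can be run after each job to compare a good run with a bad one.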


Regards.

