hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: Getting zero length files on the reduce output.
Date Thu, 03 Jun 2010 04:45:25 GMT
The default partitioner computes hashCode(key) modulo number_of_reducers, so it's quite possible
for some reducers to receive no keys at all.
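
To illustrate how that can leave exactly half of the outputs empty, here is a minimal plain-Java sketch (no Hadoop dependency). It reuses the arithmetic of Hadoop's HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReducers. The class name, the helper methods, and the assumption that every key is an even integer are hypothetical and only meant as one possible cause, not a diagnosis of the job in question.

```java
import java.util.Set;
import java.util.TreeSet;

class HashPartitionDemo {
    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reducers.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical helper: collects the distinct partitions reached by
    // the even integer keys 0, 2, 4, ... below 'limit'.
    static Set<Integer> partitionsUsedByEvenKeys(int limit, int numReducers) {
        Set<Integer> used = new TreeSet<>();
        for (int k = 0; k < limit; k += 2) {
            // Integer.hashCode() is the int value itself.
            used.add(partitionFor(Integer.valueOf(k), numReducers));
        }
        return used;
    }

    public static void main(String[] args) {
        // With 12 reducers, an even hash code always leaves an even remainder.
        System.out.println(partitionsUsedByEvenKeys(1000, 12)); // prints [0, 2, 4, 6, 8, 10]
    }
}
```

Because 12 is even, any key set whose hash codes share the factor 2 can only reach the six even-numbered partitions, and the other six reducers write empty files.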

>>Can I change this hash function in any way?
Sure, any custom partitioner can be plugged in. Check o.a.h.mapreduce.lib.partition or the secondary
sort example in the MapReduce tutorial for more.
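
As a rough sketch of what a custom partitioner might do, assume (hypothetically) that every key is an even, non-negative integer, so that hash-modulo partitioning over 12 reducers reaches only the even-numbered partitions. The KeyPartitioner interface below is a local stand-in so the example runs without Hadoop on the classpath. In a real job the same logic would go into a subclass of org.apache.hadoop.mapreduce.Partitioner, overriding getPartition(KEY key, VALUE value, int numPartitions), and be registered with job.setPartitionerClass(...).

```java
import java.util.Set;
import java.util.TreeSet;

class CustomPartitionDemo {
    // Hypothetical local stand-in for Hadoop's Partitioner<KEY, VALUE>;
    // the real getPartition also receives the value.
    interface KeyPartitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    // Default-style behavior: hash code modulo the number of partitions.
    static final KeyPartitioner<Integer> HASH_MODULO =
        (key, n) -> (key.hashCode() & Integer.MAX_VALUE) % n;

    // Custom behavior under the stated assumption that all keys are even:
    // halving removes the shared factor of 2 before taking the remainder.
    static final KeyPartitioner<Integer> HALVING =
        (key, n) -> ((key / 2) & Integer.MAX_VALUE) % n;

    // Collects the distinct partitions reached by even keys 0, 2, ... below 'limit'.
    static Set<Integer> partitionsUsed(KeyPartitioner<Integer> p, int limit, int n) {
        Set<Integer> used = new TreeSet<>();
        for (int k = 0; k < limit; k += 2) {
            used.add(p.getPartition(k, n));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(partitionsUsed(HASH_MODULO, 1000, 12).size()); // prints 6
        System.out.println(partitionsUsed(HALVING, 1000, 12).size());     // prints 12
    }
}
```

The halving trick only helps for this particular key layout. The general point is that a partitioner can use whatever is known about the keys to spread them more evenly.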

On a side note, if you don't want the zero-length output files to show up, use LazyOutputFormat
instead; it creates an output file only when the first record is written.


On 6/3/10 1:22 AM, "Raymond Jennings III" <raymondjiii@yahoo.com> wrote:

I have a cluster of 12 slave nodes.  For some jobs, half of the part-r-00000 type files are
zero in size after the job completes.  Does this mean the hash function that splits the data
across the reducer nodes is not working all that well?  On other jobs the output is pretty much
even across all reducers, but on certain jobs only half of the reducers have files bigger
than 0.  It is reproducible though.  Can I change this hash function in any way?  Thanks.
