hadoop-common-user mailing list archives

From Tim Wintle <tim.win...@teamrubber.com>
Subject Re: Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value
Date Wed, 11 Mar 2009 20:25:45 GMT
On Tue, 2009-03-10 at 19:44 -0700, Gyanit wrote:
> I have a large number of key/value pairs, and I don't actually care whether
> the data goes in the key or the value. To be more exact:
> after the combiner there are about 1 million (k,v) pairs, with roughly 1 KB of
> data per pair, which I can put in either the keys or the values.
> I have experimented with both options, (heavy key, light value) vs. (light
> key, heavy value). It turns out that the (hk,lv) option is much, much faster
> than (lk,hv).
<snip>
> There is a difference in shuffle-phase time, which is odd since the amount
> of data transferred is the same.

Just an idea, but could this be related to the hash function? Are there the
same number of reducers in both cases?
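To make the hash-function idea concrete: Hadoop's default HashPartitioner picks a reducer from the key's hashCode alone, so which bytes you put in the key changes how records spread across reducers. A minimal sketch (the partitionFor formula matches the stock HashPartitioner; the sample keys and reducer count are made up for illustration):

```java
// Sketch of how Hadoop's default HashPartitioner assigns a record to a
// reducer. Only the key participates in the hash, so moving data between
// key and value can change the distribution of work across reducers.
public class PartitionSketch {
    static int partitionFor(String key, int numReduceTasks) {
        // Same formula as HashPartitioner: clear the sign bit, then mod.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;  // hypothetical reducer count
        String[] keys = {"alpha", "beta", "gamma", "delta"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partitionFor(k, reducers));
        }
    }
}
```

If the light keys happen to hash unevenly, a few reducers end up sorting most of the data while the rest sit idle, which would show up exactly as extra shuffle time.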

As I understand it, the reducers merge-sort the incoming data while the
shuffle is still happening, so if one layout leaves each reducer with less
data to sort, that could be part of it.



