hadoop-common-user mailing list archives

From Gyanit <gya...@gmail.com>
Subject Re: Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value
Date Wed, 11 Mar 2009 19:51:05 GMT

I noticed one more thing. Lighter keys tend to produce fewer unique
keys.
For example, there may be 10 million (key,value) pairs, but if the keys are
light there might be just 1,000 unique keys.
In the other case, with heavier keys, there might be 5 million unique keys.
I think this might have something to do with it.
Bottom line: if your reduce is a simple dump with no combining, then put the
data in keys rather than values.
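
To make the unique-key observation concrete, here is a minimal Python sketch (not Hadoop code) that sorts pairs by key and counts the resulting key groups, each of which would correspond to one reduce() call. The function names, the 100-byte payload, and the scaled-down pair counts are all assumptions chosen for illustration; the sketch only shows how the group count differs, not why the shuffle time differs.

```python
from itertools import groupby


def count_reduce_groups(pairs):
    """Sort (key, value) pairs by key and count the distinct key groups.

    In a MapReduce-style job, each group corresponds to one reduce() call.
    """
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return sum(1 for _ in groupby(ordered, key=lambda kv: kv[0]))


def make_pairs(n_pairs, n_unique_keys, heavy_key):
    """Build synthetic pairs with the payload in either the key or the value."""
    payload = "x" * 100  # hypothetical stand-in for the ~1 KB of data per pair
    pairs = []
    for i in range(n_pairs):
        key_id = i % n_unique_keys
        if heavy_key:
            # Heavy key, light value: the payload travels inside the key.
            pairs.append((f"{key_id}:{payload}", ""))
        else:
            # Light key, heavy value: the payload travels inside the value.
            pairs.append((str(key_id), payload))
    return pairs


# Scaled-down stand-ins for the 10 million pair / 1,000 vs 5 million key example.
light = make_pairs(10_000, 10, heavy_key=False)
heavy = make_pairs(10_000, 5_000, heavy_key=True)

light_groups = count_reduce_groups(light)
heavy_groups = count_reduce_groups(heavy)
print(light_groups, heavy_groups)
```

With these scaled-down numbers, the light-key input yields 10 groups and the heavy-key input yields 5,000, even though both inputs contain the same number of pairs.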

I need to put data in values. Any suggestions on how to make it faster?

-Gyanit.


Scott Carey wrote:
> 
> That is a fascinating question.  I would also love to know the reason
> behind this.
> 
> If I were to guess I would have thought that smaller keys and heavier
> values would slightly outperform, rather than significantly underperform. 
> (assuming total pair count at each phase is the same).   Perhaps there is
> room for optimization here?
> 
> 
> 
> On 3/10/09 6:44 PM, "Gyanit" <gyanit@gmail.com> wrote:
> 
> 
> 
> I have a large number of (key,value) pairs. I don't actually care whether
> the data goes in the value or the key. Let me be more exact.
> After the combiner there are about 1 mil (k,v) pairs, with approx. 1 KB of
> data for each pair. I can put it in keys or values.
> I have experimented with both options: (heavy key, light value) vs (light
> key, heavy value). It turns out that the (hk,lv) option is much, much better
> than (lk,hv).
> Has someone else also noticed this?
> Is there a way to make things faster with the (light key, heavy value)
> option? Some applications will need that as well.
> Remember, in both cases we are talking about at least a dozen or so million
> pairs.
> The time difference is in the shuffle phase, which is weird because the
> amount of data transferred is the same.
> 
> -gyanit
> --
> View this message in context:
> http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22447877.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22463050.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

