hadoop-common-user mailing list archives

From Gyanit <gya...@gmail.com>
Subject Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value
Date Wed, 11 Mar 2009 02:44:17 GMT

I have a large number of (key, value) pairs, and I don't actually care whether the data goes into the key or the value. To be more exact:

After the combiner there are about 1 million (k, v) pairs, with roughly 1 KB of data per pair. That data can be put in either the keys or the values.

I have experimented with both options: (heavy key, light value) vs. (light key, heavy value). It turns out that the (hk, lv) option is much, much faster than (lk, hv).

Has anyone else noticed this?

Is there a way to make things faster with the light-key, heavy-value option? Some applications will need that layout as well. Remember, in both cases we are talking about at least a dozen or so million pairs.

The time difference shows up in the shuffle phase, which is odd, since the amount of data transferred is the same either way.
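For readers wanting to reproduce the comparison, the two layouts could be sketched roughly as below. This is a minimal plain-Java stand-in for what a Hadoop mapper would emit via `context.write` (the names `heavyPayload` and `recordId` are made up for illustration; a real job would use `Text`/`IntWritable` or similar `Writable` types):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

public class KeyValueLayouts {
    // Hypothetical ~1 KB payload per record, as described in the post.
    static String heavyPayload(int recordId) {
        StringBuilder sb = new StringBuilder(1024);
        while (sb.length() < 1024) {
            sb.append("record-").append(recordId).append(';');
        }
        return sb.toString();
    }

    // Option A: (heavy key, light value) -- the bulk of each record rides in the key.
    static Entry<String, Integer> heavyKeyLightValue(int recordId) {
        return new SimpleEntry<>(heavyPayload(recordId), recordId);
    }

    // Option B: (light key, heavy value) -- the bulk of each record rides in the value.
    static Entry<Integer, String> lightKeyHeavyValue(int recordId) {
        return new SimpleEntry<>(recordId, heavyPayload(recordId));
    }
}
```

In both layouts each pair carries the same ~1 KB of data, which is why the identical total transfer volume makes the shuffle-time gap surprising.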

-gyanit
-- 
View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22447877.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

