hadoop-common-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Need help understanding the source
Date Mon, 06 Jul 2009 18:42:09 GMT
I would consider this to be a very delicate optimization with little utility
in the real world.  It is very, very rare to reliably know how many records
the reducer will see.  Getting this wrong would be a disaster.  Getting it
right would be very difficult in almost all cases.

Moreover, the assumption that reduce waits for all the maps is baked all
through the map-reduce design, so changing it to let reduce go ahead early is
likely to be really tricky (not that I know this for a fact).

On Mon, Jul 6, 2009 at 11:14 AM, Naresh Rapolu <nareshreddy.rapolu@gmail.com> wrote:

> My aim is to let the reduce go ahead with reduction as soon as it has the
> data it requires, instead of waiting for all the maps to complete.  If it
> knows how many records it needs and compares that with the number of records
> it has received so far, it can move on once the two are equal, without
> waiting for all the maps to finish.
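
To make the trade-off in this exchange concrete, here is a minimal standalone sketch of the proposal. It is a toy simulation, not Hadoop code: the names `EagerReducer`, `accept`, `all_maps_done` and `run` are hypothetical, and the reducer simply sums values for a single key. It shows what happens when the expected record count is correct, too low, or too high.

```python
class EagerReducer:
    """Toy reducer that sums values for one key and may finish early."""

    def __init__(self, expected_records):
        # Assumption: the reducer is told up front how many records to expect.
        self.expected_records = expected_records
        self.received = []
        self.result = None

    def accept(self, value):
        """Take one record; reduce as soon as the received count matches."""
        if self.result is not None:
            return False  # reduction already ran, so this record is lost
        self.received.append(value)
        if len(self.received) == self.expected_records:
            self.result = sum(self.received)
        return True

    def all_maps_done(self):
        """Fallback barrier: reduce whatever arrived if that has not happened yet."""
        if self.result is None:
            self.result = sum(self.received)
        return self.result


def run(expected_records, map_outputs):
    """Feed every map task's output to the reducer; return (result, dropped)."""
    reducer = EagerReducer(expected_records)
    dropped = 0
    for output in map_outputs:  # each inner list is one map task's output
        for value in output:
            if not reducer.accept(value):
                dropped += 1
    return reducer.all_maps_done(), dropped


map_outputs = [[1, 2], [3], [4, 5]]
print(run(5, map_outputs))  # exact count: (15, 0)
print(run(3, map_outputs))  # count too low: (6, 2), two records lost
print(run(9, map_outputs))  # count too high: (15, 0), but only at the barrier
```

With an exact count the reducer finishes early and is correct. With an underestimate it finishes early and silently produces a wrong answer, which is the "disaster" case. With an overestimate it gains nothing, because it still has to wait for the end-of-maps barrier.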
