hadoop-common-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Broad question on sorting of mapper outputs.
Date Wed, 24 Oct 2012 22:01:35 GMT
Hi Jay,

AFAIK, when an MR job has no reduce phase (i.e. the number of reducers is 0),
the output from the Mapper is not sorted.
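
To illustrate the difference, here is a minimal sketch in plain Python. It is a toy model, not Hadoop code: `run_job`, `toy_partition`, and `tokenize` are hypothetical names made up for this example, and the byte-sum partitioner is only a deterministic stand-in for a real hash partitioner. With zero reducers the records come out in the order the mapper emitted them; with one or more reducers each partition is sorted by key.

```python
def toy_partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner (deterministic across runs).
    return sum(key.encode("utf-8")) % num_reducers


def run_job(records, mapper, num_reducers):
    """Toy model of what happens to map output; not the Hadoop implementation."""
    emitted = [pair for record in records for pair in mapper(record)]

    if num_reducers == 0:
        # Map-only job: output is written as emitted, with no sort step.
        return [emitted]

    # Otherwise the output is partitioned per reducer and sorted by key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in emitted:
        partitions[toy_partition(key, num_reducers)].append((key, value))
    return [sorted(part, key=lambda pair: pair[0]) for part in partitions]


def tokenize(line):
    # Word-count style mapper: one (word, 1) pair per word.
    return [(word, 1) for word in line.split()]


lines = ["pear apple", "fig pear"]
map_only = run_job(lines, tokenize, 0)      # emission order is preserved
with_reducers = run_job(lines, tokenize, 2)  # each partition is key-sorted
```

In a real job the number of reducers is set on the job itself, for example with `job.setNumReduceTasks(0)` for a map-only job.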


On Fri, Oct 19, 2012 at 8:19 PM, Jay Vyas <jayunit100@gmail.com> wrote:

> Is there any documentation on the internals of the shuffle and sort phase?
> The elephant book seems to be the best source, but it appears to only
> lightly touch upon the "magic" part (i.e. the distributed merge sorting and
> mapper spilling).
> Also... What is the rationale behind the sortedness of mapper outputs?  Is
> the reason to optimize the streaming of mapper values to reducers?  In
> simple scenarios, i.e. when there is no reducing to be done, it seems that
> we may not care to have sorted mapper outputs: a random merge of all
> spilled records would be sufficient.
> I've noticed that the Shuffle and Sort classes in hadoop have almost no
> comments and appear to simply wrap other classes.
> --
> Jay Vyas
> http://jayunit100.blogspot.com

Thanks & Regards,
Anil Gupta
