hadoop-common-user mailing list archives

From Cui tony <tony.cui1...@gmail.com>
Subject Re: question on shuffle and sort
Date Wed, 31 Mar 2010 02:24:18 GMT
Consider this extreme situation:
Both the input data and the map output are very large, and 90% of the map
output records share the same key, so all of them are sent to a single reducer
task node. That means 90% of the reduce-phase work has to be done on one node
rather than spread across the cluster, which is inefficient and does not scale.
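
The skew described above can be illustrated with a small simulation. This is only a sketch, not Hadoop code: `partition` and `reducer_load` are hypothetical helpers, `zlib.crc32` stands in for the key's hash code, and the 1000-record data set is made up. Hadoop's default HashPartitioner similarly takes the key's hash code modulo the number of reduce tasks, so every record with the same key lands on the same reducer.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    """Stand-in for a hash partitioner: the same key always maps to the same reducer."""
    # crc32 is used only because it is deterministic across runs (an assumption
    # for this sketch; Hadoop uses the key's own hashCode()).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_load(keys, num_reducers):
    """Count how many map output records each reducer would receive."""
    load = Counter()
    for key in keys:
        load[partition(key, num_reducers)] += 1
    return load


# Hypothetical map output: 90% of the records share the key "hot".
keys = ["hot"] * 900 + [f"key{i}" for i in range(100)]
load = reducer_load(keys, num_reducers=10)
hot_reducer = partition("hot", 10)

# Fraction of all records handled by the reducer that owns "hot" (at least 0.9).
print(load[hot_reducer] / len(keys))
```

Adding more reducers does not help here, because the partitioner never splits the records of a single key.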


2010/3/31 Jones, Nick <nick.jones@amd.com>

> One thing to keep in mind, though: the sort order depends on the key type.
> Text keys, for example, are sorted lexicographically.
>
> Nick Jones
>
>
> ----- Original Message -----
> From: Ed Mazur <mazur@cs.umass.edu>
> To: common-user@hadoop.apache.org <common-user@hadoop.apache.org>
> Sent: Tue Mar 30 21:07:29 2010
> Subject: Re: question on shuffle and sort
>
> On Tue, Mar 30, 2010 at 9:56 PM, Cui tony wrote:
> > Will all key-value pairs of the map output that have the same key be
> > sent to the same reducer task node?
>
> Yes, this is at the core of the MapReduce model. There is one call to
> the user reduce function per unique map output key. This grouping is
> achieved by sorting, which means each reducer sees its keys in increasing order.
>
> Ed
>
>
>
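
The two points quoted above (grouping by sorting, and sort order depending on the key type) can be sketched in a few lines of Python. This is an illustration under assumptions, not Hadoop's implementation: `shuffle_and_reduce` is a hypothetical helper, and Python string comparison stands in for the way Text keys compare.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_reduce(pairs, reduce_fn):
    """Sort map output by key, then call reduce_fn once per unique key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Hypothetical map output with two distinct keys.
pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
result = shuffle_and_reduce(pairs, lambda key, values: sum(values))
print(result)  # [('a', 6), ('b', 4)]

# Sort order depends on the key type: text keys compare character by
# character, numeric keys compare by value.
text_keys = sorted(["10", "9", "2"])
int_keys = sorted([10, 9, 2])
print(text_keys)  # ['10', '2', '9']
print(int_keys)   # [2, 9, 10]
```

Because the pairs are sorted first, all values for a key are adjacent, so one pass is enough to hand each key and its values to the reduce function in increasing key order.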
