hadoop-common-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Is the sort(in sort and shuffle) always required
Date Sat, 19 Jun 2010 20:28:20 GMT
On Sat, Jun 19, 2010 at 9:16 AM, Saptarshi Guha
<saptarshi.guha@gmail.com> wrote:
> My question: is the sort (in the sort and shuffle) absolutely required?
> If I wanted MapReduce to partition (using the map) and then aggregate (using
> reduce) without needing the keys to be sorted,
> is it possible to turn off the sorting? Or is the sorted order in which keys
> reach the reducer just a side effect, with the sorting itself being
> vital for the efficient operation of MapReduce?

If you have 0 reduces (a map-only job), you don't get any sorting or
aggregation. It isn't possible to turn off the sorting and keep the
aggregation. In practice, the sort costs less than the data transfer
between the map and reduce.
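
To illustrate why the two go together, here is a toy Python model of a
sort-based shuffle. It is only a sketch, not Hadoop's actual code: the
function names (shuffle_and_reduce, map_only) and the hash partitioning
are assumptions made for illustration. The point it shows is that
grouping values by key falls out of sorting each partition, so removing
the sort in this model also removes the grouping.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_reduce(map_output, num_reducers, reduce_fn):
    """Toy model of partition -> sort -> group -> reduce (hypothetical helper)."""
    # Partition: each (key, value) pair goes to one reducer based on its key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[hash(key) % num_reducers].append((key, value))

    results = {}
    for part in partitions:
        # Sort: brings equal keys next to each other within the partition.
        part.sort(key=itemgetter(0))
        # Group: groupby only merges *adjacent* equal keys, so it relies on the sort.
        for key, group in groupby(part, key=itemgetter(0)):
            results[key] = reduce_fn(key, [v for _, v in group])
    return results


def map_only(map_output):
    """Toy model of a job with 0 reduces: no partitioning, sorting, or grouping."""
    return list(map_output)


pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4), ("c", 5)]
totals = shuffle_and_reduce(pairs, 2, lambda key, values: sum(values))
# totals == {"a": 6, "b": 4, "c": 5}
passthrough = map_only(pairs)
# passthrough == pairs (map output is emitted as-is)
```

In this model, dropping the part.sort call would leave equal keys
scattered, and groupby would hand the reduce function several partial
groups per key.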

-- Owen
