hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Why/When partitioner is used.
Date Fri, 07 Jun 2013 14:03:56 GMT
There are two parts to this.  First, the semantics of MapReduce promise that all tuples sharing
the same key are sent to the same reducer, so that you can write useful MR applications
that do things like “count words” or “summarize by date”.  Second, to keep that promise,
the shuffle phase of MR partitions the map output by key, moving tuples that share a
key to the same reducer, where they can be processed together.  You can think of key partitioning
as a strategy that assists in parallel distributed sorting.
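
To make the "same key, same reducer" idea concrete, here is a minimal stand-alone sketch in plain Java. It does not use the Hadoop API; the class name PartitionSketch and its methods are made up for illustration. The arithmetic is meant to mirror what, as far as I know, Hadoop's default HashPartitioner does: hash the key, clear the sign bit, and take the remainder modulo the number of reducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Hadoop.
final class PartitionSketch {

    // Maps a key to a reducer index in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit, so the
    // remainder is never negative even when hashCode() is.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Groups keys by the reducer index that partitionFor assigns them.
    // Every occurrence of a given key lands in the same group.
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReducers);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }
}
```

Because the reducer index depends only on the key and the reducer count, every mapper computes the same destination for a given key without any coordination. A custom partitioner (for example "keys starting with A go to reducer 1") only changes how that index is computed; the grouping guarantee stays the same.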

From: Sai Sai [mailto:saigraph@yahoo.in]
Sent: Friday, June 07, 2013 5:17 AM
To: user@hadoop.apache.org
Subject: Re: Why/When partitioner is used.

I always get confused about why we should partition and what the use of it is.
Why would one want to send all the keys starting with A to Reducer1 and B to R2 and so on...
Is it just to parallelize the reduce process?
Please help.