hadoop-common-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: the same key in different reducers
Date Thu, 10 Jun 2010 02:30:38 GMT
On Wed, Jun 9, 2010 at 3:15 PM, Alex Kozlov <alexvk@cloudera.com> wrote:
> So I assume it is entirely possible to write a partitioner that distributes
> the same key to multiple reducers and it does not have to be
> non-deterministic.  It can assign the partition based on the value.
> Is this correct?

Yes. I've never liked the fact that Partitioners get the value for
exactly that reason. It was originally put in for some obscure corner
case in Nutch. Fixing it now would be difficult.
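
To make that concrete, here is a minimal sketch in plain Java. It
assumes only the shape of getPartition(key, value, numPartitions); the
class and method names are made up for illustration, and it does not
depend on the Hadoop jars. Both functions are fully deterministic, yet
the value-based one sends the same key to different partitions.

```java
// Hypothetical stand-alone sketch; names are illustrative, not Hadoop API.
class ValueBasedPartitionSketch {

    // Partition on the key only: every record with the same key
    // lands in the same partition.
    static int keyPartition(String key, String value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Partition on the value: deterministic, but records sharing a key
    // can be spread over several partitions (and so several reducers).
    static int valuePartition(String key, String value, int numPartitions) {
        return (value.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 4;
        // Same key "k", two different values.
        System.out.println("key-based:   " + keyPartition("k", "a", n)
                + " and " + keyPartition("k", "b", n));
        System.out.println("value-based: " + valuePartition("k", "a", n)
                + " and " + valuePartition("k", "b", n));
    }
}
```

With four partitions, the key-based version puts both records in the
same partition, while the value-based version separates them, so a
reducer would see only part of the values for key "k".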

Also note that "non-deterministic" doesn't imply using Random. You
could just fail to override the hashCode method and take the default
from Object. That would cause you to hash based on the object's
identity (typically derived from its address), which is different for
each JVM.
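
A rough sketch of that failure mode, again in plain Java with made-up
class names (BadKey, GoodKey, partitionFor are all hypothetical). It
assumes the usual hash-modulo partitioning scheme. The default
Object.hashCode() is not guaranteed to differ between two instances,
but in practice it usually does, and it carries no relation to the
key's contents.

```java
// Hypothetical stand-alone sketch; names are illustrative, not Hadoop API.
class DefaultHashCodeSketch {

    // Key type that forgets to override hashCode(): it inherits the
    // identity-based default from Object.
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
    }

    // Key type whose hash depends only on its contents.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }

        @Override
        public int hashCode() { return id.hashCode(); }
    }

    // Hash-modulo partitioning; the mask keeps the result non-negative.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 4;
        // Two logically equal keys, as two map tasks in separate JVMs
        // would each construct them.
        System.out.println("BadKey:  " + partitionFor(new BadKey("x"), n)
                + " and " + partitionFor(new BadKey("x"), n));
        System.out.println("GoodKey: " + partitionFor(new GoodKey("x"), n)
                + " and " + partitionFor(new GoodKey("x"), n));
    }
}
```

The GoodKey partitions always agree; the BadKey partitions may or may
not, and can change from run to run.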

-- Owen
