hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: What's a valid combiner?
Date Sat, 12 Sep 2009 07:57:03 GMT
On Sep 11, 2009, at 11:11 PM, Harish Mallipeddi wrote:

> 1) A combiner does not change the key.

Please file a jira asking for this to be documented. You are right,  
this is a requirement on combiners and it is not currently stated.

> 2) Since the combiner can be run multiple times or not be run at all, it
> directly implies that the operation that is used to combine multiple values
> into one value should be "associative & commutative"?

Clearly that is the intent. However "associative and commutative" only  
have meaning for us math geeks. In non-math terms, telling the user  
that it may be run an indeterminate number of times implies that it  
must have those properties in order to do anything reasonable.
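To make the point concrete, here is a minimal sketch in plain Java (no Hadoop classes; the class name, split contents, and helper methods are made up for illustration). It compares combining partial results per split against processing all values at once. Summing survives the extra pass; averaging does not.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Hypothetical input: values for one key, spread over two map-side splits.
    static final List<Double> SPLIT_A = Arrays.asList(1.0, 2.0);
    static final List<Double> SPLIT_B = Arrays.asList(3.0, 4.0, 5.0);
    static final List<Double> ALL = Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0);

    // Sum is associative and commutative: partial sums can be summed again.
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Mean is not: a mean of partial means ignores how many values each covered.
    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    // "Combiner ran once per split" versus "combiner never ran".
    static double sumWithCombiner() {
        return sum(Arrays.asList(sum(SPLIT_A), sum(SPLIT_B)));
    }

    static double sumWithoutCombiner() {
        return sum(ALL);
    }

    static double meanWithCombiner() {
        return mean(Arrays.asList(mean(SPLIT_A), mean(SPLIT_B)));
    }

    static double meanWithoutCombiner() {
        return mean(ALL);
    }

    public static void main(String[] args) {
        System.out.println(sumWithCombiner() + " vs " + sumWithoutCombiner());   // 15.0 vs 15.0
        System.out.println(meanWithCombiner() + " vs " + meanWithoutCombiner()); // 2.75 vs 3.0
    }
}
```

If an average is what you need, the usual workaround is to have the combiner pass along (sum, count) pairs and divide only in the reducer.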

> Why isn't there a separate Combiner interface? If not, maybe some of these
> other gotchas should be documented somewhere?

Such a class wouldn't be very interesting:

class Combiner extends Reducer { }

Furthermore, it is often the case that the Combiner is the same as the
Reducer.

I did toy with the idea of making an attribute such as @SideEffectFree  
that could mark classes that are acceptable as combiners. There is an  
old jira about this somewhere...
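A rough sketch of what such a marker could look like, in plain Java. Nothing here exists in Hadoop: the annotation, the two placeholder reducer classes, and the acceptableAsCombiner check are all hypothetical names used only to illustrate the idea.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class SideEffectFreeSketch {
    // Hypothetical marker annotation; kept at runtime so a framework could inspect it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SideEffectFree {}

    // Placeholder for a reducer whose author asserts it is safe to reuse as a combiner.
    @SideEffectFree
    static class SumReducer {}

    // Placeholder for a reducer that rewrites keys, so it carries no marker.
    static class KeyReversingReducer {}

    // Hypothetical check a job-setup step might perform before accepting a combiner class.
    static boolean acceptableAsCombiner(Class<?> cls) {
        return cls.isAnnotationPresent(SideEffectFree.class);
    }

    public static void main(String[] args) {
        System.out.println(acceptableAsCombiner(SumReducer.class));          // true
        System.out.println(acceptableAsCombiner(KeyReversingReducer.class)); // false
    }
}
```

Note that the annotation only records the author's claim; nothing in this sketch verifies that the class really is free of side effects.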

> I ran a simple experiment - I took the wordcount example and just modified
> the Reducer such that it outputs the string key "reversed" and then used
> this reducer as the combiner. It led to some funny results.

No doubt.
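For anyone who wants to see the effect without a cluster, here is a small simulation in plain Java. It is only a sketch under assumptions: the two "map tasks", their words, and every method name are invented for illustration, and the shuffle is modelled as a simple list merge. The same key-reversing reduce function is used as both combiner and reducer, and the final output depends on which tasks happened to run the combiner.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyChangingCombiner {
    // Word-count map output: one (word, 1) record per word.
    static List<Map.Entry<String, Integer>> mapOutput(String... words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // The modified reducer: sums counts per key, then emits the key reversed.
    static List<Map.Entry<String, Integer>> reduceReversingKey(
            List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            String reversed = new StringBuilder(e.getKey()).reverse().toString();
            out.add(new AbstractMap.SimpleEntry<>(reversed, e.getValue()));
        }
        return out;
    }

    // Runs two hypothetical map tasks, optionally combining each, then reduces.
    static Map<String, Integer> run(boolean combineA, boolean combineB) {
        List<Map.Entry<String, Integer>> a = mapOutput("cat", "cat", "dog");
        List<Map.Entry<String, Integer>> b = mapOutput("cat");
        if (combineA) {
            a = reduceReversingKey(a);
        }
        if (combineB) {
            b = reduceReversingKey(b);
        }
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>(a);
        shuffled.addAll(b);
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : reduceReversingKey(shuffled)) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(false, false)); // {god=1, tac=3}
        System.out.println(run(true, true));   // {cat=3, dog=1}
        System.out.println(run(true, false));  // {cat=2, dog=1, tac=1}
    }
}
```

Three runs over the same input give three different answers, which is why the combiner has to leave the key alone.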

-- Owen
