hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1827) Reducer.reduce method's OutputCollector is too strict, it shouldn't need the key to be WritableComparable
Date Sat, 01 Sep 2007 19:23:19 GMT
Reducer.reduce method's OutputCollector is too strict, it shouldn't need the key to be WritableComparable
---------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1827
                 URL: https://issues.apache.org/jira/browse/HADOOP-1827
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.14.0
            Reporter: Arun C Murthy


The output of the {{Reducer}}'s reduce method is *not* sorted, hence the {{OutputCollector}}
passed to it shouldn't require the *key* to be {{WritableComparable}}; passing a {{Writable}}
should suffice.

Thus

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable, 
                         K3 extends WritableComparable, V3 extends Writable> 
extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output,
              Reporter reporter)
  throws IOException;

}
{code}

should, technically, be:

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable, 
                         K3 extends Writable, V3 extends Writable> 
extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output,
              Reporter reporter)
  throws IOException;

}
{code}



Pros:
It removes an artificial limitation that forces applications to emit a <{{WritableComparable}},
{{Writable}}> pair rather than a <{{Writable}}, {{Writable}}> pair, thereby simplifying
some applications (I ran into a few recently... admittedly trivial ones).
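
To make the limitation concrete, here is a minimal, self-contained sketch of a reducer that emits a key which is only {{Writable}}. It does not use the real Hadoop classes: every type below is a simplified stand-in, the {{Reporter}} argument and the {{IOException}} clause are omitted for brevity, and {{Word}}, {{Count}}, {{Label}} and {{LabelingReducer}} are hypothetical names introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RelaxedReducerSketch {
  // Stand-ins for the Hadoop types (assumptions, not org.apache.hadoop.io).
  interface Writable {}
  interface WritableComparable extends Writable, Comparable<Object> {}

  // Collector with the relaxed bound: the key only has to be Writable.
  interface OutputCollector<K extends Writable, V extends Writable> {
    void collect(K key, V value);
  }

  // Reducer with the relaxed output-key bound (K3 extends Writable).
  interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                    K3 extends Writable, V3 extends Writable> {
    void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output);
  }

  // Input key: sorted before the reduce phase, so it must be comparable.
  static final class Word implements WritableComparable {
    final String text;
    Word(String text) { this.text = text; }
    @Override public int compareTo(Object other) {
      return text.compareTo(((Word) other).text);
    }
  }

  static final class Count implements Writable {
    final int value;
    Count(int value) { this.value = value; }
  }

  // Hypothetical output key with no ordering: Writable, not comparable.
  static final class Label implements Writable {
    final String text;
    Label(String text) { this.text = text; }
  }

  // Compiles only because K3 is bounded by Writable; with the stricter
  // bound (K3 extends WritableComparable) Label would be rejected.
  static final class LabelingReducer implements Reducer<Word, Count, Label, Count> {
    @Override
    public void reduce(Word key, Iterator<Count> values,
                       OutputCollector<Label, Count> output) {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().value;
      }
      output.collect(new Label("word:" + key.text), new Count(sum));
    }
  }

  // Runs the reducer once and records what it emitted.
  static List<String> run() {
    List<String> emitted = new ArrayList<>();
    OutputCollector<Label, Count> collector =
        (k, v) -> emitted.add(k.text + "=" + v.value);
    new LabelingReducer().reduce(
        new Word("hadoop"),
        Arrays.asList(new Count(1), new Count(2)).iterator(),
        collector);
    return emitted;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [word:hadoop=3]
  }
}
```

Under the existing interface, {{Label}} would have to implement a {{compareTo}} that nothing ever calls.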

Cons:
1. We now need a separate {{Combiner}} interface, since the combiner's {{OutputCollector}}
*needs* to be able to sort keys, hence requires a {{WritableComparable}} - same as the {{Mapper}}.
2. We need a separate {{SortableOutputCollector}} (for {{Mapper}}/{{Combiner}}) and a {{NonSortableOutputCollector}}
(for {{Reducer}}).
3. Alas! As a consequence of (1) & (2), we cannot use the same class as both a {{Reducer}}
and a {{Combiner}} anymore, which is a serious compatibility issue.
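
A rough sketch of what the split in (1) and (2) might look like follows. The issue only names the types; the method signatures, the {{combine}} method name, and the {{Word}}, {{Count}} and {{SumBoth}} classes are assumptions made for illustration, and the stand-in interfaces are not the real Hadoop ones ({{Reporter}} and {{IOException}} are omitted). Under these assumptions, a class that today serves as both reducer and combiner would have to implement two interfaces and duplicate its entry point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SplitCollectorSketch {
  // Stand-ins for the Hadoop types (assumptions, not org.apache.hadoop.io).
  interface Writable {}
  interface WritableComparable extends Writable, Comparable<Object> {}

  // Hypothetical shapes for the two collectors named in con (2).
  interface SortableOutputCollector<K extends WritableComparable, V extends Writable> {
    void collect(K key, V value);
  }
  interface NonSortableOutputCollector<K extends Writable, V extends Writable> {
    void collect(K key, V value);
  }

  // Hypothetical Combiner from con (1): its output is sorted again,
  // so its keys must stay comparable.
  interface Combiner<K extends WritableComparable, V extends Writable> {
    void combine(K key, Iterator<V> values, SortableOutputCollector<K, V> output);
  }

  // Reducer with the relaxed output-key bound.
  interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                    K3 extends Writable, V3 extends Writable> {
    void reduce(K2 key, Iterator<V2> values, NonSortableOutputCollector<K3, V3> output);
  }

  static final class Word implements WritableComparable {
    final String text;
    Word(String text) { this.text = text; }
    @Override public int compareTo(Object other) {
      return text.compareTo(((Word) other).text);
    }
  }

  static final class Count implements Writable {
    final int value;
    Count(int value) { this.value = value; }
  }

  // Con (3): one class can no longer be plugged in as both roles through a
  // single interface; it needs two entry points around the shared logic.
  static final class SumBoth
      implements Reducer<Word, Count, Word, Count>, Combiner<Word, Count> {
    private static int sum(Iterator<Count> values) {
      int total = 0;
      while (values.hasNext()) {
        total += values.next().value;
      }
      return total;
    }
    @Override
    public void reduce(Word key, Iterator<Count> values,
                       NonSortableOutputCollector<Word, Count> output) {
      output.collect(key, new Count(sum(values)));
    }
    @Override
    public void combine(Word key, Iterator<Count> values,
                        SortableOutputCollector<Word, Count> output) {
      output.collect(key, new Count(sum(values)));
    }
  }

  // Calls both entry points and records what each one emitted.
  static List<String> run() {
    List<String> log = new ArrayList<>();
    SumBoth both = new SumBoth();
    both.combine(new Word("a"),
        Arrays.asList(new Count(1), new Count(2)).iterator(),
        (k, v) -> log.add("combine " + k.text + "=" + v.value));
    both.reduce(new Word("a"),
        Arrays.asList(new Count(3), new Count(4)).iterator(),
        (k, v) -> log.add("reduce " + k.text + "=" + v.value));
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [combine a=3, reduce a=7]
  }
}
```

Existing classes that implement only {{Reducer}} would not satisfy the hypothetical {{Combiner}} type in this sketch, which is the compatibility break described in (3).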



The purpose of this issue is two-fold:
1. Spark a discussion among folks, both hadoop-dev & hadoop-users, to figure out whether this
really is a problem, i.e. do folks really care about this anomaly in the existing {{Reducer}} interface?
Also, is it worth the pain (@see 'Cons') to go fix it?
2. Even if we decide to live with it, this issue could record for posterity why we love hadoop,
warts and all. *smile*

Let's discuss...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

