hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-485) allow a different comparator for grouping keys in calls to reduce
Date Sat, 26 Aug 2006 07:52:22 GMT
allow a different comparator for grouping keys in calls to reduce
-----------------------------------------------------------------

                 Key: HADOOP-485
                 URL: http://issues.apache.org/jira/browse/HADOOP-485
             Project: Hadoop
          Issue Type: New Feature
    Affects Versions: 0.5.0
            Reporter: Owen O'Malley
         Assigned To: Owen O'Malley


Some algorithms require the values passed to a reduce to arrive in a particular order. Extending
the key with additional sort fields produces that order, but it also makes every extended key
distinct, so the values are handled by different calls to reduce. (The user must then collect the
values until they detect a "real" key change, and only then process them.)
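
The workaround described above can be sketched as follows. This is a minimal standalone simulation in Python, not Hadoop code: `BufferingReducer`, `natural_key`, and `process` are hypothetical names introduced only for illustration, and the "real" key is assumed to be the first character of the extended key.

```python
class BufferingReducer:
    """Simulates a reducer that gets one reduce() call per extended key
    and has to reassemble the real groups itself."""

    def __init__(self, natural_key, process):
        self.natural_key = natural_key  # hypothetical: extracts the "real" key
        self.process = process          # hypothetical: handles one complete group
        self.current = None
        self.buffer = []

    def reduce(self, key, values):
        real = self.natural_key(key)
        # A change in the "real" key means the previous group is complete.
        if self.buffer and real != self.current:
            self.close()
        self.current = real
        self.buffer.extend(values)

    def close(self):
        # Flush the last group; easy to forget, which is part of the pain.
        if self.buffer:
            self.process(self.current, self.buffer)
            self.buffer = []


groups = []
reducer = BufferingReducer(
    natural_key=lambda key: key[0],  # assumption: the letter is the real key
    process=lambda key, values: groups.append((key, list(values))),
)
for key, value in [("A1", "V1"), ("A2", "V2"), ("A3", "V3"),
                   ("B1", "V4"), ("B2", "V5")]:
    reducer.reduce(key, [value])  # one call per extended key
reducer.close()
```

Every reducer that needs sorted values has to repeat this buffering and remember the final flush.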

It would be much easier if the framework let you define a second comparator that controls how
values are grouped into calls to reduce. Suppose your reduce inputs look like:

A1, V1
A2, V2
A3, V3
B1, V4
B2, V5

Instead of getting calls to reduce that look like:

reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4}); reduce(B2, {V5});

you could define the grouping comparator to just compare the letters and end up with:

reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});

which is the desired outcome. Note that this assumes the "extra" part of the key is used only
for sorting, because the reduce will see only the first representative key of each equivalence
class.
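
To make the proposal concrete, here is a small standalone simulation in Python of sorting with one comparator and grouping with another. It is a sketch of the idea only, not the framework's implementation: `run_reduces`, `full_key_cmp`, and `letter_cmp` are hypothetical names, and comparing each key against the first key of the current group is an assumption of this sketch.

```python
from functools import cmp_to_key


def run_reduces(records, sort_cmp, group_cmp):
    """Sort records by key with sort_cmp, then start a new reduce call
    whenever group_cmp says the key differs from the group's first key."""
    ordered = sorted(records, key=cmp_to_key(lambda a, b: sort_cmp(a[0], b[0])))
    calls = []  # each entry is (representative key, list of values)
    for key, value in ordered:
        if calls and group_cmp(calls[-1][0], key) == 0:
            calls[-1][1].append(value)   # same equivalence class
        else:
            calls.append((key, [value])) # first key becomes the representative
    return calls


def full_key_cmp(a, b):
    # Sort comparator: compares the whole extended key.
    return (a > b) - (a < b)


def letter_cmp(a, b):
    # Grouping comparator: compares only the leading letter.
    return (a[0] > b[0]) - (a[0] < b[0])


records = [("B2", "V5"), ("A3", "V3"), ("A1", "V1"), ("B1", "V4"), ("A2", "V2")]

# Grouping by the full key reproduces the one-call-per-key behaviour.
ungrouped = run_reduces(records, full_key_cmp, full_key_cmp)

# Grouping by letter gives one call per letter, values still fully sorted.
calls = run_reduces(records, full_key_cmp, letter_cmp)
```

With the letter-only grouping comparator, `calls` contains two entries keyed by `A1` and `B1`; the keys `A2`, `A3`, and `B2` are never seen by the reduce, which is the caveat noted above.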

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
