hadoop-mapreduce-issues mailing list archives

From "Jingguo Yao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-2148) More precise documentation for setOutputValueGroupingComparator
Date Fri, 22 Oct 2010 14:02:17 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingguo Yao updated MAPREDUCE-2148:
-----------------------------------

    Description: 
The Javadoc of the JobConf#setOutputValueGroupingComparator method explains how to use a comparator
for grouping keys. org.apache.hadoop.examples.SecondarySort uses such a comparator. In SecondarySort,
both parts of IntPair are used for key sorting, while only the first part of IntPair is used for
partitioning and grouping. When the first parts of several IntPairs are equal to each other,
the IntPairs themselves may well differ. These IntPairs are still grouped into a single
invocation of the reduce method, since the grouping comparator only uses the first part of each
IntPair. However, the reduce method accepts only a single key object. In this situation, the
first IntPair in the group is used as the key in the reduce method. I have checked the source
code of Task.ValuesIterator, and its logic is consistent with the behavior described above.
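
The behavior described above can be illustrated with a minimal sketch. It is plain Java and uses
no Hadoop classes: IntPair, SORT, GROUP and reduceCalls are hypothetical stand-ins written for this
illustration, not the actual SecondarySort or Task.ValuesIterator code. It tracks keys only and
leaves out the values that would accompany them in a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Standalone simulation of sort-then-group; it does not call into Hadoop.
class GroupingSketch {

    // Hypothetical stand-in for the IntPair key used by SecondarySort.
    static final class IntPair {
        final int first;
        final int second;

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Sort comparator: orders keys by both parts.
    static final Comparator<IntPair> SORT =
        Comparator.comparingInt((IntPair p) -> p.first).thenComparingInt(p -> p.second);

    // Grouping comparator: looks at the first part only.
    static final Comparator<IntPair> GROUP = Comparator.comparingInt(p -> p.first);

    // Returns one line per simulated reduce() call: the key it would receive
    // (assumed here to be the first key of the group) and the keys in the group.
    static List<String> reduceCalls(List<IntPair> mapOutputKeys) {
        List<IntPair> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);

        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            IntPair groupKey = sorted.get(i); // first key of this group
            List<IntPair> group = new ArrayList<>();
            // Extend the group while the grouping comparator reports equality.
            while (i < sorted.size() && GROUP.compare(groupKey, sorted.get(i)) == 0) {
                group.add(sorted.get(i));
                i++;
            }
            calls.add("reduce(key=" + groupKey + ") covers " + group);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<IntPair> keys = Arrays.asList(
            new IntPair(2, 7), new IntPair(1, 9), new IntPair(1, 3), new IntPair(2, 5));
        reduceCalls(keys).forEach(System.out::println);
    }
}
```

For the keys (2,7), (1,9), (1,3) and (2,5), this sketch produces two simulated reduce calls, keyed
by (1,3) and (2,5), even though each group also contains a key that is not equal to the one passed in.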

I think that this behavior of the grouping comparator should be documented in the Javadoc of JobConf#setOutputValueGroupingComparator.

I am happy to provide a patch if a committer thinks that this is an issue.

  was:
The Javadoc of JobConf#setOutputValueGroupingComparator method explains the usage of a comparator
for grouping keys. org.apache.hadoop.examples.SecondarySort uses such a comparator. In SecondarySort,
all the 2 parts of IntPair is used for key sorting. The first part of IntPair is used for
partition and grouping. When the first parts of several IntPairs are equal to each other,
it is very possible that these IntPairs are not equal to each other. These IntPairs will be
grouped in a single invocation of reduce method since group comparator only use the first
part of IntPairs. However, reduce method only accepts a single key object. In such kind of
situations, the first IntPair is used as the key in reduce method.  I have checked the source
code of Task.ValuesIterator whose logic is consistent with the above behaviour.

I think that if such behavior of grouping comparator should be documented in JobConf#setOutputValueGroupingComparator.

I am happy to provide  a patch if some committer think that this is an issue.


> More precise documentation for setOutputValueGroupingComparator
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-2148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.2
>            Reporter: Jingguo Yao
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The Javadoc of the JobConf#setOutputValueGroupingComparator method explains how to use
a comparator for grouping keys. org.apache.hadoop.examples.SecondarySort uses such a comparator.
In SecondarySort, both parts of IntPair are used for key sorting, while only the first part of
IntPair is used for partitioning and grouping. When the first parts of several IntPairs are equal
to each other, the IntPairs themselves may well differ. These IntPairs are still grouped into a
single invocation of the reduce method, since the grouping comparator only uses the first part of
each IntPair. However, the reduce method accepts only a single key object. In this situation, the
first IntPair in the group is used as the key in the reduce method. I have checked the source
code of Task.ValuesIterator, and its logic is consistent with the behavior described above.
> I think that this behavior of the grouping comparator should be documented in the Javadoc of JobConf#setOutputValueGroupingComparator.
> I am happy to provide a patch if a committer thinks that this is an issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

