hadoop-mapreduce-user mailing list archives

From Paweł Łoziński <pawel.lozin...@gmail.com>
Subject Re: Sorting/Grouping
Date Tue, 26 Oct 2010 11:07:16 GMT
2010/10/26 Johannes.Lichtenberger <Johannes.Lichtenberger@uni-konstanz.de>:
> On 10/26/2010 07:39 AM, Paweł Łoziński wrote:
>> Hi,
>>
>> the framework doesn't tell your reducer whether the key it is
>> processing is the first or the last one, just as the mapper isn't told
>> whether the (key, value) pair passed to the map function is the
>> first/last for a given key. However, you can work around this by adding
>> special values to your data, e.g. <page><id>0</id>... and
>> <page><id>Long.MAX_VALUE</id>.... When you encounter those in your
>> reducer, you know you are at the beginning/end of your data and you
>> can emit <root> and </root>.
>
> This wouldn't work, since it may well be that the last value isn't
> Long.MAX_VALUE.
>

The idea is to choose a special value such that every real value in
your data is guaranteed to be smaller. For 64-bit numerical values this
would be Long.MAX_VALUE; generally speaking, the last value in the
possible range of values (or better: the last value + 1). Then you can
be sure the reducer will process it as the last value and emit </root>
to your output. Of course, if you have multiple reducers, the closing
tag will appear only in the output of one of them.
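
To make the sentinel idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: a TreeMap stands in for the framework's sort by key, and the class name, the helper methods and the assumption that every real page id lies strictly between 0 and Long.MAX_VALUE are mine, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates a single reducer that sees keys in sorted order.
public class SentinelReduceSketch {
    // Assumed sentinels: real page ids must lie strictly between these two.
    static final long START_SENTINEL = 0L;
    static final long END_SENTINEL = Long.MAX_VALUE;

    // Stand-in for one reduce call: called once per key, in ascending key order.
    static void reduce(long key, String value, List<String> out) {
        if (key == START_SENTINEL) {
            out.add("<root>");          // first key seen, open the document
        } else if (key == END_SENTINEL) {
            out.add("</root>");         // last key seen, close the document
        } else {
            out.add("<page><id>" + key + "</id>" + value + "</page>");
        }
    }

    // Adds the two sentinel records, sorts by key, and feeds every key to reduce().
    static List<String> run(Map<Long, String> records) {
        TreeMap<Long, String> sorted = new TreeMap<>(records);
        sorted.put(START_SENTINEL, "");
        sorted.put(END_SENTINEL, "");
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : sorted.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> records = new TreeMap<>();
        records.put(42L, "b");
        records.put(7L, "a");
        run(records).forEach(System.out::println);
    }
}
```

This only behaves as intended when a single reducer receives all keys. With several reducers, each sentinel ends up in the output of just one of them, as noted above.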
