hadoop-mapreduce-user mailing list archives

From "Johannes.Lichtenberger" <Johannes.Lichtenber...@uni-konstanz.de>
Subject Re: Sorting/Grouping
Date Tue, 26 Oct 2010 13:07:18 GMT
On 10/26/2010 01:07 PM, Paweł Łoziński wrote:
> 2010/10/26 Johannes.Lichtenberger <Johannes.Lichtenberger@uni-konstanz.de>:
>> On 10/26/2010 07:39 AM, Paweł Łoziński wrote:
>>> Hi,
>>>
>>> the framework doesn't tell your reducer whether the current call is
>>> the first or last one of the reduce job, just as the mapper isn't told
>>> whether the (key, value) pair passed to the map function is the
>>> first/last for a given key. However, you can work around this by adding
>>> special values to your data, e.g. <page><id>0</id>... and
>>> <page><id>Long.MAX_VALUE</id>.... When you encounter those in your
>>> reducer, you know you are at the beginning/end of your data and you
>>> can emit <root> and </root>.
>>
>> This wouldn't work, since it might as well be possible that the last
>> value isn't Long.MAX_VALUE.
>>
> 
> The idea is to choose such a special-value, that the last value in
> your data will be definitely smaller. In case of 64bit numerical
> values this would be Long.MAX_VALUE, generally speaking - the last
> value in the possible range of values (or better: the last value +1).
> Then you can be sure the reducer will process it as the last value,
> and emit </root> to your output. Of course, if you have multiple
> reducers, the closing tag will appear only in the output of one of
> them.
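
For reference, the sentinel idea quoted above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under assumptions: the class name SentinelSketch, the constants START and END, and the use of a TreeMap as a stand-in for the sorted keys a single reducer sees are all hypothetical, and real page IDs are assumed to lie strictly between 0 and Long.MAX_VALUE.

```java
import java.util.Map;
import java.util.TreeMap;

class SentinelSketch {
    // Hypothetical sentinel IDs; assumes every real page ID is > 0 and < Long.MAX_VALUE.
    static final long START = 0L;
    static final long END = Long.MAX_VALUE;

    // Stand-in for a single reducer: a TreeMap iterates its keys in ascending
    // order, which mimics the sorted order in which a reducer receives keys.
    static String reduceAll(TreeMap<Long, String> sortedPages) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Long, String> e : sortedPages.entrySet()) {
            long id = e.getKey();
            if (id == START) {
                out.append("<root>\n");          // smallest key: open the document
            } else if (id == END) {
                out.append("</root>\n");         // largest key: close the document
            } else {
                out.append("<page><id>").append(id).append("</id>")
                   .append(e.getValue()).append("</page>\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> pages = new TreeMap<>();
        pages.put(42L, "<title>B</title>");
        pages.put(START, "");                    // sentinel added to the data
        pages.put(7L, "<title>A</title>");
        pages.put(END, "");                      // sentinel added to the data
        System.out.print(reduceAll(pages));
    }
}
```

As noted in the quote, this only produces one well-formed document when a single reducer sees all keys; with several reducers each sentinel reaches only one of them.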

Hm, that's a useful idea, but I think I'll leave the IDs unchanged and add
the tags afterwards. That needs two more I/O operations (a read and a
write), but the extra cost doesn't matter here.
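
A minimal sketch of that post-processing step, assuming the reducer output has already been copied to the local file system (on HDFS one would go through Hadoop's own file system API instead). The class name WrapWithRoot, the method names, and the file names in demo() are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class WrapWithRoot {
    // Copies the given part files, in the order given, into target,
    // surrounded by <root> and </root>. This is the extra read and write.
    static void wrap(List<Path> parts, Path target) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            w.write("<root>\n");
            for (Path part : parts) {
                try (BufferedReader r = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        w.write(line);
                        w.write("\n");
                    }
                }
            }
            w.write("</root>\n");
        }
    }

    // Small self-contained demo using temporary files with example names.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("wrap-demo");
            Path p0 = dir.resolve("part-r-00000");
            Path p1 = dir.resolve("part-r-00001");
            Files.write(p0, "<page><id>1</id></page>\n".getBytes(StandardCharsets.UTF_8));
            Files.write(p1, "<page><id>2</id></page>\n".getBytes(StandardCharsets.UTF_8));
            Path target = dir.resolve("pages.xml");
            wrap(Arrays.asList(p0, p1), target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The page IDs stay untouched this way, and the result is a single document regardless of how many reducers wrote the parts.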

regards,
johannes
