hadoop-user mailing list archives

From Dave Beech <dbe...@apache.org>
Subject Re: GroupingComparator
Date Mon, 15 Oct 2012 20:49:23 GMT
Well, if all you need is the tag (the 1 or 2), why not just use a Text
or IntWritable instance variable? You wouldn't need to clone the whole
key.

Then, instead of tag = key.getSecondField() you'd say
tag.set(key.getSecondField().get());
I don't know what type of object tag is (if it's Text, you'd call
toString() rather than get()), but you see what I mean.
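
To make the difference concrete, here is a minimal sketch in plain Java
with no Hadoop dependency. MutableInt is a hypothetical stand-in for
IntWritable, and run() only mimics the framework overwriting one reused
key object while the values are iterated; neither name is part of the
Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class TagCopyDemo {
    /** Hypothetical stand-in for IntWritable: a mutable box with get/set. */
    static final class MutableInt {
        private int value;
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
    }

    /**
     * Mimics one reduce() call in which a single key object is reused and
     * its second field is overwritten for each record in the group.
     * Returns the tag that would be emitted for each record.
     */
    public static List<Integer> run(int[] secondFields, boolean copyValue) {
        MutableInt keySecondField = new MutableInt(secondFields[0]);
        MutableInt tag = new MutableInt(0);
        if (copyValue) {
            tag.set(keySecondField.get());   // copy the value once, up front
        } else {
            tag = keySecondField;            // alias: same object as the key's field
        }
        List<Integer> emitted = new ArrayList<>();
        for (int field : secondFields) {
            keySecondField.set(field);       // the reused key advances to the next record
            emitted.add(tag.get());          // what the reducer would emit
        }
        return emitted;
    }

    public static void main(String[] args) {
        int[] group = {1, 1, 2, 2};
        System.out.println("alias: " + run(group, false)); // alias: [1, 1, 2, 2]
        System.out.println("copy:  " + run(group, true));  // copy:  [1, 1, 1, 1]
    }
}
```

The aliased tag follows the key as it changes, while the copied tag keeps
the value it had when the group started.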

Also - just a tip - try to avoid creating new objects wherever
possible. You'll get better performance if you create one Text object
as an instance variable and re-use it by setting the value instead of
calling new Text("") on every output.
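
A rough sketch of that reuse pattern, again in plain Java so it runs
without Hadoop. MutableText and Sink are hypothetical stand-ins for Text
and for an output that serializes each record at write time; the sketch
assumes the real output copies the contents out when you write, which is
what makes reusing the object safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical stand-in for Text: a mutable string holder. */
    static final class MutableText {
        private String value = "";
        void set(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    /** Hypothetical sink that serializes each record when it is written. */
    static final class Sink {
        final List<String> lines = new ArrayList<>();
        void write(MutableText key, int tag) {
            lines.add(key + "\t" + tag);   // contents are copied out here
        }
    }

    // One instance for the lifetime of the task, reused for every record.
    private final MutableText outKey = new MutableText();

    public List<String> emitAll(String[] keys, int tag) {
        Sink sink = new Sink();
        for (String k : keys) {
            outKey.set(k);                 // overwrite in place instead of allocating a new holder
            sink.write(outKey, tag);
        }
        return sink.lines;
    }

    public static void main(String[] args) {
        System.out.println(new ReuseDemo().emitAll(new String[]{"a", "b"}, 1));
    }
}
```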

Thanks,
Dave

On 15 October 2012 21:39, Alberto Cordioli <cordioli.alberto@gmail.com> wrote:
> Hi Dave,
>
> thanks for your reply. Now it's clearer; in fact the code that I
> wrote was inspired by the old API, where the behavior is different.
> So, how can I achieve the same behavior as the old API? I need the
> second field of the first key object to stay the same across the
> iterations, in order to compare it with other objects. Do I have to
> clone the object?
>
>
> Thanks.
>
> On 15 October 2012 21:27, Dave Beech <dbeech@apache.org> wrote:
>> Hi Alberto
>>
>> The iterator you are looping over in your reduce method isn't a
>> self-contained list of values. What's actually happening is that
>> you're iterating through *part* of the sorted key/value set that was
>> sent to that reduce node, and it is the grouping comparator that
>> decides when to break that loop and call reduce again on the next key.
>>
>> Moreover, the "key" object is re-used. So, as you're iterating through
>> the values, the framework updates that same key object to hold the key
>> data associated with the current value - and you're seeing it change.
>>
>> This only happens in the new "mapreduce" API - in the older "mapred"
>> API you get the first key, and it appears to stay the same during the
>> loop.
>>
>> It's sometimes useful behaviour, but it's confusing how the two APIs
>> don't act the same.
>>
>> Hope that helps,
>> Dave
>>
>> On 15 October 2012 20:11, Alberto Cordioli <cordioli.alberto@gmail.com> wrote:
>>> Hi all,
>>>
>>> a very strange thing is happening with my Hadoop program.
>>> My map simply emits tuples with a custom object as key (which
>>> implements WritableComparable).
>>> The object is made of 2 fields, and I implemented my partitioner and
>>> grouping comparator in such a way that only the first field is taken
>>> into account.
>>> The second field is just a tag and can be 1 or 2.
>>>
>>> This is the reducer's snippet:
>>>
>>> tag = key.getSecondField();
>>> Iterator it1 = values.iterator();
>>> while(it1.hasNext()){
>>>         it1.next();
>>>         collector.emit(new Text("dummy"), tag);
>>> }
>>>
>>> I would expect in my output all the lines with:
>>> dummy       1
>>> ...
>>> dummy       1
>>>
>>> but actually the value of tag changes over time and I obtain this type of output:
>>>
>>> dummy    1
>>> ...
>>> dummy    1
>>> dummy    2
>>> ...
>>> dummy    2
>>>
>>>
>>> Could someone explain why, please?
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Alberto Cordioli
>
>
>
> --
> Alberto Cordioli
