avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Map output records/reducer input records mismatch
Date Wed, 17 Aug 2011 17:06:38 GMT
On 8/17/11 1:32 AM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
wrote:

> Hi Scott,
> 
> The pair types are Pair<CharSequence, SomeSpecificJavaClass>, but in practice,
> when I call collect(), I always provide a java.lang.String object.
> 
> The reduce method is
> reduce(CharSequence key, Iterable<SomeSpecificJavaClass> values, .....)

What happens if you change it to Pair<String, SomeSpecificJavaClass> or
Pair<Utf8, SomeSpecificJavaClass>? Does the problem persist?
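
A minimal sketch of the kind of key-recycling problem suspected here, assuming only that the framework reuses one mutable key object between records. MutableText is a hypothetical stand-in for a mutable, byte-backed string such as Utf8, not Avro's actual class, and whether this is what happens inside the Avro mapred code is unconfirmed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeyReuseSketch {

    /** Hypothetical stand-in for a mutable, byte-backed string (not Avro's Utf8). */
    static final class MutableText implements CharSequence {
        private byte[] bytes = new byte[0];
        private int length;

        /** Overwrites the contents in place, reusing the backing array when it is large enough. */
        MutableText set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
            return this;
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int i) { return toString().charAt(i); }
        @Override public CharSequence subSequence(int s, int e) { return toString().subSequence(s, e); }
        @Override public String toString() { return new String(bytes, 0, length, StandardCharsets.UTF_8); }
    }

    /** Holds references to the recycled key object, so every entry later reads as the last key. */
    static List<String> collectByReference(String... inputs) {
        MutableText reused = new MutableText();
        List<CharSequence> held = new ArrayList<>();
        for (String in : inputs) {
            held.add(reused.set(in)); // same object added each time
        }
        List<String> out = new ArrayList<>();
        for (CharSequence cs : held) {
            out.add(cs.toString());
        }
        return out;
    }

    /** Copies each key into an immutable String before the object is recycled. */
    static List<String> collectByCopy(String... inputs) {
        MutableText reused = new MutableText();
        List<String> out = new ArrayList<>();
        for (String in : inputs) {
            out.add(reused.set(in).toString()); // snapshot taken immediately
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference("a", "b", "c")); // prints [c, c, c]
        System.out.println(collectByCopy("a", "b", "c"));      // prints [a, b, c]
    }
}
```

If the key is declared as CharSequence and something holds on to the object rather than its contents, distinct keys can collapse into one. Declaring the key as String, or copying it with toString() before it is reused, avoids that in this sketch.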

> 
> Some more detailed info:
> the jobtracker and namenode run with:
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
> 
> the tasktrackers and datanodes run with:
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> 
> Hadoop version is:
> cdh3u1
> 
> Thanks for suggestions,
> Vyacheslav
> 
> 
> 
> 
> On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:
> 
>> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
>> wrote:
>> 
>>> Hi, Scott,
>>> 
>>> thanks for your reply.
>>> 
>>>> What Avro version is this happening with? What JVM version?
>>> 
>>> We are using Avro 1.5.1 and Sun JDK 6, but the exact version I will have
>>> to look up.
>>> 
>>>> 
>>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args
>>>> if it is Sun and JRE 6u21 or later? (Some issues in loop predicates
>>>> affect Java 6 too, just not as many as the recent news on Java 7.)
>>>> 
>>>> Otherwise, it is likely the same thing as AVRO-782.  Any extra
>>>> information related to that issue would be welcome.
>>> 
>>> I will have to collect it. In the meantime, do you have any reasonable
>>> explanations for the issue besides it being something like AVRO-782?
>> 
>> What is your key type (map output schema, first type argument of Pair)?
>> Is your key a Utf8 or String?  I don't have a reasonable explanation at
>> this point; I haven't looked into it in depth with a good reproducible
>> case.  I have my suspicions about how recycling of the key works, since
>> Utf8 is mutable and its backing byte[] can end up shared.
>> 
>> 
>> 
>>> 
>>> Thanks a lot,
>>> Vyacheslav
>>> 
>>>> 
>>>> Thanks!
>>>> 
>>>> -Scott
>>>> 
>>>> 
>>>> 
>>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>>> In only one of them is there a visible mismatch between the number of
>>>>> map output records and the number of reducer input records.
>>>>> 
>>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>>> explanations for this phenomenon?
>>>>> 
>>>>> Any pointers/thoughts are highly appreciated!
>>>>> 
>>>>> Best,
>>>>> Vyacheslav
>>>> 
>>>> 
>>> 
>>> Best,
>>> Vyacheslav
>>> 
>>> 
>>> 
>> 
>> 
> 


