incubator-mrunit-user mailing list archives

From: Brock Noland <br...@cloudera.com>
Subject: Re: Deserializer used for both Map and Reducer context.write()
Date: Wed, 09 May 2012 14:17:42 GMT
Hi,

As Jim says, I wonder if MRUNIT-101 will help.  Would it be possible to
share the exception/error you saw?  If you have time, I'd enjoy seeing
a small example of the code in question so we can add it to our test
suite.

Cheers,
Brock

On Wed, May 9, 2012 at 8:02 AM, Jim Donofrio <donofrio111@gmail.com> wrote:
> I am not too familiar with Avro, so maybe someone else can respond, but if
> the AvroKeyOutputFormat does the serialization then MRUNIT-101 [1] should
> fix your problem. I am just finishing this JIRA up. It works under Hadoop
> 1+, but I am having issues with TaskAttemptContext and JobContext changing
> from classes to interfaces in the mapreduce API in Hadoop 0.23.
>
> I should resolve this over the next few days. In the meantime if you can
> post your code I can test against it. It may also be worth the MRUnit
> project exploring having Jenkins deploy a snapshot to Nexus so you can
> easily test against the trunk without having to build it or download the jar
> from Jenkins.
>
> [1]: https://issues.apache.org/jira/browse/MRUNIT-101
>
>
> On 05/09/2012 03:15 AM, Jacob Metcalf wrote:
>>
>>
>> I am trying to integrate Avro-1.7 (specifically the new MR2 extensions),
>> MRUnit-0.9.0 and Hadoop-0.23. Assuming I have not made any mistakes, my
>> question is: should MRUnit be using the SerializationFactory when I call
>> context.write() in a reducer?
>>
>> I am using MapReduceDriver and my mapper has output signature:
>>
>> <AvroKey<SpecificKey1>,AvroValue<SpecificValue1>>
>>
>> My reducer has a different output signature:
>>
>> <AvroKey<SpecificValue2>, Null>.
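>>
>> Concretely, the driver is wired up roughly like this (a simplified
>> sketch: MyMapper/MyReducer stand in for my real classes, the
>> LongWritable/Text map input types are placeholders, and "Null" above
>> is NullWritable):
>>
>> import org.apache.avro.mapred.AvroKey;
>> import org.apache.avro.mapred.AvroValue;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.NullWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
>>
>> // Type parameters: <map-in key/value, map-out (= reduce-in) key/value,
>> // reduce-out key/value>
>> MapReduceDriver<LongWritable, Text,
>>         AvroKey<SpecificKey1>, AvroValue<SpecificValue1>,
>>         AvroKey<SpecificValue2>, NullWritable> driver =
>>     MapReduceDriver.newMapReduceDriver(new MyMapper(), new MyReducer());
>> driver.setConfiguration(configuration);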
>>
>> I am using Avro specific serialization so I set my Avro schemas like this:
>>
>> AvroSerialization.addToConfiguration( configuration );
>> AvroSerialization.setKeyReaderSchema( configuration, SpecificKey1.SCHEMA$ );
>> AvroSerialization.setKeyWriterSchema( configuration, SpecificKey1.SCHEMA$ );
>> AvroSerialization.setValueReaderSchema( configuration, SpecificValue1.SCHEMA$ );
>> AvroSerialization.setValueWriterSchema( configuration, SpecificValue1.SCHEMA$ );
>>
>> My understanding of Avro MR is that the AvroSerialization class is intended
>> to be invoked between the map and reduce phases.
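>>
>> That is, I expect Hadoop to resolve the intermediate serializer through
>> the SerializationFactory, roughly like this (my reading, not verified
>> against the Hadoop source):
>>
>> import org.apache.avro.mapred.AvroKey;
>> import org.apache.hadoop.io.serializer.SerializationFactory;
>> import org.apache.hadoop.io.serializer.Serializer;
>>
>> // addToConfiguration() registers AvroSerialization in io.serializations,
>> // so the factory should hand back an Avro serializer for AvroKey
>> SerializationFactory factory = new SerializationFactory(configuration);
>> Serializer<AvroKey> serializer = factory.getSerializer(AvroKey.class);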
>>
>> However my test fails at the reduce stage. While debugging I realised the
>> mock reducer context is using the serializer to copy objects:
>>
>>
>> https://github.com/apache/mrunit/blob/trunk/src/main/java/org/apache/hadoop/mrunit/internal/mapreduce/MockContextWrapper.java
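>>
>> Paraphrasing, the copy appears to amount to a serialize/deserialize round
>> trip like this (my sketch of the idea, not the exact MRUnit code):
>>
>> import java.io.ByteArrayInputStream;
>> import java.io.ByteArrayOutputStream;
>> import java.io.IOException;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.io.serializer.Deserializer;
>> import org.apache.hadoop.io.serializer.SerializationFactory;
>> import org.apache.hadoop.io.serializer.Serializer;
>>
>> @SuppressWarnings("unchecked")
>> static <T> T copy(T orig, Configuration conf) throws IOException {
>>   SerializationFactory factory = new SerializationFactory(conf);
>>   Serializer<T> ser = factory.getSerializer((Class<T>) orig.getClass());
>>   Deserializer<T> deser =
>>       factory.getDeserializer((Class<T>) orig.getClass());
>>   ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>>   ser.open(bytes);
>>   ser.serialize(orig);   // this is where the wrong writer schema bites
>>   ser.close();
>>   deser.open(new ByteArrayInputStream(bytes.toByteArray()));
>>   T result = deser.deserialize(null);
>>   deser.close();
>>   return result;
>> }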
>>
>> Looking at the AvroSerialization class, it only expects one set of
>> schemas:
>>
>>
>> http://svn.apache.org/viewvc/avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/hadoop/io/AvroSerialization.java?view=markup
>>
>> So when my reducer tries to write SpecificValue2 to the context, MRUnit's
>> mock then tries to serialise SpecificValue2 with SpecificValue1.SCHEMA$
>> and as a result fails.
>>
>> I have not yet debugged Hadoop itself, but I did read some comments (which
>> I since cannot locate) saying that the Serialization class is typically
>> not used for the output of the reduce stage. My limited understanding is
>> that the OutputFormat (e.g. AvroKeyOutputFormat) acts as the serializer
>> when you are running in Hadoop.
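>>
>> That is, in a real job I would expect the reduce output to go entirely
>> through the output format, i.e. something like this with the Avro 1.7
>> mapreduce API (job being an org.apache.hadoop.mapreduce.Job):
>>
>> import org.apache.avro.mapreduce.AvroJob;
>> import org.apache.avro.mapreduce.AvroKeyOutputFormat;
>>
>> // The RecordWriter created by AvroKeyOutputFormat does the encoding
>> // itself, so the io.serializations machinery never sees SpecificValue2
>> AvroJob.setOutputKeySchema(job, SpecificValue2.SCHEMA$);
>> job.setOutputFormatClass(AvroKeyOutputFormat.class);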
>>
>> I can spend some time distilling my code into a simple example but
>> wondered if anyone had any pointers - or an Avro + MR2 + MRUnit example.
>>
>> Jacob
>>
>>
>>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
