incubator-mrunit-user mailing list archives

From Jim Donofrio <donofrio...@gmail.com>
Subject Re: Deserializer used for both Map and Reducer context.write()
Date Wed, 09 May 2012 13:02:24 GMT
I am not too familiar with Avro, so maybe someone else can respond, but if 
the AvroKeyOutputFormat does the serialization then MRUNIT-101 [1] 
should fix your problem. I am just finishing that JIRA up. It works 
under Hadoop 1+, but I am having issues with TaskAttemptContext and 
JobContext changing from classes to interfaces in the mapreduce API in 
Hadoop 0.23.
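For what it's worth, one way to cope with a context type becoming an 
interface is to construct the mock at runtime with a dynamic proxy. This is 
only a sketch of the idea, not MRUnit's actual code; the interface below is a 
stand-in for org.apache.hadoop.mapreduce.TaskAttemptContext, and the attempt 
ID string is made up:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ContextProxyDemo {

    // Stand-in for the Hadoop 0.23 interface; the real one is
    // org.apache.hadoop.mapreduce.TaskAttemptContext.
    interface TaskAttemptContext {
        String getTaskAttemptID();
    }

    // Build a mock context at runtime. Because 0.23 turns the context
    // into an interface, a dynamic proxy works where subclassing the
    // 1.x class no longer compiles against both APIs.
    static TaskAttemptContext newContext() {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                if ("getTaskAttemptID".equals(method.getName())) {
                    return "attempt_201205090000_0001_m_000000_0";
                }
                throw new UnsupportedOperationException(method.getName());
            }
        };
        return (TaskAttemptContext) Proxy.newProxyInstance(
                ContextProxyDemo.class.getClassLoader(),
                new Class<?>[] { TaskAttemptContext.class },
                handler);
    }

    public static void main(String[] args) {
        System.out.println(newContext().getTaskAttemptID());
    }
}
```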

I should have it resolved over the next few days. In the meantime, if you 
can post your code I can test against it. It may also be worth the MRUnit 
project exploring having Jenkins deploy a snapshot to Nexus, so that you 
can easily test against trunk without having to build it yourself or 
download the jar from Jenkins.
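For reference, consuming such a snapshot would only need a couple of POM 
entries. A sketch, pointing at the standard Apache snapshot repository; the 
artifact version shown is just a placeholder:

```xml
<!-- Sketch only: repository URL is the standard Apache snapshot repo;
     the <version> is a placeholder, not a published artifact. -->
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <url>http://repository.apache.org/snapshots/</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
```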

[1]: https://issues.apache.org/jira/browse/MRUNIT-101

On 05/09/2012 03:15 AM, Jacob Metcalf wrote:
>
> I am trying to integrate Avro 1.7 (specifically the new MR2 
> extensions), MRUnit 0.9.0 and Hadoop 0.23. Assuming I have not made 
> any mistakes, my question is: should MRUnit be using the 
> SerializationFactory when I call context.write() in a reducer?
>
> I am using MapReduceDriver and my mapper has output signature:
>
> <AvroKey<SpecificKey1>,AvroValue<SpecificValue1>>
>
> My reducer has a different output signature:
>
> <AvroKey<SpecificValue2>, Null>.
>
> I am using Avro specific serialization so I set my Avro schemas like this:
>
> AvroSerialization.addToConfiguration( configuration );
> AvroSerialization.setKeyReaderSchema( configuration, SpecificKey1.SCHEMA$ );
> AvroSerialization.setKeyWriterSchema( configuration, SpecificKey1.SCHEMA$ );
> AvroSerialization.setValueReaderSchema( configuration, SpecificValue1.SCHEMA$ );
> AvroSerialization.setValueWriterSchema( configuration, SpecificValue1.SCHEMA$ );
>
> My understanding of Avro MR is that the Serialization class is 
> intended to be invoked between the map and reduce phases.
>
> However, my test fails at the reduce stage. While debugging, I realised 
> the mock reducer context is using the serializer to copy objects:
>
> https://github.com/apache/mrunit/blob/trunk/src/main/java/org/apache/hadoop/mrunit/internal/mapreduce/MockContextWrapper.java
>
> Looking at the AvroSerialization class, it only expects one set of 
> schemas:
>
> http://svn.apache.org/viewvc/avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/hadoop/io/AvroSerialization.java?view=markup
>
> So when my reducer tries to write SpecificValue2 to the context, 
> MRUnit's mock tries to serialise SpecificValue2 with 
> SpecificValue1.SCHEMA$ and fails as a result.
>
> I have not yet debugged Hadoop itself, but I did read some comments 
> (which I since cannot locate) saying that the Serialization class is 
> typically not used for the output of the reduce stage. My limited 
> understanding is that the OutputFormat (e.g. AvroKeyOutputFormat) will 
> handle the serialisation itself when you are running in Hadoop.
>
> I can spend some time distilling my code into a simple example but 
> wondered if anyone had any pointers - or an Avro + MR2 + MRUnit example.
>
> Jacob
>
>
>
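To illustrate the copy Jacob is describing: MRUnit's mock context 
round-trips each key and value through the configured serialization so the 
test holds its own copy, which is why the reducer's output types must also 
be serialisable with the configured schemas. A minimal stand-alone sketch of 
that round-trip idea, using plain java.io serialization in place of Hadoop's 
Serialization interface (class and method names here are illustrative, not 
MRUnit's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripCopyDemo {

    // Round-trip an object through a serializer to obtain an independent
    // copy. MRUnit does the analogous thing with Hadoop's
    // SerializationFactory, so the serializer must know how to handle
    // every type written to the mock context, including reducer output.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T value) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String original = "reducer-output";
        String copied = copy(original);
        // The copy is equal to, but a distinct instance from, the original.
        System.out.println(copied.equals(original) && copied != original);
    }
}
```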
