avro-user mailing list archives

From Matt Pouttu-Clarke <Matt.Pouttu-Cla...@icrossing.com>
Subject Re: Multiple input schemas in MapReduce?
Date Wed, 11 May 2011 21:18:34 GMT
Hi Markus,

You could use Cascading.  The Cascading.Avro extension automatically
transforms the Avro data into a TupleEntry (a generic object similar to
java.util.Map).  Then you can combine and process the data however you wish
downstream.

Please check this entry for more info:
http://mpouttuclarke.wordpress.com/2011/01/13/cascading-avro/
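
As a rough illustration of the idea only: the sketch below uses plain
java.util.Map objects as a stand-in for a TupleEntry and does not use the
Cascading or Cascading.Avro API. The two "schemas" (click and purchase), their
field names, and the class and method names are all hypothetical. It shows how
records with different shapes can be flattened into one generic type and then
grouped together downstream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GenericTupleSketch {

    // Hypothetical first schema: a click with a userId and a url.
    static Map<String, Object> fromClick(String userId, String url) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("source", "click");
        tuple.put("userId", userId);
        tuple.put("url", url);
        return tuple;
    }

    // Hypothetical second schema: a purchase with a userId and an amount.
    static Map<String, Object> fromPurchase(String userId, double amount) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("source", "purchase");
        tuple.put("userId", userId);
        tuple.put("amount", amount);
        return tuple;
    }

    // Downstream step: group tuples from both inputs by their shared key.
    static Map<String, List<Map<String, Object>>> groupByUser(
            List<Map<String, Object>> tuples) {
        Map<String, List<Map<String, Object>>> groups = new TreeMap<>();
        for (Map<String, Object> tuple : tuples) {
            String key = (String) tuple.get("userId");
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = new ArrayList<>();
        tuples.add(fromClick("u1", "http://example.com/a"));
        tuples.add(fromPurchase("u1", 9.5));
        tuples.add(fromClick("u2", "http://example.com/b"));

        // Print each user's tuples, regardless of which input they came from.
        groupByUser(tuples).forEach((user, rows) ->
                System.out.println(user + " -> " + rows));
    }
}
```

Because every record becomes the same generic type, the grouping step never
needs to know which schema a record came from; the "source" field is kept only
so that later steps can tell the two inputs apart if they need to.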

Cheers,
Matt

On 5/11/11 11:44 AM, "Markus Weimer" <weimer@yahoo-inc.com> wrote:

> Hi,
> 
> I'd like to write a mapreduce job that uses avro throughout, but the map phase
> would need to read files with two different schemas, similar to what the
> MultipleInputFormat does in stock hadoop. Is this a supported use case?
> 
> A work-around would be to create a union schema that has both fields as
> optional and to convert all data into it, but that seems clumsy.
> 
> Has anyone done this before?
> 
> Thanks for any suggestion you can give,
> 
> Markus
> 




