avro-user mailing list archives

From ey-chih chow <eyc...@hotmail.com>
Subject RE: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Date Wed, 18 Aug 2010 13:15:12 GMT

Hi,
Let me rephrase my question in case anybody is interested in answering it.  In the new
version, Avro 1.4.0, the class hierarchy of AvroMapper and AvroReducer has changed: they now
subclass Configured, rather than subclassing MapReduceBase and implementing the Mapper and
Reducer interfaces respectively.  The configuration of Avro mapred jobs is also different from
that of other mapred jobs.  Furthermore, text log files have to be converted to the Avro
format before Avro mapred jobs can process them.  Suppose I have a mapred job that requires a
reducer-side join of two inputs, one from HBase and the other from an imported log file in the
Avro format.  How can I configure the two mappers to process the inputs from HBase and the log
file respectively?  Also, how can I configure an Avro reducer to generate multiple outputs?  For
multiple inputs and outputs, I have some example programs from Tom White's Hadoop book, but
I don't know what changes I should make for the Avro case.
Ey-Chih

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Date: Mon, 16 Aug 2010 18:22:24 -0700


Hi,
I have a Map/Reduce job that requires multiple inputs and outputs.  One of the inputs will be
processed by a mapper and a reducer that are subclasses of AvroMapper and AvroReducer respectively,
and the reducer has multiple outputs.  I would appreciate it if anybody could let me know how to
configure the job to do this.
Ey-Chih