avro-user mailing list archives

From ey-chih chow <eyc...@hotmail.com>
Subject RE: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
Date Wed, 18 Aug 2010 17:18:24 GMT

Thanks.  But if we do it this way, what advantage do we get from Avro?
Ey-Chih

> From: qwertymaniac@gmail.com
> Date: Wed, 18 Aug 2010 19:39:17 +0530
> Subject: Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
> To: user@avro.apache.org
> 
> If I got your issue right, all you need to ensure is that both your
> mappers emit the same "type" of keys and values. This can easily
> be done by implementing a custom Avro Mapper [which reads records from
> avro files, processes them and spews out legal K/V types instead of
> avro datums, such that they match your HBase mapper's collected
> outputs].
> 
> Your reducer shouldn't be bothered about avro/etc then.
> 
> * Note: You may also use avro as intermediate K/V format, but it might
> require some extra work to do so :)
> 
> On Wed, Aug 18, 2010 at 6:45 PM, ey-chih chow <eychih@hotmail.com> wrote:
> > Hi,
> > Let me rephrase my question to see if anybody is interested in answering it.
> >  For the new version of Avro 1.4.0, the class hierarchy of AvroMapper and
> > AvroReducer has been changed to subclass from Configured, rather than from
> > MapReduceBase to implement the interfaces Mapper and Reducer respectively.
> >  The configuration of Avro mapred jobs is also different from that of
> > other mapred jobs.  Furthermore, text log files have to be imported into
> > Avro format for Avro mapred jobs to process them.  If I have a mapred job
> > that requires a reducer-side join of two inputs, one from HBase and the
> > other from an imported log file in Avro format, how can I configure
> > the two mappers to process inputs from HBase and the log file respectively?
> >  Also, how can I configure an Avro reducer to generate multiple outputs?  For
> > multiple inputs and outputs, I got some example programs from Tom White's
> > Hadoop book.  But I simply don't know what changes I should make for
> > the Avro case.
> > Ey-Chih
> >
> > ________________________________
> > From: eychih@hotmail.com
> > To: user@avro.apache.org
> > Subject: how to specify MultipleOutputs, MultipleInputs in using Avro mapred
> > API
> > Date: Mon, 16 Aug 2010 18:22:24 -0700
> >
> > Hi,
> > I have a Map/Reduce job that requires multiple inputs and outputs.  One of the
> > inputs will be processed by a mapper and a reducer that are subclasses of
> > AvroMapper/AvroReducer respectively.  The reducer has multiple outputs.
> >  I would appreciate it if anybody could let me know how to configure the job to do
> > this.
> > Ey-Chih
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com
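
The approach suggested in the thread (both mappers emit the same key and value types, so the reducer does not need to know where a record came from) can be illustrated with a small sketch. This is plain Java with no Hadoop, HBase, or Avro dependency, so it only models the data flow and is not the Avro mapred API. Every name here (Tagged, Shuffle, mapHBaseRow, mapLogRecord, and the "joined" and "unmatched" outputs) is hypothetical and chosen for illustration. In a real job the two map functions would presumably be separate mapper classes registered per input, and the named outputs would be handled by something like MultipleOutputs, as in the Hadoop book examples mentioned in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** Common intermediate value: a source tag plus a payload. Both mappers emit this type. */
    public static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    /** Stand-in for the shuffle phase: groups emitted values by key. */
    public static final class Shuffle {
        final Map<String, List<Tagged>> groups = new TreeMap<>();

        void emit(String key, Tagged value) {
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    /** Hypothetical mapper for the HBase input: row key becomes the join key. */
    public static void mapHBaseRow(String rowKey, String profile, Shuffle out) {
        out.emit(rowKey, new Tagged("hbase", profile));
    }

    /**
     * Hypothetical mapper for the Avro log input. In a real job the arguments
     * would be fields read from an Avro datum; here they are plain strings.
     */
    public static void mapLogRecord(String userId, String event, Shuffle out) {
        out.emit(userId, new Tagged("log", event));
    }

    /** Reducer: joins the two sources per key and routes results to two named outputs. */
    public static Map<String, List<String>> reduce(Shuffle shuffle) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("joined", new ArrayList<>());
        outputs.put("unmatched", new ArrayList<>());

        for (Map.Entry<String, List<Tagged>> group : shuffle.groups.entrySet()) {
            String profile = null;
            List<String> events = new ArrayList<>();
            // Separate the values for this key by their source tag.
            for (Tagged t : group.getValue()) {
                if ("hbase".equals(t.source)) {
                    profile = t.payload;
                } else {
                    events.add(t.payload);
                }
            }
            // Log events with a matching HBase row are joined; the rest go to "unmatched".
            for (String event : events) {
                if (profile != null) {
                    outputs.get("joined").add(group.getKey() + "\t" + profile + "\t" + event);
                } else {
                    outputs.get("unmatched").add(group.getKey() + "\t" + event);
                }
            }
        }
        return outputs;
    }

    /** Runs both mappers over a few made-up records and reduces the result. */
    public static Map<String, List<String>> runDemo() {
        Shuffle shuffle = new Shuffle();
        mapHBaseRow("u1", "alice", shuffle);
        mapLogRecord("u1", "login", shuffle);
        mapLogRecord("u2", "click", shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The source tag on each value is what lets a single reducer tell the two inputs apart after the shuffle. Using Avro as the intermediate key/value format instead, as the note in the reply mentions, would replace Tagged with an Avro record and is not shown here.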
 		 	   		  