avro-user mailing list archives

From ed <edor...@gmail.com>
Subject Re: Avro MapReduce (MR1): Prevent Key from being output by reducer when using Pair schema
Date Fri, 17 Jan 2014 00:47:35 GMT
Hi Harsh,

I'd be happy to do that.  Thank you for your help!

Best,

Ed


On Thu, Jan 16, 2014 at 10:05 PM, Harsh J <harsh@cloudera.com> wrote:

> Thanks Ed! Can you also file an improvement JIRA under
> https://issues.apache.org/jira/browse/AVRO with a patch that changes
> it to make more sense?
>
> On Thu, Jan 16, 2014 at 5:14 PM, ed <edorsey@gmail.com> wrote:
> > Hi Harsh,
> >
> > Thank you for your response, which was invaluable in helping me figure out
> > my issue.  The JavaDoc is in fact incorrect when it states that
> > AvroJob.setOutputSchema cannot accept non-Pair schemas; it turns out it
> > can.  What was throwing me off is that if you use AvroJob.setOutputSchema
> > to set a non-Pair schema, then you also need to call
> > AvroJob.setMapOutputSchema (which does require a Pair schema).  Otherwise,
> > by default, the map output schema gets set to whatever you passed to
> > setOutputSchema, and if that is non-Pair you'll get an error at runtime.
> >
> > Maybe the JavaDoc should say something along the lines of:
> >
> >> Configure a job's output schema. If this is not a Pair schema then you
> >> must explicitly set the job's map output schema using setMapOutputSchema
> >
> >
> > Thank you!
> >
> > Best Regards,
> >
> > Ed
> >
> >
> >
> >
> > On Thu, Jan 16, 2014 at 6:47 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> Hello Ed,
> >>
> >> The AvroReducer per
> >> http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapred/AvroReducer.html
> >> has a simple spec of <K,V,OUT>, where OUT can be any record type and
> >> not necessarily a Pair<KO,VO> type.
> >>
> >> AvroJob.setOutputSchema(…) should accept non-pair configs. I think its
> >> java-doc is incorrect though. I wrote a test case yesterday at
> >> http://issues.apache.org/jira/browse/AVRO-1439, in which I set a
> >> non-Pair schema via the same call without any trouble. We could get
> >> the java-doc fixed, if it is indeed wrong.
> >>
> >> On Thu, Jan 16, 2014 at 2:14 PM, ed <edorsey@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I am currently reading in lots of small avro files and then writing them
> >> > out into one large avro file using Map Reduce MR1.  I'm trying to do
> >> > this using the AvroMapper and AvroReducer and it's almost working how I
> >> > want.
> >> >
> >> > The problem right now is that it looks like I have to use
> >> > "org.apache.avro.mapred.Pair" if I use "AvroJob.setOutputSchema".  Is
> >> > there a way to output a Pair schema from AvroReducer and have the "key"
> >> > in that schema be ignored (i.e., not included in the output from the
> >> > reducer)?  Right now when I check the Reducer output there is an added
> >> > field in each record called "key" which I'd like to not have there.
> >> >
> >> > Essentially I'm looking for something like NullWritable where the key
> >> > will just be ignored in the final output.
> >> >
> >> > Thank you for any assistance or guidance you can provide!
> >> >
> >> > Best Regards,
> >> >
> >> > Ed
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
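
For readers landing on this thread, here is a minimal sketch of the
configuration Ed describes: a Pair schema for the map output and a plain
(non-Pair) record schema for the job output, so the reducer emits records
without a "key" field.  It assumes the Avro 1.7.x org.apache.avro.mapred API
on Hadoop MR1.  The class names, the "Event" schema, and the "id" field are
hypothetical and only there for illustration.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class MergeAvroFiles {

  // Hypothetical record schema; substitute the schema of your own files.
  static final Schema RECORD_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"}]}");

  static final Schema KEY_SCHEMA = Schema.create(Schema.Type.STRING);

  // The map output must be a Pair, so each record is wrapped with a key.
  public static class MergeMapper
      extends AvroMapper<GenericRecord, Pair<CharSequence, GenericRecord>> {
    @Override
    public void map(GenericRecord datum,
                    AvroCollector<Pair<CharSequence, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
      CharSequence key = datum.get("id").toString();
      collector.collect(new Pair<CharSequence, GenericRecord>(
          key, KEY_SCHEMA, datum, RECORD_SCHEMA));
    }
  }

  // OUT is a plain record, not a Pair, so no "key" field is written.
  public static class MergeReducer
      extends AvroReducer<CharSequence, GenericRecord, GenericRecord> {
    @Override
    public void reduce(CharSequence key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
      for (GenericRecord value : values) {
        collector.collect(value);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(MergeAvroFiles.class);
    conf.setJobName("merge-avro-files");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    AvroJob.setInputSchema(conf, RECORD_SCHEMA);
    // Set explicitly: per the thread, it otherwise defaults to the
    // (non-Pair) output schema and the job fails at runtime.
    AvroJob.setMapOutputSchema(conf,
        Pair.getPairSchema(KEY_SCHEMA, RECORD_SCHEMA));
    // Non-Pair output schema for the final files.
    AvroJob.setOutputSchema(conf, RECORD_SCHEMA);

    AvroJob.setMapperClass(conf, MergeMapper.class);
    AvroJob.setReducerClass(conf, MergeReducer.class);
    conf.setNumReduceTasks(1); // one reducer, so one merged output file

    JobClient.runJob(conf);
  }
}
```

The single-reducer setting is an assumption made to match the "one large
avro file" goal in the original question; it is not something the thread
itself prescribes.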
