avro-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Avro MapReduce (MR1): Prevent Key from being output by reducer when using Pair schema
Date Thu, 16 Jan 2014 09:47:26 GMT
Hello Ed,

The AvroReducer, per
http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapred/AvroReducer.html
is declared simply as AvroReducer<K,V,OUT>, where OUT can be any record
type and need not be a Pair<KO,VO>.

AvroJob.setOutputSchema(…) should accept non-Pair schemas; I think its
Javadoc is incorrect on this point. I wrote a test case yesterday at
http://issues.apache.org/jira/browse/AVRO-1439, in which I set a
non-Pair schema via the same call without any trouble. If the Javadoc
is indeed wrong, we could get it fixed.
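
For illustration, here is a minimal sketch of what such a job setup might
look like. It is not the test case from the JIRA. The Event schema, the
class names, and the string key type are all hypothetical; it assumes
avro-mapred and the Hadoop MR1 classes are on the classpath, and that a
mapper (not shown) emits Pair<Utf8, GenericRecord> values.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class NonPairOutputSketch {

  // Hypothetical record schema, used only for this illustration.
  static final Schema EVENT = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

  // OUT is a plain record rather than a Pair, so the key is simply
  // never written to the reducer output.
  public static class ValueOnlyReducer
      extends AvroReducer<Utf8, GenericRecord, GenericRecord> {
    @Override
    public void reduce(Utf8 key, Iterable<GenericRecord> values,
        AvroCollector<GenericRecord> collector, Reporter reporter)
        throws IOException {
      for (GenericRecord value : values) {
        collector.collect(value); // emit the value only, drop the key
      }
    }
  }

  static JobConf configure() {
    JobConf conf = new JobConf();
    AvroJob.setInputSchema(conf, EVENT);
    // The intermediate (map output) schema is still a Pair for the shuffle.
    AvroJob.setMapOutputSchema(conf,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING), EVENT));
    AvroJob.setReducerClass(conf, ValueOnlyReducer.class);
    // The final output schema is the bare record schema, not a Pair.
    AvroJob.setOutputSchema(conf, EVENT);
    return conf;
  }
}
```

The sketch leaves out the mapper and the input/output paths; the point is
only that the map output schema and the final output schema are set
separately, so the latter does not have to be a Pair.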

On Thu, Jan 16, 2014 at 2:14 PM, ed <edorsey@gmail.com> wrote:
> Hello,
>
> I am currently reading in lots of small avro files and then writing them out
> into one large avro file using Map Reduce MR1.  I'm trying to do this using
> the AvroMapper and AvroReducer and it's almost working how I want.
>
> The problem right now is that it looks like I have to use
> "org.apache.avro.mapred.Pair" if I use "AvroJob.setOutputSchema".  Is there
> a way to output a Pair schema from AvroReducer and have the "key" in that
> schema be ignored (i.e., not included in the output from the reducer)?
> Right now when I check the Reducer output there is an added field in each
> record called "key" which I'd like to not have there.
>
> Essentially I'm looking for something like NullWritable where the key will
> just be ignored in the final output.
>
> Thank you for any assistance or guidance you can provide!
>
> Best Regards,
>
> Ed



-- 
Harsh J
