avro-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Generating snappy compressed avro files as hadoop map reduce input files
Date Sun, 13 Oct 2013 15:36:28 GMT
I am not sure I understand the relation between your problem and the way
temporary data is stored after the map phase.

However, I guess you are looking for DataFileWriter and its setCodec
method.
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
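For illustration, here is a minimal sketch of writing a snappy-compressed
Avro file from plain application code, with no MapReduce involved. The
"Event" schema, the class name and the helper methods are hypothetical, and
I am assuming that both avro and snappy-java are on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

class SnappyAvroExample {
    // Hypothetical schema, for illustration only.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

    // Writes n dummy records to 'out' as an Avro container file
    // whose data blocks are compressed with snappy.
    public static void write(File out, int n) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            // The codec has to be set before create() opens the file.
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(SCHEMA, out);
            for (int i = 0; i < n; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", "event-" + i);
                record.put("timestamp", (long) i);
                writer.append(record);
            }
        }
    }

    // Reads back the codec name stored in the file header metadata.
    public static String codecOf(File in) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
            return reader.getMetaString("avro.codec");
        }
    }
}
```

The resulting file can then be uploaded to S3 as it is. The codec name is
recorded in the file header, so a reader does not need to be told which
codec was used.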

Regards

Bertrand

PS: A snappy-compressed Avro file is not an ordinary file that has been
compressed as a whole afterwards; it is an Avro container file whose data
blocks are each compressed with snappy. SequenceFile works on the same
principle. Maybe that is what you mean by a different snappy codec?

On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <davidg@inner-active.com> wrote:

>  Hi,
>
> I am writing an application that produces Avro record files, to be stored
> on AWS S3 as possible input to EMR.
> I would like to compress them with the snappy codec before storing them on S3.
> It is my understanding that Hadoop currently uses a different snappy
> codec, mostly used as an intermediate map output format.
> My question is: how can I generate snappy-compressed Avro files within my
> application logic (not MR)?
>
