avro-user mailing list archives

From Martin Kleppmann <mar...@rapportive.com>
Subject Re: map/reduce of compressed Avro
Date Tue, 23 Apr 2013 10:38:05 GMT
To my knowledge, LZO is not a supported codec for Avro data files. It's
possible that you have an LZO-compressed Hadoop sequence file containing
Avro records, but that would be a format you defined yourself, and not the
same as an Avro data file.

Avro data files are designed to be splittable regardless of the codec they
use, so you can have multiple mappers that each consume a portion of the
input file. The format achieves that by breaking the data into blocks, and
compressing each block separately; hence it can be split at block
boundaries.
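
To illustrate the idea, here is a minimal sketch in Python using only the
standard library. It is a toy container, not the real Avro file format:
the names write_container and read_split are hypothetical, the block
layout is made up for the example, and zlib stands in for whatever codec
is in use. It assumes each block is followed by the same random 16-byte
marker, so that a reader handed an arbitrary byte range can scan forward
to the next marker and start decompressing from there.

```python
import os
import struct
import zlib

SYNC_SIZE = 16  # length of the marker written between blocks


def write_container(records, records_per_block):
    """Write a toy block-compressed container (NOT the real Avro format).

    Layout: marker, then for each block:
    record count (4 bytes), payload length (4 bytes), payload, marker.
    """
    sync = os.urandom(SYNC_SIZE)  # random marker, chosen once per file
    out = bytearray(sync)         # the "header" is just the marker
    for i in range(0, len(records), records_per_block):
        block = records[i:i + records_per_block]
        # Each block is compressed on its own, independent of the others.
        payload = zlib.compress("\n".join(block).encode("utf-8"))
        out += struct.pack(">II", len(block), len(payload))
        out += payload
        out += sync
    return bytes(out)


def read_split(data, start, end):
    """Return the records of every block whose preceding marker
    begins at a byte offset p with start <= p < end."""
    sync = data[:SYNC_SIZE]
    # Scan forward from an arbitrary offset to the next block boundary.
    # (A chance match inside a payload is ignored here; with 16 random
    # bytes it is vanishingly unlikely.)
    marker = data.find(sync, start)
    records = []
    while marker != -1 and marker < end:
        block_start = marker + SYNC_SIZE
        if block_start >= len(data):
            break  # this was the marker after the final block
        count, length = struct.unpack_from(">II", data, block_start)
        payload_start = block_start + 8
        payload = data[payload_start:payload_start + length]
        block = zlib.decompress(payload).decode("utf-8").split("\n")
        assert len(block) == count
        records.extend(block)
        marker = payload_start + length  # next marker follows the payload
    return records


records = ["record-%d" % i for i in range(1000)]
data = write_container(records, records_per_block=100)
mid = len(data) // 2
# Two "mappers", each given half of the bytes, together see every record
# exactly once, even though the midpoint falls inside a compressed block.
assert read_split(data, 0, mid) + read_split(data, mid, len(data)) == records
```

Because every marker offset falls into exactly one byte range, no block is
read twice or skipped, whichever compression codec is applied inside the
blocks.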

Best,
Martin


On 22 April 2013 23:47, nir_zamir <nir.zamir@gmail.com> wrote:

> Thanks Martin.
>
> What will happen if I try to use an indexed LZO-compressed Avro file? Will
> it work and utilize the index to allow multiple mappers?
>
> I think that for Snappy for example, the file is splittable and can use
> multiple mappers, but I haven't tested it yet - would be glad if anyone has
> any experience with that.
>
> Thanks!
> Nir.
>
>
>
> --
> View this message in context:
> http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
> Sent from the Avro - Users mailing list archive at Nabble.com.
>
