pig-user mailing list archives

From Lin Guo <guolin2...@gmail.com>
Subject Re: comments appreciated for pig AvroStorage
Date Thu, 09 Dec 2010 08:20:06 GMT
Hi, Jeff,

We previously compared Avro with binary JSON (LinkedIn's
serialization system, which uses a JSON data model but a more compact
byte format; details at
https://github.com/voldemort/voldemort/wiki/Binary-JSON-Serialization).
The results:

1. Avro's in-memory serialization performance is 71% of binary JSON's;
2. Avro's in-memory deserialization performance is 76% of binary JSON's;
3. on-disk serialization performance depends heavily on the compression
algorithm;
4. when uncompressed, Avro is more space efficient than binary JSON (I
didn't run many experiments for this case and got a ratio of 62.5% on a
couple of data sets).

Best,
Lin

On Tue, Nov 30, 2010 at 9:42 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
> Lin,
>
> Great work. So are you already using it at LinkedIn? And how does the
> performance of AvroStorage compare to other Storage implementations?
>
> On Wed, Dec 1, 2010 at 1:05 PM, Lin Guo <guolin2001@gmail.com> wrote:
>> Hi,
>>
>> We'd like to submit our Pig AvroStorage function as a patch and
>> would greatly appreciate any comments.
>>
>> doc:
>> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
>>
>> jira:
>> https://issues.apache.org/jira/browse/PIG-1748
>>
>> Many thanks,
>> Lin
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
