hadoop-pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Avoiding serialization/de-serialization in pig
Date Tue, 29 Jun 2010 00:51:12 GMT
For what it's worth, I saw very significant speed improvements (an order of
magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol-buffer-based loaders.

I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of reflecting on every field, would also be a big
win.
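
To make the schema point concrete, here is a minimal sketch in Python (not
Pig code). The SCHEMA list, the one-byte type tags, and all function names are
hypothetical and chosen only for illustration. The tagged codec has to inspect
each field on both write and read, while the schema-driven codec compiles one
row format up front and stores no per-field type information.

```python
import struct

# Hypothetical schema, known up front: (column name, struct format code).
# "q" is a 64-bit integer, "d" is a 64-bit float.
SCHEMA = [("id", "q"), ("score", "d"), ("clicks", "q")]


def encode_tagged(row):
    """Self-describing encoding: a one-byte type tag precedes every field."""
    out = bytearray()
    for value in row:
        # The writer must inspect the runtime type of each field.
        code = "d" if isinstance(value, float) else "q"
        out += code.encode("ascii") + struct.pack(">" + code, value)
    return bytes(out)


def decode_tagged(buf):
    """The reader dispatches on the tag of each field in turn."""
    row, pos = [], 0
    while pos < len(buf):
        code = chr(buf[pos])
        (value,) = struct.unpack_from(">" + code, buf, pos + 1)
        row.append(value)
        pos += 1 + struct.calcsize(">" + code)
    return tuple(row)


# Schema-driven encoding: build the row format once from the schema,
# then pack and unpack whole rows without per-field type checks.
ROW_FORMAT = struct.Struct(">" + "".join(code for _, code in SCHEMA))


def encode_with_schema(row):
    return ROW_FORMAT.pack(*row)


def decode_with_schema(buf):
    return ROW_FORMAT.unpack(buf)


row = (42, 0.5, 7)
tagged = encode_tagged(row)
compact = encode_with_schema(row)
```

Both codecs round-trip the same row. The schema-driven bytes are also
smaller, because the type tags are gone, but the main saving suggested above
is the per-field inspection that no longer happens.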

Thoughts on just using Avro for the internal PigStorage?

-D

On Mon, Jun 28, 2010 at 5:08 PM, Thejas Nair <tejas@yahoo-inc.com> wrote:

> I have created a wiki which puts together some ideas that can help in
> improving performance by avoiding/delaying serialization/de-serialization.
>
> http://wiki.apache.org/pig/AvoidingSedes
>
> These are ideas that don't involve changes to the optimizer. Most of them
> involve changes in the load/store functions.
>
> Your feedback is welcome.
>
> Thanks,
> Thejas
>
>
