hadoop-pig-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Tue, 31 Aug 2010 16:36:56 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904674#action_12904674 ]

Scott Carey commented on PIG-794:
---------------------------------

AVRO-592 creates an AvroStorage class for reading and writing M/R inputs and outputs, but it
does not handle intermediate M/R output.  I have some updates to it in progress that simplify
it further.  Some aspects may be reusable for this issue too.

One thing to note is that Avro cannot be completely optimal for intermediate M/R output, because
the Hadoop API for it has a performance flaw that prevents efficient use of buffers and
input/output streams.  This would affect InterStorage as well, though.

I'll take a look at the patch here and see whether I can spot any performance optimizations.
Note that there are still several performance optimizations left to do in Avro itself.  For
example, the BinaryDecoder has been optimized, but the Encoder has not been yet.

Also, I'm somewhat blocked on AVRO-592 because Pig 0.7 artifacts are not available through Maven.



> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch,
>                      AvroStorage_3.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

