hadoop-pig-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Tue, 31 Aug 2010 17:03:58 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904683#action_12904683 ]

Scott Carey commented on PIG-794:
---------------------------------

bq.  The performance of InterRecordWriter is much better than that of AvroRecordWriter. Internally
they use DataFileWriter (avro) and FSDataOutputStream (inter), and both use BufferedOutputStream
as one buffer layer. The difference is that DataFileWriter (avro) has another buffer layer:
it first writes contents to an in-memory block and then writes that block to BufferedOutputStream
when the block is full. Not sure whether this layer adds overhead.

I've tested this a bit before; the extra block copy is a minor overhead.  The problem is how
the BufferedOutputStream is used.  We have not yet fully optimized the write side of Avro --
there are enhancements to the serialization process that can still be made.
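
To make the two write paths under discussion concrete, here is a minimal sketch of the layering only. It is not the code of Avro's DataFileWriter or of Pig's InterRecordWriter; the names BlockBufferSketch, BlockWriter, writeDirect and writeViaBlocks are hypothetical, and an in-memory ByteArrayOutputStream stands in for the real file stream. It shows one path writing records straight into a BufferedOutputStream, and another collecting them in an in-memory block that is copied downstream when full.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the two buffer layerings; not Avro or Pig code. */
final class BlockBufferSketch {

    /** Collects records in an in-memory block and copies it downstream when full. */
    static final class BlockWriter {
        private final OutputStream out;
        private final ByteArrayOutputStream block = new ByteArrayOutputStream();
        private final int blockSize;
        private int blocksFlushed = 0;

        BlockWriter(OutputStream out, int blockSize) {
            this.out = out;
            this.blockSize = blockSize;
        }

        /** Appends one record; flushes the block once it reaches blockSize bytes. */
        void append(byte[] record) throws IOException {
            block.write(record, 0, record.length);
            if (block.size() >= blockSize) {
                flushBlock();
            }
        }

        /** Copies the current block to the downstream stream (the extra copy). */
        void flushBlock() throws IOException {
            if (block.size() == 0) {
                return;
            }
            block.writeTo(out);
            block.reset();
            blocksFlushed++;
        }

        /** Flushes any partial block, then flushes the downstream stream. */
        void close() throws IOException {
            flushBlock();
            out.flush();
        }

        int blocksFlushed() {
            return blocksFlushed;
        }
    }

    /** Single layer: records go straight into a BufferedOutputStream. */
    static byte[] writeDirect(byte[][] records) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink)) {
            for (byte[] record : records) {
                out.write(record);
            }
        }
        return sink.toByteArray();
    }

    /** Two layers: in-memory block first, then a BufferedOutputStream. */
    static byte[] writeViaBlocks(byte[][] records, int blockSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BlockWriter writer = new BlockWriter(new BufferedOutputStream(sink), blockSize);
        for (byte[] record : records) {
            writer.append(record);
        }
        writer.close();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[][] records = new byte[1000][];
        for (int i = 0; i < records.length; i++) {
            records[i] = ("record-" + i + "\n").getBytes(StandardCharsets.UTF_8);
        }
        byte[] direct = writeDirect(records);
        byte[] blocked = writeViaBlocks(records, 4096);
        System.out.println("same bytes written: " + Arrays.equals(direct, blocked));
    }
}
```

Both paths produce the same bytes; the second simply adds one bulk copy per block. The sketch does not measure anything and does not reproduce the BufferedOutputStream usage pattern identified above as the actual cost.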

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch,
>                      AvroStorage_3.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

