hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Tue, 31 Aug 2010 08:19:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904555#action_12904555 ]

Jeff Zhang commented on PIG-794:
--------------------------------

Besides the above experiment, I also ran an experiment comparing AvroRecordWriter and InterRecordWriter
in a local environment (see the attached file AvroTest.java).
I wrote 50,000,000 records with each RecordWriter: AvroRecordWriter took 70 seconds,
while InterRecordWriter took 29 seconds.

InterRecordWriter is therefore much faster than AvroRecordWriter in this test (roughly 2.4x). Internally they
use DataFileWriter (avro) and FSDataOutputStream (inter), and both use BufferedOutputStream
as one buffer layer. The difference is that DataFileWriter (avro) has an additional buffer layer:
it first writes contents to an in-memory block and then writes that block to BufferedOutputStream
when the block is full. I am not sure whether this extra layer accounts for the overhead.
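
The two write paths described above can be sketched with plain java.io classes. This is only a minimal illustration, not the attached AvroTest.java and not the actual DataFileWriter or InterRecordWriter code: the class name BufferLayerSketch, the method names, the record layout (an int plus a long), and the 64 KB block size are all assumptions made for the example. It may help to check whether the extra in-memory block alone costs anything measurable.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: compares one buffer layer against two buffer layers.
class BufferLayerSketch {

    // One buffer layer: records are written straight into a BufferedOutputStream.
    static void writeSingleBuffered(OutputStream sink, int records) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink));
        for (int i = 0; i < records; i++) {
            out.writeInt(i);        // assumed record layout: int key
            out.writeLong(i * 31L); // assumed record layout: long value
        }
        out.flush();
    }

    // Two buffer layers: records first accumulate in an in-memory block, and the
    // block is copied into the BufferedOutputStream once it reaches blockSize bytes.
    static void writeBlockBuffered(OutputStream sink, int records, int blockSize)
            throws IOException {
        BufferedOutputStream buffered = new BufferedOutputStream(sink);
        ByteArrayOutputStream block = new ByteArrayOutputStream(blockSize);
        DataOutputStream out = new DataOutputStream(block);
        for (int i = 0; i < records; i++) {
            out.writeInt(i);
            out.writeLong(i * 31L);
            if (block.size() >= blockSize) {
                block.writeTo(buffered); // hand the full block to the second layer
                block.reset();
            }
        }
        block.writeTo(buffered); // write whatever remains in the last partial block
        buffered.flush();
    }

    public static void main(String[] args) throws IOException {
        int records = 1_000_000; // far fewer than the 50,000,000 in the real test

        long start = System.nanoTime();
        writeSingleBuffered(new ByteArrayOutputStream(), records);
        long single = System.nanoTime() - start;

        start = System.nanoTime();
        writeBlockBuffered(new ByteArrayOutputStream(), records, 64 * 1024);
        long block = System.nanoTime() - start;

        System.out.println("single buffer layer (ms): " + single / 1_000_000);
        System.out.println("two buffer layers   (ms): " + block / 1_000_000);
    }
}
```

Both methods emit the same bytes, so any timing difference comes from the extra copy through the in-memory block. A single un-warmed run like this is only indicative; Avro's encoding work and sync markers are not modelled here.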




> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch,
>                      AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

