pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3015) Rewrite of AvroStorage
Date Fri, 16 Nov 2012 03:46:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498579#comment-13498579 ]

Cheolsoo Park commented on PIG-3015:
------------------------------------

Hi Joseph,

First of all, thank you so much!

Secondly, considering the size of the patch, would you mind uploading it to Review Board (RB)? This will
encourage more people to review it.
https://reviews.apache.org/

You can choose pig-git as the repository when uploading a diff file generated from the GitHub repository.

Thirdly, I haven't fully read the patch yet and will do so once it's uploaded to the RB. In the
meantime, I have a few minor comments:
- Can you please add the Apache license header to every new file?
- Can you please remove @author tags?
- Can you please replace {{System.err.println()}} with Apache Commons Logging ({{org.apache.commons.logging.Log}})?
- Our indentation convention is 4 spaces and no tabs. You used 2 spaces, and I see 2 tabs
in {{directory_test.pig}}.
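For illustration only, here is a minimal sketch of a hypothetical checker for the "4 spaces, no tabs" convention. {{IndentationCheck}} is not part of Pig or its build; it flags lines containing tabs and non-blank lines whose leading spaces are not a multiple of 4. The multiple-of-4 rule is a heuristic assumption: it skips Javadoc continuation lines but may still flag wrapped continuation lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: flags lines that break the
// "4 spaces, no tabs" indentation convention.
final class IndentationCheck {

    private IndentationCheck() {
    }

    /** Returns the 1-based numbers of lines that contain a tab character. */
    static List<Integer> linesWithTabs(List<String> lines) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).indexOf('\t') >= 0) {
                offending.add(i + 1);
            }
        }
        return offending;
    }

    /**
     * Returns the 1-based numbers of non-blank lines whose leading spaces are
     * not a multiple of 4. Heuristic: Javadoc continuation lines (" * ...")
     * are skipped, but wrapped continuation lines may still be flagged.
     */
    static List<Integer> linesWithOddIndent(List<String> lines) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int spaces = 0;
            while (spaces < line.length() && line.charAt(spaces) == ' ') {
                spaces++;
            }
            boolean blank = spaces == line.length();
            boolean javadoc = !blank && line.charAt(spaces) == '*';
            if (!blank && !javadoc && spaces % 4 != 0) {
                offending.add(i + 1);
            }
        }
        return offending;
    }

    // Usage: java IndentationCheck directory_test.pig SomeClass.java
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
            System.out.println(path + ": tabs on lines " + linesWithTabs(lines)
                    + ", odd indentation on lines " + linesWithOddIndent(lines));
        }
    }
}
```

Running it over the new files before re-uploading the patch would surface both the 2-space indentation and the stray tabs in one pass.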

Lastly, your bash script should probably be replaced by a Python script (or another cross-platform
script) because there is an ongoing effort to port Pig to Windows (PIG-2793). In particular,
if TestAvroStorage is added to the unit test suites, this will be an issue. Please feel free
to open a sub-task for converting it to Python if you'd like help.
                
> Rewrite of AvroStorage
> ----------------------
>
>                 Key: PIG-3015
>                 URL: https://issues.apache.org/jira/browse/PIG-3015
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Joseph Adler
>            Assignee: Joseph Adler
>         Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions
> of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet
> peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation
> is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled
> me to implement support for Trevni.
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute
> the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
