hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Sun, 03 May 2009 00:17:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705351#action_12705351 ]

Olga Natkovich commented on PIG-794:

Hi Rakesh,

Thanks for the patch. A few comments below.

First, a few general comments:

(1) AVROBinStorage should not be in builtins. We don't want to expose it to end users, because
in the past we had backward-compatibility issues (with BinStorage) when the same function
was used both internally and externally.
(2) Every new file needs to have an Apache license header. You can get one from a file in
(3) I would just call the class AVROStorage.
(4) Once we are fully integrated with AVRO, we should add unit tests, but for now this is fine.
(5) It would be nice to have Javadoc comments in the data. At a minimum, a header for each
class describing what it does, and one for each public method. Also, it would be good to document any non-obvious
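
For points (2) and (5), a new source file might start along these lines. This is only a sketch: the comment block is the standard ASF source header, while FieldCounter and countFields are hypothetical names used to show the Javadoc shape and are not code from the patch.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Counts fields in tab-separated records.
 * Hypothetical class, shown only to illustrate a class-level Javadoc header.
 */
class FieldCounter {
    /**
     * Returns the number of tab-separated fields in a record.
     *
     * <p>Non-obvious behavior worth documenting: an empty record has zero
     * fields, and trailing tabs count as empty trailing fields.
     *
     * @param record the record text; must not be null
     * @return the number of fields
     */
    static int countFields(String record) {
        if (record.isEmpty()) {
            return 0;
        }
        // A negative limit keeps trailing empty strings in the result.
        return record.split("\t", -1).length;
    }
}
```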

Now, code-related comments: what is the reason for having AVROValueReader? It seems to be
a straight wrapper around ValueReader plus a position, which we can keep track of separately. I am
concerned about the performance overhead incurred on each call.
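
As a rough illustration of keeping track of the position separately (a sketch only; the patch itself is not reproduced here), the position can be counted at the stream level so that the reader is used directly, with no wrapper call per value. PositionTrackingStream and PositionDemo are hypothetical names, and java.io.DataInputStream stands in for Avro's ValueReader, whose API is not shown in this message.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: counts the bytes consumed from the wrapped stream. */
class PositionTrackingStream extends FilterInputStream {
    private long position; // bytes read or skipped so far

    PositionTrackingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped;
        return skipped;
    }

    /** mark/reset would invalidate the count, so this sketch reports them as unsupported. */
    @Override
    public boolean markSupported() {
        return false;
    }

    long getPosition() {
        return position;
    }
}

class PositionDemo {
    /** Reads one int and one long, then reports how many bytes were consumed. */
    static long bytesConsumedAfterIntAndLong() {
        byte[] data = new byte[16]; // 16 zero bytes as stand-in input
        PositionTrackingStream counted =
                new PositionTrackingStream(new ByteArrayInputStream(data));
        DataInputStream reader = new DataInputStream(counted); // stand-in for the real reader
        try {
            reader.readInt();  // consumes 4 bytes
            reader.readLong(); // consumes 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counted.getPosition();
    }

    public static void main(String[] args) {
        System.out.println(bytesConsumedAfterIntAndLong());
    }
}
```

Running PositionDemo prints 12. The reader's own methods are called directly, and the byte count is maintained only where bytes are actually pulled from the stream.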

> Use Avro serialization in Pig
> -----------------------------
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>         Attachments: AvroBinStorage.patch
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

