hadoop-pig-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Wed, 08 Sep 2010 16:11:36 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907280#action_12907280 ]

Doug Cutting commented on PIG-794:

Jeff, please use current trunk or the 1.4.0 build that I expect to be released tomorrow
(http://people.apache.org/~cutting/avro-1.4.0-rc4/) instead.  A bug in the snapshot you're
using caused a similar failure.  It should only occur in multi-threaded applications, which
I doubt yours is, but it's better to test against either trunk or a release so that we don't
chase ghosts.

Further, while debugging a DatumWriter and DatumReader, you might use a ValidatingEncoder
and ValidatingDecoder to ensure that what you write and read conforms to your schema.  You
might also test by reading and printing your data with GenericDatumReader to see that you've
written what you meant to write.  If you've written data that does not conform to your declared
schema, then it cannot be read correctly.  If that is the case here, we should attempt to improve
the error message.
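
The checks suggested above could look roughly like the sketch below.  It is only an
illustration under stated assumptions: it uses the EncoderFactory and DecoderFactory
API of later Avro releases (1.5 and up), not the 1.4.0 API discussed in this thread.
The "Pair" schema and the class name are hypothetical, and GenericDatumWriter stands in
for the custom DatumWriter under test.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroRoundTripCheck {
    // Hypothetical schema, used only for this illustration.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Pair\",\"fields\":["
        + "{\"name\":\"key\",\"type\":\"string\"},"
        + "{\"name\":\"count\",\"type\":\"long\"}]}";

    // Write one datum through a ValidatingEncoder, which checks each
    // write call against the declared schema before passing it on.
    static byte[] write(Schema schema, GenericRecord datum) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder binary = EncoderFactory.get().binaryEncoder(out, null);
        Encoder validating = EncoderFactory.get().validatingEncoder(schema, binary);
        // Substitute the DatumWriter being debugged here.
        new GenericDatumWriter<GenericRecord>(schema).write(datum, validating);
        validating.flush();
        return out.toByteArray();
    }

    // Read the bytes back with GenericDatumReader through a ValidatingDecoder.
    static GenericRecord read(Schema schema, byte[] bytes) throws IOException {
        Decoder binary = DecoderFactory.get().binaryDecoder(bytes, null);
        Decoder validating = DecoderFactory.get().validatingDecoder(schema, binary);
        return new GenericDatumReader<GenericRecord>(schema).read(null, validating);
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord in = new GenericData.Record(schema);
        in.put("key", "pig");
        in.put("count", 794L);
        // Print what was actually written, to compare with what was intended.
        System.out.println(read(schema, write(schema, in)));
    }
}
```

If the writer under test emits calls that do not match the schema, the validating
wrapper should fail at the offending call rather than later, at read time.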

> Use Avro serialization in Pig
> -----------------------------
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch,
> AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
