hadoop-pig-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-794) Use Avro serialization in Pig
Date Tue, 05 May 2009 21:21:36 GMT

    [ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706197#action_12706197 ]

Doug Cutting commented on PIG-794:
----------------------------------

> I think we have to ask the Avro team to support this (current position in the stream)
> for us to proceed with this.

ValueReader performs no buffering, so its position is always the same as the InputStream that
it wraps.  See DataFileReader#SeekableBufferedInput for an example of an input stream that
tracks its position.
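
To illustrate what "an input stream that tracks its position" can look like, here is a minimal sketch in plain java.io. It is not Avro's SeekableBufferedInput; the class name PositionTrackingInputStream and its getPosition() method are hypothetical and exist only for this example. Because the wrapper does no buffering of its own, its count always matches how far the wrapped stream has been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper (not part of Avro or Pig) that counts bytes consumed. */
class PositionTrackingInputStream extends FilterInputStream {
    private long position = 0;

    PositionTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            position++;          // one byte consumed
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            position += n;       // count only bytes actually read
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        if (skipped > 0) {
            position += skipped;
        }
        return skipped;
    }

    // mark/reset would rewind the wrapped stream behind our back,
    // so this sketch simply does not support them.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Number of bytes consumed from the wrapped stream so far. */
    public long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        PositionTrackingInputStream in =
            new PositionTrackingInputStream(new ByteArrayInputStream(new byte[10]));
        in.read();
        in.read(new byte[4]);
        System.out.println("position = " + in.getPosition());
    }
}
```

A reader that does no buffering could be handed such a wrapper, and the caller could query getPosition() between records to learn the current offset.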

Note that AVRO-25 proposes to add buffering to ValueWriter, so that the position of the underlying
stream might differ from that of the ValueWriter, but I do not foresee a need to add
this to ValueReader, which is the concern here.
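
The effect of buffering on position can be sketched with standard java.io classes. This does not use ValueWriter or anything from AVRO-25; the class BufferedPositionDemo and its method are hypothetical names for illustration. It only shows that once a buffer sits in front of a stream, the underlying stream's position lags behind the logical position until a flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical demo (not Avro code): buffered bytes are invisible to the sink until flushed. */
class BufferedPositionDemo {

    /**
     * Writes payload through a buffer of bufferSize bytes and returns
     * { sink size before flush, sink size after flush }.
     */
    static int[] sinkSizesAroundFlush(byte[] payload, int bufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, bufferSize);

        buffered.write(payload, 0, payload.length);
        int before = sink.size();   // bytes that have reached the underlying stream so far

        buffered.flush();
        int after = sink.size();    // all bytes are now in the underlying stream

        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sinkSizesAroundFlush(new byte[16], 64);
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

When the payload is smaller than the buffer, the sink sees nothing until the flush, which is the kind of discrepancy described above for a buffered writer.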

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>         Attachments: AvroBinStorage.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of
> the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly
> better compared to BinStorage on our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

