hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3788) Add serialization for Protocol Buffers
Date Thu, 11 Sep 2008 08:57:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630130#action_12630130 ]

Tom White commented on HADOOP-3788:

bq. PBs do not provide a mechanism to limit the amount of data read from a stream, so your
solution of breaking key, value pairs into two streams is the approach we should take.

Of course the other option is to propose changes to PB (which is open source) to limit the
amount of data read. I think a change to CodedInputStream would be relatively simple.
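To illustrate what "limiting the amount of data read" could look like without touching PB itself, here is a minimal sketch using only java.io. It assumes each record is written with a 4-byte length prefix and wraps the input in a bounded stream before handing it to a deserializer that reads to end of stream. BoundedInputStream, greedyRead, writeRecord and readRecord are hypothetical names invented for this example; they are not part of Hadoop or Protocol Buffers, and greedyRead merely stands in for a deserializer that consumes everything it is given.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BoundedRecordDemo {

    /** Hypothetical wrapper: exposes at most `limit` bytes of the underlying stream. */
    static final class BoundedInputStream extends InputStream {
        private final InputStream in;
        private int remaining;

        BoundedInputStream(InputStream in, int limit) {
            this.in = in;
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;      // report EOF at the record boundary
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;      // report EOF at the record boundary
            int n = in.read(buf, off, Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }
    }

    /** Stand-in for a deserializer that reads until end of stream. */
    static String greedyRead(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Writes a 4-byte length prefix followed by the payload (an assumed framing). */
    static void writeRecord(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads the length prefix, then lets the greedy reader see only that many bytes. */
    static String readRecord(DataInputStream in) throws IOException {
        int length = in.readInt();
        return greedyRead(new BoundedInputStream(in, length));
    }

    /** Writes key and value into ONE stream and reads them back in order. */
    static String[] roundTrip(String key, String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeRecord(out, key);
            writeRecord(out, value);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            String k = readRecord(in);
            String v = readRecord(in);
            return new String[] { k, v };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] kv = roundTrip("key-1", "value-1");
        System.out.println(kv[0] + " / " + kv[1]);
    }
}
```

The cost of this approach is the extra length prefix per record; the benefit is that key and value can share a single stream even when the deserializer itself has no notion of a read limit.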

As a quick experiment I modified a working MapReduce program so that the deserializer read
to the end of the stream. It failed in ReduceValuesIterator, so making this work would require
changing more than just SequenceFile. Perhaps this reveals a bug in the MR system, one that
has been masked because existing serializers consume only as much as they need. (So if they
are given more than they need, it's not a problem.) Either way, I worry about defining the contract
for deserializers so that the end of the stream marks the end of the object being read, as
it might limit optimizations we may make in the future. What do others think?
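The difference between the two deserializer contracts can be shown with a toy example. This is only an illustration of the general failure mode when several objects share one stream, not a reproduction of the ReduceValuesIterator failure; readExactly and readToEnd are hypothetical stand-ins for a deserializer that consumes only what it needs and one that treats end of stream as end of object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DeserializerContractDemo {

    /** Consumes only what it needs: exactly n bytes (fewer if the stream is short). */
    static String readExactly(InputStream in, int n) {
        try {
            byte[] buf = new byte[n];
            int off = 0;
            while (off < n) {
                int r = in.read(buf, off, n - off);
                if (r == -1) break;             // short stream: return what was read
                off += r;
            }
            return new String(buf, 0, off, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Treats end of stream as end of object: consumes everything that is left. */
    static String readToEnd(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One stream holding a 2-byte key immediately followed by a 2-byte value. */
    static InputStream sharedStream() {
        return new ByteArrayInputStream("k1v1".getBytes(StandardCharsets.US_ASCII));
    }

    /** Both objects are recovered because each read stops at its own boundary. */
    static String[] frugal() {
        InputStream in = sharedStream();
        String key = readExactly(in, 2);
        String value = readExactly(in, 2);
        return new String[] { key, value };
    }

    /** The key read swallows the value, so the value read finds nothing. */
    static String[] greedy() {
        InputStream in = sharedStream();
        String key = readToEnd(in);
        String value = readToEnd(in);
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", frugal()));
        System.out.println(String.join("|", greedy()));
    }
}
```

Under an "end of stream means end of object" contract, every caller has to guarantee that each object gets a stream of its own (or a bounded view of one), which is the constraint the paragraph above is wary of.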

> Add serialization for Protocol Buffers
> --------------------------------------
>                 Key: HADOOP-3788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3788
>             Project: Hadoop Core
>          Issue Type: Wish
>          Components: examples, mapred
>    Affects Versions: 0.19.0
>            Reporter: Tom White
>            Assignee: Alex Loddengaard
>             Fix For: 0.19.0
>         Attachments: hadoop-3788-v1.patch, protobuf-java-2.0.1.jar
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding data in a
> compact binary format. This issue is to write a ProtocolBuffersSerialization to support using
> Protocol Buffers types in MapReduce programs, including an example program. This should probably
> go into contrib.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
