hadoop-common-dev mailing list archives

From "Alex Loddengaard (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3788) Add serialization for Protocol Buffers
Date Tue, 16 Sep 2008 03:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631239#action_12631239 ]

Alex Loddengaard commented on HADOOP-3788:

FYI: from the PB discussion group [here|http://groups.google.com/group/protobuf/browse_thread/thread/19ab6bbb364fef35]:
At Google, we have lots of various container formats, for streaming, record-based files, database etc., where each record is a protocol buffer.  All of these formats store the size of the message before the message itself.  Our philosophy is that because we have protocol buffers, all of these *other* formats and protocols can be designed to pass around arbitrary byte blobs, which greatly simplifies them.  An arbitrary byte blob is not necessarily self-delimiting, so it's up to these container formats to keep track of the size separately.
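
The framing described in the quote (the container records each record's size ahead of the record, because the record bytes are not self-delimiting) can be sketched in Java. This is a minimal sketch, assuming a 4-byte big-endian length prefix written with `DataOutputStream.writeInt`; the class name `LengthPrefixedBlobs` is hypothetical and does not describe how any Google or Hadoop container actually encodes sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical container that stores opaque byte blobs, each preceded by its size. */
public class LengthPrefixedBlobs {

    /** Writes each blob as a 4-byte big-endian length followed by the blob bytes. */
    public static byte[] writeAll(List<byte[]> blobs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[] blob : blobs) {
            out.writeInt(blob.length); // size stored before the record
            out.write(blob);           // the record itself, not self-delimiting
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads blobs back using the stored sizes to find each record boundary. */
    public static List<byte[]> readAll(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<byte[]> blobs = new ArrayList<>();
        // available() is exact for a ByteArrayInputStream, so it marks the end here.
        while (in.available() > 0) {
            int length = in.readInt(); // EOFException if fewer than 4 bytes remain
            if (length < 0) {
                throw new IOException("Corrupt length prefix: " + length);
            }
            byte[] blob = new byte[length];
            in.readFully(blob); // EOFException if the record is truncated
            blobs.add(blob);
        }
        return blobs;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> records = List.of(
            "first".getBytes(StandardCharsets.UTF_8),
            new byte[0],
            "third".getBytes(StandardCharsets.UTF_8));
        byte[] container = writeAll(records);
        for (byte[] record : readAll(container)) {
            System.out.println(record.length); // prints 5, 0, 5 on separate lines
        }
    }
}
```

Because the container owns the sizes, the records can be any bytes at all, including empty ones; the cost is 4 bytes of overhead per record.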

A possible solution would be to change the interface between _Message_ instances and the _PBSerialization_ framework so that a wrapping class, call it _PBMessageWrapper_, holds the message length and the logic to delimit the stream.  Instances of this wrapper could create a new stream, limited to that length, for deserializing.  Serializing becomes trickier, though: the length metadata would have to be written to the _OutputStream_ ahead of each message.  It might also be possible to create a single general-purpose _PBMessageWrapper_ rather than a separate wrapper for each _Message_ type.
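
The general-purpose wrapper idea can be sketched along the same lines. In this minimal sketch the hypothetical `GenericDelimitedWrapper<T>` is one class parameterized by encode/decode functions, so no per-_Message_ wrapper class is needed. It writes the length metadata to the stream before each message and, on read, consumes exactly that many bytes, leaving the stream positioned at the next message. The class name, method names and 4-byte length prefix are assumptions, not part of the attached patches; strings stand in for Protocol Buffers messages so the sketch runs without the protobuf jar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical wrapper that length-delimits any type it can encode and decode. */
public class GenericDelimitedWrapper<T> {
    private final Function<T, byte[]> encoder; // message -> serialized bytes
    private final Function<byte[], T> decoder; // serialized bytes -> message

    public GenericDelimitedWrapper(Function<T, byte[]> encoder, Function<byte[], T> decoder) {
        this.encoder = encoder;
        this.decoder = decoder;
    }

    /** Writes the length metadata, then the serialized message. */
    public void write(T message, DataOutputStream out) throws IOException {
        byte[] body = encoder.apply(message);
        out.writeInt(body.length);
        out.write(body);
    }

    /** Reads exactly one message, leaving the stream at the start of the next one. */
    public T read(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Corrupt length prefix: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body);
        return decoder.apply(body);
    }

    public static void main(String[] args) throws IOException {
        GenericDelimitedWrapper<String> wrapper = new GenericDelimitedWrapper<>(
            s -> s.getBytes(StandardCharsets.UTF_8),
            b -> new String(b, StandardCharsets.UTF_8));

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        wrapper.write("key-1", out);
        wrapper.write("value-1", out);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(wrapper.read(in)); // prints key-1
        System.out.println(wrapper.read(in)); // prints value-1
    }
}
```

With Protocol Buffers, the encoder could be `Message.toByteArray()`; the decoder would call the generated type's `parseFrom(byte[])`, which throws the checked `InvalidProtocolBufferException` and so needs a small adapter to fit `Function`.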


> Add serialization for Protocol Buffers
> --------------------------------------
>                 Key: HADOOP-3788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3788
>             Project: Hadoop Core
>          Issue Type: Wish
>          Components: examples, mapred
>    Affects Versions: 0.19.0
>            Reporter: Tom White
>            Assignee: Alex Loddengaard
>             Fix For: 0.19.0
>         Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, protobuf-java-2.0.1.jar
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding data in a compact binary format. This issue is to write a ProtocolBuffersSerialization to support using Protocol Buffers types in MapReduce programs, including an example program. This should probably go into contrib.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
