[ https://issues.apache.org/jira/browse/HADOOP-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631239#action_12631239 ]
Alex Loddengaard commented on HADOOP-3788:
------------------------------------------
FYI: from the PB discussion group [here|http://groups.google.com/group/protobuf/browse_thread/thread/19ab6bbb364fef35]:
{noformat}
At Google, we have lots of various container formats, for streaming, record-based files, database tables, etc., where each record is a protocol buffer. All of these formats store the size of the message before the message itself. Our philosophy is that because we have protocol buffers, all of these *other* formats and protocols can be designed to pass around arbitrary byte blobs, which greatly simplifies them. An arbitrary byte blob is not necessarily self-delimiting, so it's up to these container formats to keep track of the size separately.
{noformat}
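To make the framing described in the quote concrete, here is a minimal Java sketch of a container that stores each record as a 4-byte big-endian length followed by the raw bytes. The class and method names ({{LengthPrefixedBlobs}}, {{writeRecord}}, {{readRecord}}) are hypothetical, and the fixed 4-byte prefix is an assumption for illustration; the quote does not say how the size is actually encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container format: each record is [4-byte big-endian length][bytes].
// The blobs themselves need not be self-delimiting; the container tracks the size.
public class LengthPrefixedBlobs {

    // Writes one record: the length prefix, then the raw bytes.
    static void writeRecord(DataOutputStream out, byte[] blob) throws IOException {
        out.writeInt(blob.length); // DataOutputStream writes ints big-endian
        out.write(blob);
    }

    // Reads one record. Returns null only if the stream ends cleanly before
    // a new record starts; a truncated prefix or body throws EOFException.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null; // no further records
        }
        int length = (first << 24) | (in.readUnsignedByte() << 16)
                | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
        if (length < 0) {
            throw new IOException("Negative record length: " + length);
        }
        byte[] blob = new byte[length];
        in.readFully(blob);
        return blob;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeRecord(out, "first".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, new byte[0]); // empty blobs are still delimited
        writeRecord(out, "third".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        List<byte[]> records = new ArrayList<>();
        byte[] record;
        while ((record = readRecord(in)) != null) {
            records.add(record);
        }
        System.out.println(records.size() + " records, " + buffer.size() + " bytes"); // 3 records, 22 bytes
    }
}
```

Because the container owns the size, the records can be any bytes at all, including serialized messages that carry no length of their own.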
A possible solution would be to change the interface between _Message_ instances and the _PBSerialization_ framework so that a wrapping class, call it _PBMessageWrapper_, contains the length and the logic to delimit the stream. Instances of this wrapper could create a new stream for deserializing, though serializing would become trickier: the _OutputStream_ used when serializing would need the length metadata included in it. It might also be possible to create a single general _PBMessageWrapper_ instead of creating a wrapper for each _Message_ type.
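A rough Java sketch of the wrapper idea, assuming nothing about the real _PBSerialization_ code: {{PBMessageWrapper}} here is generic, so one instance covers any message type, and the hypothetical {{ByteEncoder}} and {{StreamParser}} interfaces stand in for a message's own serialize step and a parser that reads to end of stream. The wrapper writes the length when serializing and, when deserializing, hands the parser a fresh stream holding only that message's bytes. Strings play the role of messages so the example runs without the protobuf jar; {{readAllBytes}} needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical generic wrapper: one instance serves any message type.
public class PBMessageWrapper<T> {

    // Stand-in for a parser that consumes its input until EOF.
    interface StreamParser<M> {
        M parse(InputStream in) throws IOException;
    }

    // Stand-in for turning a message into raw, non-self-delimiting bytes.
    interface ByteEncoder<M> {
        byte[] encode(M message) throws IOException;
    }

    private final ByteEncoder<T> encoder;
    private final StreamParser<T> parser;

    PBMessageWrapper(ByteEncoder<T> encoder, StreamParser<T> parser) {
        this.encoder = encoder;
        this.parser = parser;
    }

    // Serialization: the wrapper, not the message, writes the length metadata.
    void serialize(T message, OutputStream rawOut) throws IOException {
        byte[] body = encoder.encode(message);
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeInt(body.length);
        out.write(body);
        out.flush();
    }

    // Deserialization: read the length, then give the parser a new stream
    // containing only this message, so reading to EOF stops at the boundary.
    T deserialize(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(rawIn); // unbuffered: no read-ahead
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative message length: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body);
        return parser.parse(new ByteArrayInputStream(body));
    }

    public static void main(String[] args) throws IOException {
        PBMessageWrapper<String> wrapper = new PBMessageWrapper<>(
                s -> s.getBytes(StandardCharsets.UTF_8),
                stream -> new String(stream.readAllBytes(), StandardCharsets.UTF_8));

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        wrapper.serialize("alpha", buffer);
        wrapper.serialize("beta", buffer);

        InputStream source = new ByteArrayInputStream(buffer.toByteArray());
        System.out.println(wrapper.deserialize(source)); // alpha
        System.out.println(wrapper.deserialize(source)); // beta
    }
}
```

Since the length lives in the wrapper, the parser never sees the next record's bytes; the trade-off is the one described above, namely that the serialized stream is no longer the bare message.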
Thoughts?
> Add serialization for Protocol Buffers
> --------------------------------------
>
> Key: HADOOP-3788
> URL: https://issues.apache.org/jira/browse/HADOOP-3788
> Project: Hadoop Core
> Issue Type: Wish
> Components: examples, mapred
> Affects Versions: 0.19.0
> Reporter: Tom White
> Assignee: Alex Loddengaard
> Fix For: 0.19.0
>
> Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, protobuf-java-2.0.1.jar
>
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding data in a compact binary format. This issue is to write a ProtocolBuffersSerialization to support using Protocol Buffers types in MapReduce programs, including an example program. This should probably go into contrib.
--
This message is automatically generated by JIRA.