hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2060) DFS client RPCs using protobufs
Date Wed, 20 Jul 2011 20:34:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068614#comment-13068614 ]

Todd Lipcon commented on HDFS-2060:

We had a bit of discussion about this at the contributors meeting a few weeks ago (the week
of the summit). My takeaways from that meeting were:

- Several people expressed an opinion that it would be nicer to not have protobuf-specific
code in any HDFS classes. Sidd described the approach used in MR2. If I understood him correctly,
it uses a class structure like:

interface FooWireType {
  long getBlah();
  void setBlah(long x);
  // ... getters and setters ...
  // ... serialization/deserialization stuff? ...
}

class FooWireTypeProtoImpl implements FooWireType {
  // wraps FooWireProto, which is the generated class
}

interface WireTypeFactory {
  FooWireType createFooType();
  BarWireType createBarWireType();
}

class WireTypeProtoFactory implements WireTypeFactory {
  // returns *ProtoImpl implementations
}

The upside of this approach is that it would be possible to switch serialization mechanisms
(e.g., to Avro or Thrift) without changing any of the code in the DFS layer -- one would just
need to implement a different WireTypeFactory. The downside is that it requires a bunch of
boilerplate interfaces and classes to be written by hand. It would be possible to produce them
via code-gen, but no one has a working code generator at this point.

- I argued that, while the above is nicer, it would be more expedient in the short term to
just implement this based on protobufs. I already summarized my reasoning [in this comment|https://issues.apache.org/jira/browse/HDFS-2058?focusedCommentId=13047289&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13047289].
The one-sentence version is that we need to move forward ASAP on this, and having something
that works now is better than taking months to do something slightly more general.
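
To make the trade-off in the first point more concrete, here is a minimal, self-contained sketch of the factory-based structure. It is only an illustration, not the MR2 code: FakeFooWireProto is a hypothetical stand-in for a protoc-generated class, toBytes/parseFooType and WireTypeDemo are names invented for this example, and the fixed 8-byte encoding is a placeholder rather than the protobuf wire format.

```java
import java.nio.ByteBuffer;

// Serialization-agnostic view of a wire type; DFS-layer code would see only this.
interface FooWireType {
  long getBlah();
  void setBlah(long x);
  byte[] toBytes(); // hypothetical serialization hook, not from the original comment
}

// One factory implementation per serialization mechanism.
interface WireTypeFactory {
  FooWireType createFooType();
  FooWireType parseFooType(byte[] data); // hypothetical deserialization hook
}

// Hypothetical stand-in for a generated class; a real FooWireProto would come from protoc.
final class FakeFooWireProto {
  long blah;
}

// Wraps the (stand-in) generated class and hides it behind the interface.
final class FooWireTypeProtoImpl implements FooWireType {
  private final FakeFooWireProto proto = new FakeFooWireProto();

  @Override public long getBlah() { return proto.blah; }
  @Override public void setBlah(long x) { proto.blah = x; }

  // Placeholder 8-byte big-endian encoding; this is NOT the protobuf wire format.
  @Override public byte[] toBytes() {
    return ByteBuffer.allocate(Long.BYTES).putLong(proto.blah).array();
  }
}

final class WireTypeProtoFactory implements WireTypeFactory {
  @Override public FooWireType createFooType() { return new FooWireTypeProtoImpl(); }

  @Override public FooWireType parseFooType(byte[] data) {
    FooWireType foo = new FooWireTypeProtoImpl();
    foo.setBlah(ByteBuffer.wrap(data).getLong());
    return foo;
  }
}

// Caller code depends on the interfaces only, never on a *ProtoImpl class.
final class WireTypeDemo {
  static byte[] buildRequest(WireTypeFactory factory, long blah) {
    FooWireType foo = factory.createFooType();
    foo.setBlah(blah);
    return foo.toBytes();
  }

  public static void main(String[] args) {
    WireTypeFactory factory = new WireTypeProtoFactory();
    byte[] wire = buildRequest(factory, 42L);
    System.out.println(factory.parseFooType(wire).getBlah());
  }
}
```

Even with a single field, this takes two interfaces and three classes, which gives a sense of the boilerplate involved once every RPC argument and return type needs the same treatment.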

So, I would like to propose moving forward with the approach I outlined in this JIRA and the
demonstration patch. I can commit time to doing this. If others find the approach unsatisfactory
and can commit time to doing the more general mechanism on trunk in the short term, that would
be great, but I don't want to put off client compatibility much longer. I also don't think
we should move forward with the general mechanism until we have a reasonable code-gen infrastructure
ready -- it's just too much boilerplate to write and maintain.

> DFS client RPCs using protobufs
> -------------------------------
>                 Key: HDFS-2060
>                 URL: https://issues.apache.org/jira/browse/HDFS-2060
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-2060-getblocklocations.txt
> The most important place for wire-compatibility in DFS is between clients and the cluster,
> since lockstep upgrade is very difficult and a single client may want to talk to multiple
> server versions. So, I'd like to focus this JIRA on making the RPCs between the DFS client
> and the NN/DNs wire-compatible using protocol buffer based serialization.
