hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Hadoop Network Protocols
Date Tue, 26 Sep 2006 00:18:07 GMT
Andrew McNabb wrote:
> I was just curious whether Hadoop's network protocols are documented
> anywhere.  I think someone mentioned today that Hadoop was designed so
> that implementations could be written in other languages.  I was
> wondering how hard that would be and what the protocols are like.

They're not well documented.  And it would be overstating it to say 
that they're designed to be implemented in other languages.  Rather, I'd 
claim that we've tried to keep language-independence in mind, but, as we 
all know, things aren't portable until they've been ported.

There are folks who've stated that they intend to port these to C or C++ 
someday.  The first step would be to change all RPC parameters and 
return values to be defined with the record API, which does have 
documented and implemented C bindings:


Hadoop's network code has two more layers: IPC and RPC.

IPC permits one to send an object implementing Writable as a request, 
and return a single object that also implements Writable.  The class of 
requests on a port is assumed to be known by the server, and the class 
of responses is assumed to be known by the client.  The protocol is 
roughly as follows:
  - a TCP connection is opened from client to server
  - [ we should add version negotiation here ]
  - the client loops, sending:
      <CallId> a four-byte integer naming the call
      <Length> the four-byte length of the request
      <byte>* the request payload
  - asynchronously, the server loops, sending:
     <CallId> the call whose response is ready
     <Error> a boolean error flag, true if the call failed
        if true, the error data follows
        otherwise, the response data follows.
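The framing above can be sketched in a few lines.  This is a minimal 
illustration, not the actual Hadoop source; the class and method names 
(IpcFraming, encodeRequest) are made up, and it assumes the four-byte 
integers are big-endian, as DataOutputStream writes them:

```java
import java.io.*;

// Hypothetical sketch of the request framing described above:
// <CallId> <Length> <byte>*.  Names are illustrative, not from Hadoop.
public class IpcFraming {

    // Encode one client request frame.
    static byte[] encodeRequest(int callId, byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(callId);          // four-byte integer naming the call
        out.writeInt(payload.length);  // four-byte length of the request
        out.write(payload);            // the request payload
        out.flush();
        return buf.toByteArray();
    }

    // Decode a frame: returns the call id, copies the payload out.
    static int decodeRequest(byte[] frame, ByteArrayOutputStream payload)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int callId = in.readInt();
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        payload.write(data);
        return callId;
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = encodeRequest(42, "ping".getBytes("UTF-8"));
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        int callId = decodeRequest(frame, payload);
        System.out.println(callId + " " + new String(payload.toByteArray(), "UTF-8"));
    }
}
```

Note there is nothing Java-specific in this framing; any language that 
can write big-endian integers on a TCP socket could speak it.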

Responses are not always returned in the order requests are sent. 
Response data is not yet length-prefixed, but should be.  There's 
nothing Java-specific in the IPC layer.

The RPC layer builds method calls on top of this, and it uses lots of 
Java-specific stuff, like method names and class names.  The request 
Writable implementation is the private class RPC.Invocation.  This 
simply writes the method name as a string, then the number of 
parameters, then each parameter using ObjectWritable.  Responses are 
passed using ObjectWritable.
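Roughly like this sketch, where the names are hypothetical and each 
parameter is simplified to a UTF-8 string rather than going through 
ObjectWritable as the real RPC.Invocation does:

```java
import java.io.*;

// Illustrative sketch of RPC.Invocation-style request serialization:
// method name, parameter count, then each parameter.  Simplified to
// string parameters; the real code delegates each to ObjectWritable.
public class InvocationSketch {

    static byte[] writeInvocation(String methodName, String[] params)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(methodName);     // the method name as a string
        out.writeInt(params.length);  // the number of parameters
        for (String p : params)       // then each parameter
            out.writeUTF(p);          // (ObjectWritable in the real code)
        return buf.toByteArray();
    }

    static String readMethodName(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        // Method and path here are made-up example values.
        byte[] frame = writeInvocation("echo", new String[]{"hello"});
        System.out.println(readMethodName(frame));
    }
}
```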

ObjectWritable is mostly just a class name followed by class-specific 
instance data.  There's a little more to it, since Java's primitive 
types are not classes.
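A sketch of that shape, with made-up names, which also previews the 
reflection assumption described below: the reader looks the class up by 
name, instantiates it, and calls its readFields method:

```java
import java.io.*;

// Hypothetical sketch of the ObjectWritable idea: a class name on the
// wire, followed by instance data the named class can read back.
public class ObjectWritableSketch {

    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // An example Writable; the name and field are illustrative.
    public static class IntBox implements Writable {
        public int value;
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    static byte[] writeObject(Writable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(obj.getClass().getName());  // class name first
        obj.write(out);                          // then instance data
        return buf.toByteArray();
    }

    // Given the class name, instantiate and call readFields -- exactly
    // the assumption the RPC layer makes, easy with Java reflection.
    static Writable readObject(byte[] frame) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        Writable obj = (Writable) Class.forName(in.readUTF())
                .getDeclaredConstructor().newInstance();
        obj.readFields(in);
        return obj;
    }

    public static void main(String[] args) throws Exception {
        IntBox box = new IntBox();
        box.value = 7;
        IntBox copy = (IntBox) readObject(writeObject(box));
        System.out.println(copy.value);
    }
}
```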

So the RPC layer assumes that you can, given a class name, instantiate 
an instance and call its readFields method.  It also assumes that, given 
a method name and a parameter list, you can call a method.  Java's 
reflection makes this easy.  But to do this in C or C++ would probably 
require moving the specification of protocols out of Java.  Currently we 
use Java interfaces for protocols, but we should instead use a language 
that builds on Hadoop's record API and that can generate Java 
interfaces, as well as C++ client and server stubs.
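The other reflection assumption, dispatching on a method name and 
parameter list, looks something like this in Java.  The Protocol 
interface and names are hypothetical; the point is that a C or C++ 
server has no equivalent of getMethod/invoke, which is why the protocol 
specification would have to move out of Java interfaces:

```java
import java.lang.reflect.Method;

// Sketch of reflection-based dispatch: given a method name and a
// parameter list, call the method.  Names are illustrative.
public class DispatchSketch {

    public interface Protocol {
        String echo(String s);
    }

    static class Server implements Protocol {
        public String echo(String s) { return "echo:" + s; }
    }

    // Look the method up by name and parameter types, then invoke it.
    static Object call(Object target, String methodName,
                       Object[] params, Class<?>[] paramTypes) throws Exception {
        Method m = target.getClass().getMethod(methodName, paramTypes);
        return m.invoke(target, params);
    }

    public static void main(String[] args) throws Exception {
        Object result = call(new Server(), "echo",
                new Object[]{"hi"}, new Class<?>[]{String.class});
        System.out.println(result);
    }
}
```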

