hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Release1.0Requirements" by SanjayRadia
Date Thu, 09 Oct 2008 21:31:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SanjayRadia:

  ''Sanjay, about versioning RPC parameters: On the mailing list I proposed a mechanism by which,
with a small change to only the RPC mechanism itself, we could start manually versioning parameters
as they are modified.  Under this proposal, existing parameter implementations would not
need to be altered until they next change incompatibly.  It's perhaps not the best long-term
solution, but it would, if we wanted, permit us to start requiring back-compatible protocols
soon. --DougCutting''
+ ''Yes, I saw that and am evaluating it in the context of Hadoop. I have done similar things
in the past, so I know that it works. I will comment further in that email thread. Thanks
for starting it. --SanjayRadia''
+ '''Doug has initiated a discussion on RPC versioning in an email thread sent to ''core-dev@hadoop.apache.org''
with the subject ''RPC versioning'' - please read and comment there.'''
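Doug's proposed mechanism, writing a version tag ahead of a parameter's serialized fields so readers can cope with format changes, could be sketched roughly as follows. This is only an illustration, not the actual proposal: the class and field names are hypothetical, and plain java.io streams stand in for Hadoop's Writable machinery.

```java
import java.io.*;

// Hypothetical RPC parameter that embeds its own version number, so a
// reader can detect old (or reject future) serialization formats.
class JobParams {
    static final byte VERSION = 2;   // bumped when the format changes

    String jobName;
    int priority;                    // field added in version 2

    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);      // version tag precedes all fields
        out.writeUTF(jobName);
        out.writeInt(priority);
    }

    void readFields(DataInput in) throws IOException {
        byte v = in.readByte();
        if (v > VERSION) {
            throw new IOException("Cannot read future version " + v);
        }
        jobName = in.readUTF();
        // Field absent from version-1 streams: fall back to a default.
        priority = (v >= 2) ? in.readInt() : 0;
    }
}

public class VersionedParamDemo {
    public static void main(String[] args) throws IOException {
        JobParams p = new JobParams();
        p.jobName = "sort";
        p.priority = 5;

        // Round-trip through an in-memory buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));

        JobParams q = new JobParams();
        q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.jobName + " " + q.priority);  // prints "sort 5"
    }
}
```

The point of the tag is that only parameters that next change incompatibly need their `readFields` extended with a per-version branch; unchanged parameters keep working as-is.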
  === Time frame for 1.0 to 2.0 ===
  What is the expected lifetime of 1.0 before it goes to 2.0? Clearly, if we switch from
1.0 to 2.0 in 3 months, the compatibility benefit of 1.0 does not deliver much value for Hadoop
customers. A time frame of 12 months is probably the minimum.
@@ -74, +78 @@

  ''What's missing from the current RPC landscape?  Mostly transport-layer stuff.  (1) Transport
versioning.  Thrift doesn't provide transport-level handshakes, so we'd probably need to implement
our own transport.  This is possible, and we'd have to do it for protocol buffers too, but
we might not need to with Etch.  (2) Async transport.  For performance we need async servers at least,
and probably async clients.  Requests and responses must be multiplexed over shared connections.
 Thrift doesn't yet provide this for Java.  Etch may solve both of these or none or have other
problems.  It would be nice to get as much as possible from an external project, reinventing
the minimum.  So we should certainly start experimenting now.  Someone could, e.g., port Thrift
and/or protocol buffers to run on top of Hadoop's existing transport layer.  We could immediately
incorporate any improvements that make the transport more easily usable for Thrift and Protocol
Buffers, and we'd probably identify other issues in the process.  Fundamentally, I don't think
switching the RPC is a move we can schedule without more work up front.  But we should certainly
start experimenting now.''
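The request/response multiplexing described above, where each call carries an id so replies on a shared connection can arrive out of order and still reach the right caller, might look like this sketch. All names here are hypothetical, not Hadoop's actual IPC code, and a real client would write frames to a socket rather than complete futures in-process.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of call multiplexing: each request is tagged with an id,
// so responses can come back in any order over one shared connection and
// still be matched to the caller waiting for them.
public class CallMultiplexer {
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private int nextId = 0;

    // Caller side: register a future under a fresh id before sending.
    synchronized CompletableFuture<String> send(String request) {
        int id = nextId++;
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(id, f);
        // (a real client would now write "id + request" to the socket)
        return f;
    }

    // Reader side: one thread drains the connection and completes
    // whichever pending call the response's id names.
    void onResponse(int id, String response) {
        pending.remove(id).complete(response);
    }

    public static void main(String[] args) throws Exception {
        CallMultiplexer mux = new CallMultiplexer();
        CompletableFuture<String> a = mux.send("getBlockLocations");  // id 0
        CompletableFuture<String> b = mux.send("mkdirs");             // id 1
        mux.onResponse(1, "ok:mkdirs");            // responses arrive reordered
        mux.onResponse(0, "ok:getBlockLocations");
        System.out.println(a.get() + " / " + b.get());
    }
}
```

Without the id tag, a shared connection would have to return responses strictly in request order, which is exactly what blocks async clients.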
+ ''RPC systems usually have two layers: an RPC type system (where the RPC interfaces, parameter
types, and serialization are defined) and an RPC transport. We should not pick an RPC system
that does not allow one to slip in arbitrary transports. Hadoop's RPC transport is fairly good
now, and in the very short term we should consider sticking with it. The additional transport
features you have mentioned argue further for this separation and independence. So if there
is an RPC system that offers a pluggable transport AND also has a suitable RPC type system,
we can safely pick that RPC system. We stick with the Hadoop transport initially and later consider
extending it or switching to another (possibly the native transport of the RPC system). For
example, Protocol Buffers does not even offer a transport - you *have* to plug one in. Having
said that, I do share your view that it would be nice to wait for Etch to become public and
see if it has all or most of what we want. What we can't afford to do is to wait too long
... there will always be another new RPC system just around the corner. Further, if our goal
is a perfect RPC type system and a perfect RPC transport as a single package, then we risk
waiting forever. Pick a good enough one and we can work to extend it. The key to succeeding
in picking one (now or waiting and deciding in 6 months) is to focus on choosing one that
has a good RPC type system AND allows a pluggable transport. --SanjayRadia''
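The two-layer split argued for here, an RPC type system kept independent of a pluggable transport, can be sketched as a pair of interfaces. All names are illustrative, and a loopback lambda stands in for a real transport such as Hadoop's existing one.

```java
import java.io.IOException;

// "RPC type system" layer: how calls and results are (de)serialized.
interface RpcSerialization {
    byte[] encodeCall(String method, Object[] args);
    Object decodeResult(byte[] response);
}

// Pluggable transport layer: how request bytes travel and response bytes
// return. Hadoop's existing transport could implement this today; Thrift's
// or Etch's native transport could be swapped in later.
interface RpcTransport {
    byte[] sendAndReceive(byte[] request) throws IOException;
}

// The client composes the two layers but depends on neither concretely.
class RpcClient {
    private final RpcSerialization serde;
    private final RpcTransport transport;

    RpcClient(RpcSerialization serde, RpcTransport transport) {
        this.serde = serde;
        this.transport = transport;
    }

    Object call(String method, Object... args) throws IOException {
        return serde.decodeResult(transport.sendAndReceive(serde.encodeCall(method, args)));
    }
}

public class PluggableRpcDemo {
    public static void main(String[] args) throws IOException {
        // Toy type system: "method:arg" strings in place of real serialization.
        RpcSerialization serde = new RpcSerialization() {
            public byte[] encodeCall(String m, Object[] a) { return (m + ":" + a[0]).getBytes(); }
            public Object decodeResult(byte[] r) { return new String(r); }
        };
        // Loopback transport standing in for a socket-based one.
        RpcTransport loopback = req -> ("echo " + new String(req)).getBytes();

        RpcClient client = new RpcClient(serde, loopback);
        System.out.println(client.call("getProtocolVersion", "hdfs"));
    }
}
```

Because `RpcClient` sees only the two interfaces, replacing the loopback with a different transport (or the toy serialization with Protocol Buffers, say) requires no change to the calling code, which is the independence the paragraph above asks for.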
  ''Language Neutral - yes, it will mean a duplicated client side and hence more work. From what
I have observed in the design discussions, keeping the client side small was a criterion because
we were expecting language-neutral protocols down the road. Do you feel that we should not
bother with language-neutral protocols at all? --SanjayRadia''
  ''I think we should be very careful about which network protocols we publicly expose.  Currently
we expose none.  I do not think we should attempt to expose all soon.  A first obvious candidate
to expose might be the job submission protocol.  Before we do so we should closely revisit
its design, since it was not designed as an API with long-term, multi-language access in mind.
 Any logic that we can easily move server-side, we should, to minimize duplicated code.  Etc.
 The HDFS protocols will require more scrutiny, since they involve more client side logic.
 It would be simpler if all of HDFS was implemented using RPC, not a mix of RPC and raw sockets.
 So we might decide to delay publicly exposing the HDFS protocols until we have made that
switch, should it prove feasible.  I think we could reasonably have a 1.0 release that exposed
none.  I do not see this as a gating issue for a 1.0 release.  We could reasonably expose
such protocols in the course of 1.x releases, no?''
