hadoop-common-dev mailing list archives

From: Sanjay Radia <sra...@yahoo-inc.com>
Subject: Re: RPC versioning
Date: Thu, 09 Oct 2008 22:26:39 GMT

On Oct 3, 2008, at 10:06 PM, Raghu Angadi wrote:

>
> If version handling is required, I think Doug's approach will work
> well for the current RPC.
>
> The extra complexity of handling different versions in object
> serialization might easily be overestimated (for a duration of 1
> year, say). I would guess that well over 90% of objects'
> serialization has not changed in the last 1 to 2 years.
>
> As long as the innocent are protected (i.e., no existing write()
> method needs to change unless its fields change), it will be fine.
>
> Often the effective serialization changes because of new subclasses
> rather than the serialization methods themselves.
>
> Do we handle a change of arguments to a method similarly? How are
> subclasses handled?
>
Change to method arguments? -
   One possible solution: create a new method instead -- this will be
good enough if it is needed only for a short time (see the sketch
below).

Subclassing? - don't; instead, add or delete fields.
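
For the argument-change case, a rough sketch of what I mean (the
protocol and method names are made up for illustration, not actual
Hadoop interfaces):

   public interface ExampleProtocol extends VersionedProtocol {
     // Original method, left untouched so older clients keep working.
     void create(String src, short replication) throws IOException;

     // New method, added when a blockSize argument became necessary.
     // The server can implement the old method by delegating to this
     // one with a default blockSize.
     void createWithBlockSize(String src, short replication,
                              long blockSize) throws IOException;
   }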

>
>
> Raghu.
>
> Doug Cutting wrote:
> > It has been proposed in the discussions defining Hadoop 1.0 that we
> > extend our back-compatibility policy.
> >
> > http://wiki.apache.org/hadoop/Release1.0Requirements
> >
> > Currently we only attempt to promise that application code will run
> > without change against compatible versions of Hadoop.  If one has
> > clusters running different yet compatible versions, then one must
> > use a different classpath for each cluster to pick up the
> > appropriate version of Hadoop's client libraries.
> >
> > The proposal is that we extend this, so that a client library from
> > one version of Hadoop will operate correctly with other compatible
> > Hadoop versions, i.e., one need not alter one's classpath to contain
> > the identical version, only a compatible version.
> >
> > Question 1: Do we need to solve this problem soon, for release 1.0,
> > i.e., in order to provide a release whose compatibility lifetime is
> > ~1 year, instead of the ~4 months of 0.x releases?  This is not
> > clear to me.  Can someone provide cases where using the same
> > classpath when talking to multiple clusters is critical?
> >
> > Assuming it is, to implement this requires RPC-level support for
> > versioning.  We could add this by switching to an RPC mechanism with
> > built-in, automatic versioning support, like Thrift, Etch or
> > Protocol Buffers.  But none of these is a drop-in replacement for
> > Hadoop RPC.  They will probably not initially meet our performance
> > and scalability requirements.  Their adoption will also require
> > considerable and destabilizing changes to Hadoop.  Finally, it is
> > not clear today which of these would be the best candidate.  If we
> > move too soon, we might regret our choice and wish to move again
> > later.
> >
> > So, if we answer yes to (1) above, wishing to provide RPC
> > back-compatibility in 1.0, but do not want to hold up a 1.0 release,
> > is there an alternative to switching?  Can we provide incremental
> > versioning support to Hadoop's existing RPC mechanism that will
> > suffice until a clear replacement is available?
> >
> > Below I suggest a simple versioning style that Hadoop might use to
> > permit its RPC protocols to evolve compatibly until an RPC system
> > with built-in versioning support is selected.  This is not intended
> > to be a long-term solution, but rather something that would permit
> > us to more flexibly evolve Hadoop's protocols over the next year or
> > so.
> >
> > This style assumes a globally increasing Hadoop version number.  For
> > example, this might be the subversion repository version of trunk
> > when a change is first introduced.
> >
> > When an RPC client and server handshake, they exchange version
> > numbers.  The lower of their two version numbers is selected as the
> > version for the connection.
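> >
> > In code, the handshake might look roughly like this (the names here
> > are sketches, not existing Hadoop APIs):
> >
> >   // Each side sends its version number; both then adopt the lower
> >   // of the two as the version for the rest of the connection.
> >   int localVersion = HadoopVersion.CURRENT;     // e.g. svn revision
> >   out.writeInt(localVersion);                   // send ours
> >   int remoteVersion = in.readInt();             // read theirs
> >   int connectionVersion = Math.min(localVersion, remoteVersion);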
> >
> > Let's walk through an example.  We start with a class that contains
> > no versioning information and a single field, 'a':
> >
> >   public class Foo implements Writable {
> >     int a;
> >
> >     public void write(DataOutput out) throws IOException {
> >       out.writeInt(a);
> >     }
> >
> >     public void readFields(DataInput in) throws IOException {
> >       a = in.readInt();
> >     }
> >   }
> >
> > Now, in version 1, we add a second field, 'b', to this:
> >
> >   public class Foo implements Writable {
> >     int a;
> >     float b;                                        // new field
> >
> >     public void write(DataOutput out) throws IOException {
> >       int version = RPC.getVersion(out);
> >       out.writeInt(a);
> >       if (version >= 1) {                         // peer supports b
> >         out.writeFloat(b);                        // send it
> >       }
> >     }
> >
> >     public void readFields(DataInput in) throws IOException {
> >       int version = RPC.getVersion(in);
> >       a = in.readInt();
> >       if (version >= 1) {                         // peer supports b
> >         b = in.readFloat();                       // read it
> >       }
> >     }
> >   }
> >
> > Next, in version 2, we remove the first field, 'a':
> >
> >   public class Foo implements Writable {
> >     float b;
> >
> >     public void write(DataOutput out) throws IOException {
> >       int version = RPC.getVersion(out);
> >       if (version < 2) {                          // peer wants a
> >         out.writeInt(0);                          // send it
> >       }
> >       if (version >= 1) {
> >         out.writeFloat(b);
> >       }
> >     }
> >
> >     public void readFields(DataInput in) throws IOException {
> >       int version = RPC.getVersion(in);
> >       if (version < 2) {                          // peer writes a
> >         in.readInt();                             // ignore it
> >       }
> >       if (version >= 1) {
> >         b = in.readFloat();
> >       }
> >     }
> >   }
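> >
> > (A sketch of how RPC.getVersion might be provided, with hypothetical
> > names: the version negotiated in the handshake is recorded for the
> > connection, e.g. in a thread-local, where Writables can query it
> > while serializing.)
> >
> >   public class RPC {
> >     private static final ThreadLocal<Integer> VERSION =
> >       new ThreadLocal<Integer>() {
> >         protected Integer initialValue() { return 0; } // pre-handshake
> >       };
> >
> >     static void setVersion(int v) { VERSION.set(v); } // after handshake
> >
> >     public static int getVersion(DataOutput out) { return VERSION.get(); }
> >     public static int getVersion(DataInput in)  { return VERSION.get(); }
> >   }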
> >
> > Could something like this work?  It would require just some minor
> > changes to Hadoop's RPC mechanism, to support the version handshake.
> > Beyond that, it could be implemented incrementally as RPC protocols
> > evolve.  It would require some vigilance, to make sure that
> > versioning logic is added when classes change, but adding automated
> > tests against prior versions would identify lapses here.
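> >
> > Such a test might look roughly like this (using the hypothetical
> > RPC.setVersion above to simulate an old peer):
> >
> >   public void testVersion0RoundTrip() throws Exception {
> >     RPC.setVersion(0);                     // peer negotiated version 0
> >     Foo written = new Foo();
> >     ByteArrayOutputStream buf = new ByteArrayOutputStream();
> >     written.write(new DataOutputStream(buf));
> >
> >     Foo read = new Foo();
> >     read.readFields(new DataInputStream(
> >         new ByteArrayInputStream(buf.toByteArray())));
> >     // At version 0 only the legacy 'a' placeholder crosses the wire.
> >   }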
> >
> > This may appear to add a lot of version-related logic, but even with
> > automatic versioning, some version-related logic is still required
> > in many cases.  In simple cases, one adds a completely new field
> > with a default value and is done, with automatic versioning handling
> > much of the work.  But in many other cases an existing field is
> > changed and the application must translate old values to new, and
> > vice versa.  These cases still require application logic, even with
> > automatic versioning.  So automatic versioning is certainly less
> > intrusive, but not as much as one might first assume.
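> >
> > (To illustrate the translation case: suppose a hypothetical version
> > 3 changed 'b' from a float of seconds to a long of milliseconds.
> > The read side must then convert values from older peers:)
> >
> >   public void readFields(DataInput in) throws IOException {
> >     int version = RPC.getVersion(in);
> >     if (version < 2) {
> >       in.readInt();                            // ignore legacy 'a'
> >     }
> >     if (version >= 3) {
> >       bMillis = in.readLong();                 // new representation
> >     } else if (version >= 1) {
> >       bMillis = (long)(in.readFloat() * 1000); // translate old seconds
> >     }
> >   }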
> >
> > The fundamental question is how soon we need to address
> > inter-version RPC compatibility.  If we wish to do it soon, I think
> > we'd be wise to consider a solution that's less invasive and that
> > does not force us into a potentially premature decision.
> >
> > Doug
>
>

