hadoop-common-dev mailing list archives

From Sanjay Radia <sra...@yahoo-inc.com>
Subject Re: Hadoop 1.0 Compatibility Discussion.
Date Wed, 22 Oct 2008 18:37:58 GMT

On Oct 21, 2008, at 5:23 PM, Konstantin Shvachko wrote:

> Sanjay Radia wrote:
>  >>          o  Can't remove deprecated classes or methods until 2.0
>
> Dhruba Borthakur wrote:
> > 1. APIs that are deprecated in x.y release can be removed in
> > (x+1).0 release.
>
> Current rule is that APIs deprecated in M.x.y can be removed in
> M.(x+2).0.
> I don't think we want to either relax or tighten this requirement.
>

I think we want to strengthen this to: removal of deprecated
methods/classes only on major releases.
Isn't this what major and minor releases mean?
I believe that is what customers will expect from a 1.0 release:
stability until 2.0.
Are you worried that maintaining old methods is too much of a burden
because there will be too many of them?


sanjay
>
>
> > 2.  Old 1.x clients can connect to new 1.y servers, where x <= y but
> > the old clients might get reduced functionality or performance. 1.x
> > clients might not be able to connect to 2.z servers.
> >
> > 3. HDFS disk format can change from 1.x to 1.y release and is
> > transparent to user applications. A cluster when rolling back to 1.x
> > from 1.y will revert to the old disk format.
> >
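For reference, the upgrade/rollback cycle in (3) is driven from the
command line with the existing HDFS upgrade commands, roughly as
follows (daemons stopped first):

    # Start the namenode, converting old-format metadata to the new layout.
    hadoop namenode -upgrade

    # If the new release misbehaves, revert to the pre-upgrade format...
    hadoop namenode -rollback

    # ...or, once satisfied, discard the old format permanently.
    hadoop dfsadmin -finalizeUpgrade
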
> >>  * In a major release transition [i.e. from a release x.y to a
> >> release (x+1).0], a user should be able to read data from the
> >> cluster running the old version.
> >
> > I think this is a good requirement to have. This will be very useful
> > when we run multiple clusters, especially across data centers
> > (HADOOP-4058 is a use-case).
>
> I don't see anything about the compatibility model going from
> 1.*.* to 2.0.0.
> Does that mean we do not provide compatibility between those?
> Does that mean compatibility between 1.*.* and 2.*.* is provided by  
> distcp?
> Or another way to ask the same question: will HDFS-1 and HDFS-2 be
> as different as ext2 and ext3?
> I am not saying this is bad; I just want it to be clarified.
>
> Maybe we should somehow structure this discussion into sections,
> e.g.:
> - deprecation rules;
> - client/server communication compatibility;
> - inter version data format compatibility;
>     = meta-data compatibility
>     = block data compatibility
>
> --Konstantin
>
> >> --------
> >> What does Hadoop 1.0 mean?
> >>    * Standard release numbering: Only bug fixes in 1.x.y
> >> releases and new features in 1.x.0 releases.
> >>    * No need for client recompilation when upgrading from 1.x
> >> to 1.y, where x <= y
> >>          o  Can't remove deprecated classes or methods until 2.0
> >>    * Old 1.x clients can connect to new 1.y servers, where x <= y
> >>    * New FileSystem clients must be able to call old methods
> >> when talking to old servers. This generally will be done by
> >> having old methods continue to use old RPC methods. However, it
> >> is legal to have new implementations of old methods call new RPC
> >> methods, as long as the library transparently handles the
> >> fallback case for old servers.
> >> -----------------
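
To make the fallback clause concrete, a loose sketch (every RPC and
type name below is made up; this is not the real client code):

    import java.io.IOException;

    // Hypothetical: an old public method reimplemented over a new RPC,
    // falling back transparently when the server predates it.
    public class FileSystemClient {
        private final NamenodeRpc rpc; // assumed RPC proxy interface

        public FileSystemClient(NamenodeRpc rpc) {
            this.rpc = rpc;
        }

        /** Old public API; its signature must not change within 1.x. */
        public boolean rename(String src, String dst) throws IOException {
            try {
                // Prefer the new RPC introduced by a later 1.x server.
                return rpc.renameV2(src, dst);
            } catch (UnsupportedOperationException e) {
                // Old server: fall back to the old RPC transparently.
                return rpc.rename(src, dst);
            }
        }
    }

    // Assumed proxy interface, for the sketch only.
    interface NamenodeRpc {
        boolean rename(String src, String dst) throws IOException;
        boolean renameV2(String src, String dst) throws IOException;
    }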
> >>
> >> A couple of additional compatibility requirements:
> >>
> >> * HDFS metadata and data are preserved across release changes,
> >> both major and minor. That is, whenever a release is upgraded,
> >> the HDFS metadata from the old release will be converted
> >> automatically as needed.
> >>
> >> The above has been followed so far in Hadoop; I am just
> >> documenting it in the 1.0 requirements list.
> >>
> >>  * In a major release transition [i.e. from a release x.y to a
> >> release (x+1).0], a user should be able to read data from the
> >> cluster running the old version. (Or shall we generalize this
> >> to: from x.y to (x+i).z?)
> >>
> >> The motivation: data copying across clusters is a common
> >> operation for many customers (for example, this is routinely
> >> done at Yahoo). Today, http (or hftp) provides a guaranteed
> >> compatible way of copying data across versions. Clearly one
> >> cannot force a customer to simultaneously update all its Hadoop
> >> clusters onto a new major release. The above documents this
> >> requirement; we can satisfy it via the http/hftp mechanism or
> >> some other mechanism.
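
For instance, such a cross-version copy is typically run from the
destination cluster, reading over hftp (host names and ports below
are placeholders):

    # hftp is read-only and version-agnostic, so the source cluster
    # may be on an older release than the cluster running the copy.
    hadoop distcp hftp://old-nn.example.com:50070/user/data \
                  hdfs://new-nn.example.com:8020/user/data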
> >>
> >> Question: is one willing to break applications that operate
> >> across clusters (i.e. an application that accesses data across
> >> clusters that cross a major release boundary)? I asked the
> >> operations team at Yahoo that runs our Hadoop clusters. We
> >> currently do not have any applications that access data across
> >> clusters as part of an MR job. The reason is that Hadoop
> >> routinely breaks wire compatibility across releases, and so such
> >> apps would be very unreliable. However, the copying of data
> >> across clusters is crucial and needs to be supported.
> >>
> >> Shall we add a stronger requirement for 1.0: wire compatibility
> >> across major versions? This can be supported by class loading or
> >> other games. Note we can wait to provide this until 2.0 happens.
> >> If Hadoop provided this guarantee, it would allow customers to
> >> partition their data across clusters without risking apps
> >> breaking across major releases due to wire incompatibility
> >> issues.
> >>
> >>
> >>
> >
>
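
One flavor of the class-loading trick, sketched very loosely (the
jar path and wiring are hypothetical; only FileSystem.get(URI,
Configuration) is a real API): isolate an old-release client jar in
its own classloader, so a single JVM can reach clusters on both
sides of a major release boundary.

    import java.lang.reflect.Method;
    import java.net.URI;
    import java.net.URL;
    import java.net.URLClassLoader;

    // Loose sketch only: load an old-release Hadoop client (plus its
    // dependencies) in an isolated classloader and obtain a FileSystem
    // reflectively, sidestepping wire incompatibility in-process.
    public class OldClusterClient {
        public static Object openFileSystem(String clientJar, String fsUri)
                throws Exception {
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { new URL("file:" + clientJar) },
                    null /* null parent keeps the old classes isolated */);
            Class<?> confClass =
                    loader.loadClass("org.apache.hadoop.conf.Configuration");
            Class<?> fsClass =
                    loader.loadClass("org.apache.hadoop.fs.FileSystem");
            Method get = fsClass.getMethod("get", URI.class, confClass);
            return get.invoke(null, new URI(fsUri), confClass.newInstance());
        }
    }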

