hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5071) Hadoop 1.0 Compatibility Requirements
Date Thu, 05 Mar 2009 11:53:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679166#action_12679166 ]

Steve Loughran commented on HADOOP-5071:

5b, Intra-service wire-protocol compatibility

That's really hard to achieve. Even if you keep the wire format the same, keeping the
explicit and implicit semantics consistent is the tricky part. I've been there too many times
with web service protocols whose adoption of XML was meant to handle versioning well, but
turned out not to.

I'd be worried about running multiple datanode versions in a single cluster, for example.

What I'd be happier with would be a long-haul API built on something like JAX-RS that
had enough of a stability guarantee that client apps (be they IDE plugins, command line tools,
or Firefox add-ons) could talk to the far end through whatever proxies and firewalls got in
the way, with the Java-level API compatibility rules helping the uploaded code to work.

As we don't yet have a JAX-RS based long-haul API, we could design in some of this stuff
from the outset.
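To make the idea concrete, here is a minimal sketch of what the client side of such a stable long-haul API could look like. The /api/v1 URL layout, the endpoint naming, and the version-negotiation check are assumptions for illustration, not an existing Hadoop interface:

```java
// Hypothetical long-haul REST client helpers. The URL scheme and the idea of a
// server-advertised API version are illustrative assumptions, not real Hadoop APIs.
class LongHaulClient {
    // Major version of the long-haul API this client speaks.
    static final String API_VERSION = "1";

    // Build the URL for reading a file over the hypothetical REST API.
    // Pinning the major version into the path makes incompatibility explicit.
    static String readUrl(String host, int port, String path) {
        return "http://" + host + ":" + port + "/api/v" + API_VERSION
                + "/files" + path + "?op=read";
    }

    // A client only talks to servers whose major API version matches its own;
    // the server would advertise its version, e.g. in a response header.
    static boolean compatible(String serverApiVersion) {
        return serverApiVersion.split("\\.")[0].equals(API_VERSION);
    }
}
```

The point of baking the major version into the URL and checking it up front is that a v1 client can refuse, or degrade gracefully, when it meets a v2 server, instead of failing on an undiagnosable wire mismatch.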

> Hadoop 1.0 Compatibility Requirements
> -------------------------------------
>                 Key: HADOOP-5071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5071
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
> The purpose of this Jira is to decide on Hadoop 1.0 compatibility requirements.
> A proposal is described below that was discussed on the core-dev@hadoop.apache.org mailing list.
> Release terminology used below:
> *Standard release numbering: major, minor, dot releases*
> * Only bug fixes in dot releases: m.x.y
> ** no changes to API, disk format, protocols or config etc. in a dot release
> * New features in major (m.0) and minor (m.x.0) releases
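The numbering rules above can be expressed as a small predicate. A minimal sketch, assuming versions of the form m.x or m.x.y:

```java
// Sketch of the release-numbering rules above: a dot release (m.x.y -> m.x.z)
// may contain only bug fixes; new features may arrive only when the major or
// minor component changes. Illustrative only.
class ReleaseVersion {
    final int major, minor, dot;

    ReleaseVersion(String v) {
        String[] parts = v.split("\\.");
        major = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
        dot = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
    }

    // New features are allowed only if major or minor changed between releases.
    static boolean mayAddFeatures(ReleaseVersion from, ReleaseVersion to) {
        return to.major != from.major || to.minor != from.minor;
    }
}
```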
> *Hadoop Compatibility Proposal*
> - *1 API Compatibility*
> No need for client recompilation when upgrading across minor releases (i.e. from m.x to m.y, where x <= y).
> Classes or methods deprecated in m.x can be removed in (m+1).0
> Note that this is stronger than what we have been doing in Hadoop 0.x releases.
> 	These are fairly standard compatibility rules for major and minor releases.
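Rule 1 implies the usual two-step deprecation cycle: deprecate in some m.x, keep the old method delegating to its replacement through all m.* releases, and delete no earlier than (m+1).0. A sketch with made-up names (not real Hadoop APIs):

```java
// Illustrates the deprecation cycle implied by rule 1. The class and method
// names here are hypothetical, chosen only for illustration.
class FileStatusExample {
    private final long length;

    FileStatusExample(long length) { this.length = length; }

    /** @deprecated since m.x; use {@link #getLen()}. May be removed in (m+1).0. */
    @Deprecated
    public long getLength() {
        // The old API forwards to the new one, so existing client binaries
        // keep working across minor releases without recompilation.
        return getLen();
    }

    public long getLen() { return length; }
}
```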
> - *2 Data Compatibility*
> -- Motivation: Users expect file systems to preserve data transparently across releases.
> -- 2.a HDFS metadata and data can change across minor or major releases, but such changes must be transparent to user applications. That is, a release upgrade must automatically convert the metadata and data as needed. Further, a release upgrade must allow a cluster to roll back to the older version and its older disk format (rollback needs to restore the original data, not any updated data).
> -- 2.a-WeakerAutomaticConversion:
> Automatic conversion is supported across a small number of releases. If a user wants to jump across multiple releases, he may be forced to go through a few intermediate releases to get to the final desired release.
> - *3 Wire Protocol Compatibility*
> We offer no wire compatibility in our 0.x releases today.
> -- Motivation: The motivation *isn't* to make the Hadoop protocols public. Applications will not call the protocols directly but through a library (in our case the FileSystem class and its implementations). Instead, the motivation is that customers run multiple clusters and have apps that access data across clusters. Customers cannot be expected to update all clusters at the same time.
> -- 3.a Old m.x clients can connect to new m.y servers, where x <= y, but the old clients might get reduced functionality or performance. m.x clients might not be able to connect to (m+1).z servers.
> -- 3.b. New m.y clients must be able to connect to old m.x servers, where x < y, but only for old m.x functionality.
> Comment: Generally, old API methods continue to use old RPC methods. However, it is legal for new implementations of old API methods to call new RPC methods, as long as the library transparently handles the fallback case for old servers.
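The fallback behaviour described in that comment could be sketched as follows; the RPC names and the use of UnsupportedOperationException to signal a missing server-side method are assumptions for illustration:

```java
// Sketch of the fallback rule in 3.b: a new client prefers the new RPC but
// must transparently fall back when the server is an older m.x release that
// does not implement it. The interface and exception type are hypothetical.
class FallbackClient {
    interface NamenodeRpc {
        String getFileChecksumV2(String path); // new RPC, absent on old servers
        String getFileChecksum(String path);   // old RPC, present everywhere
    }

    static String checksum(NamenodeRpc server, String path) {
        try {
            return server.getFileChecksumV2(path);
        } catch (UnsupportedOperationException e) {
            // Old m.x server: fall back to the old RPC, invisibly to the caller.
            return server.getFileChecksum(path);
        }
    }
}
```

The key property is that the application calling the library never sees the fallback; it only ever sees the stable API method.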
> -- 3.c. At any major release transition [i.e. from a release m.x to a release (m+1).0], a user should be able to read data from a cluster running the old version.
> --- Motivation: data copying across clusters is a common operation for many customers. For example, this is routinely done at Yahoo; another use case is HADOOP-4058. Today, http (or hftp) provides a guaranteed-compatible way of copying data across versions. Clearly one cannot force a customer to simultaneously update all its Hadoop clusters to a new major release. We can satisfy this requirement via the http/hftp mechanism or some other mechanism.
> -- 3.c-Stronger
> Shall we add a stronger requirement for 1.0: wire compatibility across major versions? That is, not just for reading but for all operations. This can be supported by class loading or other games.
> Note we can wait to provide this until 2.0 happens. If Hadoop provided this guarantee, it would allow customers to partition their data across clusters without risking apps breaking across major releases due to wire incompatibility issues.
> --- Motivation: Data copying is a compromise. Customers really want to run apps across
clusters running different versions.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
