hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: [DISCUSS] Apache Hadoop 1.0?
Date Thu, 17 Nov 2011 10:45:47 GMT
On 17/11/11 02:06, Scott Carey wrote:
> On 11/16/11 3:51 PM, "Nathan Roberts" <nroberts@yahoo-inc.com> wrote:
>> On 11/16/11 4:43 PM, "Arun C Murthy" <acm@hortonworks.com> wrote:
>>> I propose we adopt the convention that a new major version should be a
>>> superset of the previous major version, features-wise.
>> Just so I'm clear. This is only guaranteed at the time the new major
>> version is started. A day later a previous major line may merge a feature
>> from trunk and then it's no longer the case that 2.x.y is a superset. If
>> that's the case I'm not sure of the value of the convention. We could say
>> that new major versions always start from trunk, but that doesn't have
>> meaning outside of the developer community.
> I don't think in general one can say that major versions are a superset of
> previous major versions.  Then you would need to have a SuperMajor version
> number for the (rare) times that this was broken.
> In other words, the major version number really can't have any
> restrictions.
> Perhaps however, one can say that minor versions are supersets of prior
> minor version if one were to define 'superset'.
> It's going to be hard to claim that the 0.23 branch is a superset of 0.22
> -- After all, there is no JobTracker and all sorts of stuff has been
> removed or replaced with something else.  Whether that defines a superset
> or not gets into a lot of semantics of what we mean by 'superset'.

> Perhaps like 'feature' or 'bug fix', it is best not to get into the
> semantics of defining what we mean by 'superset' and rather define version
> number meaning only in terms of compatibility classifications.  Especially
> since the compatibility classification has implications for all of these
> other things  -- and IMO more clearly useful ones.  For example, consider
> that a "bug fix" may break wire compatibility, that a tiny harmless change
> can be considered a "new feature", or that replacing a single link in a UI
> could be considered breaking a "superset" rule.

I think it would be good to distinguish user-API supersets/subsets from 
internal supersets/subsets.

-0.23 is a superset of the MR and HDFS APIs compatible with previous 
versions (I don't know or care whether or not it is a proper superset). 
The goal here is that end-user apps and higher levels in the stack 
(in-ASF and out-of-ASF) should work, though testing is required to 
verify this.

A failure of the layers above to work with 0.23+ is something that 
should be considered a regression: looked at, and then either dismissed 
as "you weren't meant to do that" or fixed.

-0.23 has changed the back-end means by which jobs are scheduled; the 
monitoring APIs have changed, etc. Where people will see a visible 
difference is in the JT Web UI. That's not an API-level change.

A failure of any code that goes into this bit of the system to compile 
or run against 0.23 is something people can feel slightly sorry about, 
but not enough to trigger reversions.

What I will miss in 0.23 is the MiniMRCluster, which I consider to be 
part of the API. Certainly it's why I pull 
hadoop-common-test-0.20.20x.jar into downstream builds, because it is 
the simplest way to do basic JUnit tests of MR operations. It's also 
the most lightweight way to do single-machine Hadoop runs over small 
datasets.
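For anyone who hasn't used it, here is a minimal sketch of the kind of 
MiniMRCluster usage I mean. Constructor arities and the createJobConf() 
call are from the 0.20.x test jars; treat the exact signatures as 
assumptions if you are on a different branch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

public class MiniMRExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bring up a one-DataNode HDFS and a one-TaskTracker MR cluster,
        // all in-process -- no daemons to install or configure.
        MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);
        MiniMRCluster mr = new MiniMRCluster(1,
                dfs.getFileSystem().getUri().toString(), 1);
        try {
            // A JobConf already wired to the in-process JobTracker;
            // submit test jobs against this from JUnit.
            JobConf jobConf = mr.createJobConf();
            System.out.println("mapred.job.tracker = "
                    + jobConf.get("mapred.job.tracker"));
        } finally {
            mr.shutdown();
            dfs.shutdown();
        }
    }
}
```

Wrap the start/stop in @Before/@After (or setUp/tearDown) and every test 
in the class gets a live single-node cluster to run jobs against.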
