hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Defining Compatibility
Date Mon, 31 Jan 2011 15:59:06 GMT
On 31/01/11 14:32, Chris Douglas wrote:
> Steve-
>
> It's hard to answer without more concrete criteria. Is this a
> trademark question affecting the marketing of a product? A
> cross-compatibility taxonomy for users? The minimum criteria to
> publish a paper/release a product without eye-rolling? The particular
> compatibility claims made by a system will be nuanced and specific; a
> runtime that executes MapReduce jobs as they would run in Hadoop can
> simply make that claim, whether it uses parts of MapReduce, HDFS, or
> neither.

No, I'm thinking more about which large-scale tests need to be run 
against the codebase before you can say "it works", and then how to 
show that it still works after some changes.

>
> For the various distributions "Powered by Apache Hadoop," one would
> assume that compatibility will vary depending on the featureset and
> the audience. A distribution that runs MapReduce applications
> as-written for Apache Hadoop may be incompatible with a user's
> deployed metrics/monitoring system. Some random script to scrape the
> UI may not work. The product may only scale to 20 nodes. Whether these
> are "compatible with Apache Hadoop" is awkward to answer generally,
> unless we want to define the semantics of that phrase by policy.
>
> To put it bluntly, why would we bother to define such a policy? One
> could assert that a fully-compatible system would implement all the
> public/stable APIs as defined in HADOOP-5073, but who would that help?
> And though interoperability is certainly relevant to systems built on
> top of Hadoop, is there a reason the Apache project needs to be
> involved in defining the standards for compatibility among them?

Agreed, I'm just thinking about naming and definitions. Even with the 
stable/unstable internal/external split, there's still the question of 
what the semantics of operations are, both explicit (this operation 
does X) and implicit (and it takes less than Y seconds to do it). It's 
those implicit things that always catch you out (indeed, they are the 
points of argument in things like the Java and Java EE compatibility 
test kits).
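
To make the explicit/implicit distinction concrete, here is a minimal sketch of a check that reports the two kinds of failure separately. Everything in it is hypothetical: the class name CompatCheck, the Outcome values, and the time limit are assumptions made for illustration, not part of Hadoop or any existing test kit.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: not part of Hadoop or any real compatibility kit.
final class CompatCheck {

    enum Outcome { PASS, EXPLICIT_FAIL, IMPLICIT_FAIL }

    // Runs the operation once and checks two things:
    //  - explicit semantics: the result equals the expected value ("does X")
    //  - implicit semantics: it finished within the limit ("in under Y seconds")
    static <T> Outcome check(Supplier<T> operation, T expected, Duration limit) {
        long start = System.nanoTime();
        T actual = operation.get();
        long elapsedNanos = System.nanoTime() - start;

        if (!Objects.equals(expected, actual)) {
            return Outcome.EXPLICIT_FAIL;   // wrong answer, however fast
        }
        if (elapsedNanos > limit.toNanos()) {
            return Outcome.IMPLICIT_FAIL;   // right answer, but too slow
        }
        return Outcome.PASS;
    }

    public static void main(String[] args) {
        // Stand-in operation: count the fields in a comma-separated record.
        Outcome outcome = check(() -> "a,b".split(",").length, 2, Duration.ofSeconds(5));
        System.out.println(outcome);
    }
}
```

A single wall-clock measurement like this is noisy, so a real test kit would presumably repeat the run and agree on the hardware first, which is exactly where the arguments start.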
