hadoop-general mailing list archives

From Tom White <...@cloudera.com>
Subject Re: Defining Compatibility
Date Tue, 01 Feb 2011 17:50:27 GMT
FWIW the FileSystemContractBaseTest class and the FileContext*BaseTest
classes (and their concrete subclasses) are probably the closest thing
we have to compatibility tests for FileSystem and FileContext
implementations in Hadoop.
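As a hedged sketch of how those contract tests are typically wired up (assuming the Hadoop APIs of this era, where FileSystemContractBaseTest extends JUnit 3's TestCase and exposes a protected `fs` field for the implementation under test), a concrete subclass running the shared suite against the local filesystem might look like this; `TestLocalFSContract` is an illustrative class name, not one taken from the codebase:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileSystemContractBaseTest;

// Illustrative subclass name: point the inherited 'fs' field at your own
// FileSystem implementation in setUp() to run it through the shared
// contract tests (mkdirs, rename, delete semantics, and so on).
public class TestLocalFSContract extends FileSystemContractBaseTest {

  @Override
  protected void setUp() throws Exception {
    // The base class runs every contract test against whatever
    // FileSystem instance is assigned to 'fs' before super.setUp().
    fs = FileSystem.getLocal(new Configuration());
    super.setUp();
  }
}
```

A third-party implementation would substitute its own FileSystem instance for the `FileSystem.getLocal(...)` call; the contract suite itself stays unchanged, which is what makes it useful as a de facto compatibility check.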


On Mon, Jan 31, 2011 at 7:59 AM, Steve Loughran <stevel@apache.org> wrote:
> On 31/01/11 14:32, Chris Douglas wrote:
>> Steve-
>> It's hard to answer without more concrete criteria. Is this a
>> trademark question affecting the marketing of a product? A
>> cross-compatibility taxonomy for users? The minimum criteria to
>> publish a paper/release a product without eye-rolling? The particular
>> compatibility claims made by a system will be nuanced and specific; a
>> runtime that executes MapReduce jobs as they would run in Hadoop can
>> simply make that claim, whether it uses parts of MapReduce, HDFS, or
>> neither.
> No, I'm thinking more about what large-scale tests need to be run against
> the codebase before you can say "it works", and then how to show that it
> still works after some changes.
>> For the various distributions "Powered by Apache Hadoop," one would
>> assume that compatibility will vary depending on the featureset and
>> the audience. A distribution that runs MapReduce applications
>> as-written for Apache Hadoop may be incompatible with a user's
>> deployed metrics/monitoring system. Some random script to scrape the
>> UI may not work. The product may only scale to 20 nodes. Whether these
>> are "compatible with Apache Hadoop" is awkward to answer generally,
>> unless we want to define the semantics of that phrase by policy.
>> To put it bluntly, why would we bother to define such a policy? One
>> could assert that a fully-compatible system would implement all the
>> public/stable APIs as defined in HADOOP-5073, but who would that help?
>> And though interoperability is certainly relevant to systems built on
>> top of Hadoop, is there a reason the Apache project needs to be
>> involved in defining the standards for compatibility among them?
> Agreed, I'm just thinking about naming and definitions. Even with the
> stable/unstable and internal/external splits, there's still the question of
> what the semantics of operations are, both explicit (this operation does X)
> and implicit (and it takes less than Y seconds to do it). It's those
> implicit things that always catch you out (indeed, they are the points of
> contention in things like the Java and Java EE compatibility test kits).
