hadoop-general mailing list archives

From: Jacob R Rideout <apa...@jacobrideout.net>
Subject: Re: Defining Hadoop Compatibility -revisiting-
Date: Wed, 11 May 2011 22:56:33 GMT
What about defining compatibility as fully implementing all the
public-stable annotated interfaces for a particular release?
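For concreteness, a rough sketch of what such an annotated interface looks
like (hypothetical interface, assuming the annotation classes in
org.apache.hadoop.classification):

    import java.io.IOException;

    import org.apache.hadoop.classification.InterfaceAudience;
    import org.apache.hadoop.classification.InterfaceStability;

    // Hypothetical interface, shown only to illustrate the annotations;
    // the real public-stable surface is whatever a given release marks
    // this way.
    @InterfaceAudience.Public
    @InterfaceStability.Stable
    public interface ClusterStatusQuery {
      /** Number of live worker nodes in the cluster. */
      int liveNodeCount() throws IOException;
    }

A derivative work would then be expected to provide every type carrying
those two annotations in the release it claims compatibility with.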

Jacob Rideout

On Wed, May 11, 2011 at 4:42 PM, Ian Holsman <hadoop@holsman.net> wrote:
> For apache (httpd I'm assuming you mean), we define compatibility as adherence to the
> set of RFCs that define the HTTP protocol.
> I'm no expert in this (Roy is, though), but we could attempt to do something similar when
> it comes to the HDFS/Map-Reduce protocols. I'm not sure what benefit there would be in going
> to an RFC, as opposed to documenting the API on our site.
> On May 12, 2011, at 7:24 AM, Eric Baldeschwieler wrote:
>> This is a really interesting topic!  I completely agree that we need to get ahead
>> of this.
>> I would be really interested in learning of any experience other Apache projects,
>> such as httpd or Tomcat, have with these issues.
>> ---
>> E14 - typing on glass
>> On May 10, 2011, at 6:31 AM, "Steve Loughran" <stevel@apache.org> wrote:
>>> Back in Jan 2011, I started a discussion about how to define Apache
>>> Hadoop Compatibility:
>>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
>>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>> It claims that their implementations are 100% compatible, even though
>>> the Enterprise edition uses a C filesystem. It also claims that both of
>>> their software releases contain "Certified Stacks", without defining
>>> what Certified means or who does the certification, only that it is an
>>> improvement.
>>> I think we should revisit this issue before people with their own
>>> agendas define what compatibility with Apache Hadoop means for us.
>>> Licensing
>>> -Use of the Hadoop codebase must follow the Apache License
>>> http://www.apache.org/licenses/LICENSE-2.0
>>> -plug-in components that are dynamically linked to (filesystems and
>>> schedulers) don't appear to be derivative works, on my reading of this.
>>> Naming
>>> -this is something for branding@apache; they will have their opinions.
>>> The key one is that the name "Apache Hadoop" must get used, and it's
>>> important to make clear it is a derivative work.
>>> -I don't think you can claim to have a Distribution/Fork/Version of
>>> Apache Hadoop if you swap out big chunks of it for alternate
>>> filesystems, MR engines, etc. Some description of this is needed, e.g.
>>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ"
>>> Compatibility
>>> -the definition of the Hadoop interfaces and classes is the Apache
>>> source tree.
>>> -the definition of the semantics of the Hadoop interfaces and classes is
>>> the Apache source tree, including the test classes.
>>> -the verification that the actual semantics of an Apache Hadoop
>>> release are compatible with the expected semantics is that current and
>>> future tests pass.
>>> -bug reports can highlight incompatibility with the expectations of
>>> community users, and once incorporated into tests they form part of the
>>> compatibility testing.
>>> -vendors can claim and even certify their derivative works as
>>> compatible with other versions of their derivative works, but cannot
>>> claim compatibility with Apache Hadoop unless their code passes the
>>> tests and is consistent with the bug reports marked as "by design".
>>> Perhaps we should have tests that verify each of these "by design"
>>> bug reports to make them more formal.
>>> Certification
>>> -I have no idea what this means in EMC's case; they just say "Certified".
>>> -As we don't do any certification ourselves, it would seem impossible
>>> for us to certify that any derivative work is compatible.
>>> -It may be best to state that nobody can certify their derivative as
>>> "compatible with Apache Hadoop" unless it passes all current test suites.
>>> -And require that anyone who declares compatibility define what they
>>> mean by this.
>>> This is a good argument for getting more functional tests out there:
>>> whoever has more functional tests needs to get them into a test module
>>> that can be used to test real deployments.
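
Re Steve's last two points: one way to formalise a "by design" bug report
would be to encode the agreed behaviour as a test that any derivative work
must pass. A rough sketch (JUnit 4 against the local filesystem; the
specific expectation here is only an illustrative example, not an agreed
semantic):

    import static org.junit.Assert.assertFalse;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    public class ByDesignBehaviourTest {

      @Test
      public void deleteOfMissingPathReturnsFalse() throws Exception {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path missing = new Path("/tmp/by-design-test/does-not-exist");
        // Illustrative "by design" expectation: deleting a nonexistent
        // path reports false rather than throwing an exception.
        assertFalse(fs.delete(missing, false));
      }
    }

Tests like this, collected into a standalone module, could double as the
functional suite for checking real deployments.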
