hadoop-general mailing list archives

From Sanjay Radia <sra...@yahoo-inc.com>
Subject Re: Defining Hadoop Compatibility -revisiting-
Date Mon, 23 May 2011 16:27:31 GMT

On May 12, 2011, at 11:16 PM, Doug Cutting wrote:

> Certification seems like mission creep.  Our mission is to produce
> open-source software.  If we wish to produce testing software, that
> seems fine.  But running a certification program for non-open-source
> software seems like a different task.
> The Hadoop mark should only be used to refer to open-source software
> produced by the ASF.  If other folks wish to make factual statements
> concerning our software, e.g., that their proprietary software passes
> tests that we've created, that may be fine, but I don't think we
> should validate those claims by granting certifications to
> institutions.  That ventures outside the mission of the ASF.  We are
> not an accrediting organization.
> Doug
> On 05/10/2011 12:29 PM, Steve Loughran wrote:
>> Back in Jan 2011, I started a discussion about how to define Apache
>> Hadoop Compatibility:
>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>> It claims that their implementations are 100% compatible, even though
>> the Enterprise edition uses a C filesystem. It also claims that both
>> their software releases contain "Certified Stacks", without defining
>> what "Certified" means or who does the certification -only that it is
>> an improvement.
>> I think we should revisit this issue before people with their own
>> agendas define for us what compatibility with Apache Hadoop means.
>> Licensing
>> -Use of the Hadoop codebase must follow the Apache License:
>> http://www.apache.org/licenses/LICENSE-2.0
>> -Plug-in components that are dynamically linked to (filesystems and
>> schedulers) don't appear to be derivative works, on my reading of the
>> license.
>> Naming
>> -This is something for branding@apache; they will have their own
>> opinions. The key one is that the name "Apache Hadoop" must be used,
>> and it is important to make clear that a product is a derivative work.
>> -I don't think you can claim to have a Distribution/Fork/Version of
>> Apache Hadoop if you swap out big chunks of it for alternate
>> filesystems, MR engines, etc. Some description of this is needed:
>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem
>> XYZ".
>> Compatibility
>> -The definition of the Hadoop interfaces and classes is the Apache
>> source tree.
>> -The definition of the semantics of the Hadoop interfaces and classes
>> is the Apache source tree, including the test classes.
>> -The verification that the actual semantics of an Apache Hadoop
>> release are compatible with the expected semantics is that current
>> and future tests pass.
>> -Bug reports can highlight incompatibility with the expectations of
>> community users, and once incorporated into tests they form part of
>> the compatibility testing.
>> -Vendors can claim and even certify their derivative works as
>> compatible with other versions of their derivative works, but they
>> cannot claim compatibility with Apache Hadoop unless their code
>> passes the tests and is consistent with the bug reports marked as
>> "by design". Perhaps we should have tests that verify each of these
>> "by design" bug reports, to make them more formal.
>> Certification
>> -I have no idea what this means in EMC's case; they just say
>> "Certified".
>> -As we don't do any certification ourselves, it would seem impossible
>> for us to certify that any derivative work is compatible.
>> -It may be best to state that nobody can certify their derivative
>> work as "compatible with Apache Hadoop" unless it passes all current
>> test suites.
>> -And require that anyone who declares compatibility define what they
>> mean by it.
>> This is a good argument for getting more functional tests out there:
>> whoever has functional tests should get them into a test module that
>> can be used to test real deployments.
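The "semantics are defined by the test classes" idea above can be sketched as a small contract-style check: the expected behaviour lives in executable assertions, and any implementation claiming compatibility must pass them. In this sketch the local JDK filesystem stands in for a pluggable Hadoop FileSystem; the class and check names are illustrative only and are not part of any Apache test suite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Minimal sketch of a filesystem "contract test": each check encodes one
// expected semantic, so a plug-in filesystem is compatible only if every
// check passes. Names here are hypothetical, not from Apache Hadoop.
public class FileSystemContractSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("contract");
        Path file = dir.resolve("data.txt");

        // Contract: bytes written must be read back unchanged.
        byte[] payload = "hello".getBytes();
        Files.write(file, payload);
        check("read-after-write", Arrays.equals(payload, Files.readAllBytes(file)));

        // Contract: deleting a file makes it invisible to exists().
        Files.delete(file);
        check("delete-removes-file", !Files.exists(file));

        Files.delete(dir);
    }

    static void check(String name, boolean ok) {
        System.out.println((ok ? "PASS " : "FAIL ") + name);
    }
}
```

A real contract module would run the same checks against each pluggable implementation (HDFS, a C filesystem, an object store), which is exactly what would make "passes the Apache tests" a verifiable claim rather than a marketing one.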
