hadoop-general mailing list archives

From Milind Bhandarkar <mbhandar...@linkedin.com>
Subject Re: Defining Hadoop Compatibility -revisiting-
Date Fri, 13 May 2011 07:24:43 GMT
+1.

The Apache Software Foundation and its contributors should not waste their
energy providing such certification.

Compatibility claims should be easily verifiable by users of these
proprietary systems, or by independent observers, given a readily available
test suite to run.

>The Hadoop mark should only be used to refer to open-source software
>produced by the ASF.


IANAL, but Steve is questioning the use of "Apache Hadoop Compatible" in the
PR material of commercial software. Is this considered a use of "the
Hadoop mark"?

- milind

-- 
Milind Bhandarkar
mbhandarkar@linkedin.com
+1-650-776-3167






On 5/12/11 11:16 PM, "Doug Cutting" <cutting@apache.org> wrote:

>Certification seems like mission creep.  Our mission is to produce
>open-source software.  If we wish to produce testing software, that
>seems fine.  But running a certification program for non-open-source
>software seems like a different task.
>
>The Hadoop mark should only be used to refer to open-source software
>produced by the ASF.  If other folks wish to make factual statements
>concerning our software, e.g., that their proprietary software passes
>tests that we've created, that may be fine, but I don't think we should
>validate those claims by granting certifications to institutions.  That
>ventures outside the mission of the ASF.  We are not an accrediting
>organization.
>
>Doug
>
>On 05/10/2011 12:29 PM, Steve Loughran wrote:
>> 
>> Back in Jan 2011, I started a discussion about how to define Apache
>> Hadoop Compatibility:
>> 
>>http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
>> 
>> 
>> I am now reading EMC HD "Enterprise Ready" Apache Hadoop datasheet
>> 
>> 
>>http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>> 
>> 
>> It claims that their implementations are 100% compatible, even though
>> the Enterprise edition uses a C filesystem. It also claims that both
>> their software releases contain "Certified Stacks", without defining
>> what Certified means, or who does the certification -only that it is an
>> improvement.
>> 
>> 
>> I think we should revisit this issue before people with their own
>> agendas define what compatibility with Apache Hadoop is for us.
>> 
>> 
>> Licensing
>> -Use of the Hadoop codebase must follow the Apache License
>> http://www.apache.org/licenses/LICENSE-2.0
>> -plug-in components that are dynamically linked to (Filesystems and
>> schedulers) don't appear to be derivative works on my reading of this.
>> 
>> Naming
>>  -this is something for branding@apache, they will have their opinions.
>> The key one is that the name "Apache Hadoop" must get used, and it's
>> important to make clear it is a derivative work.
>>  -I don't think you can claim to have a Distribution/Fork/Version of
>> Apache Hadoop if you swap out big chunks of it for alternate
>> filesystems, MR engines, etc. Some description of this is needed
>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ"
>> 
>> Compatibility
>>  -the definition of the Hadoop interfaces and classes is the Apache
>> Source tree,
>>  -the definition of semantics of the Hadoop interfaces and classes is
>> the Apache Source tree, including the test classes.
>>  -the verification that the actual semantics of an Apache Hadoop release
>> is compatible with the expected semantics is that current and future
>> tests pass
>>  -bug reports can highlight incompatibility with expectations of
>> community users, and once incorporated into tests form part of the
>> compatibility testing
>>  -vendors can claim and even certify their derivative works as
>> compatible with other versions of their derivative works, but cannot
>> claim compatibility with Apache Hadoop unless their code passes the
>> tests and is consistent with the bug reports marked as ("by design").
>> Perhaps we should have tests that verify each of these "by design"
>> bugreps to make them more formal.
>> 
>> Certification
>>  -I have no idea what this means in EMC's case, they just say "Certified"
>>  -As we don't do any certification ourselves, it would seem impossible
>> for us to certify that any derivative work is compatible.
>>  -It may be best to state that nobody can certify their derivative as
>> "compatible with Apache Hadoop" unless it passes all current test suites
>>  -And require that anyone who declares compatibility define what they
>> mean by this
>> 
>> This is a good argument for getting more functional tests out there
>> -whoever has more functional tests needs to get them into a test module
>> that can be used to test real deployments.
>> 

