hadoop-general mailing list archives

From Milind Bhandarkar <mbhandar...@linkedin.com>
Subject Re: Defining Hadoop Compatibility -revisiting-
Date Wed, 11 May 2011 21:46:18 GMT
I think it's time to separate out the functional tests as a "Hadoop
Compatibility Kit (HCK)", similar to Sun's TCK for Java, but under ASL
2.0. Then "certification" would mean "passes 100% of the HCK test suite."
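
[Editor's note: a minimal sketch of the idea above. The harness, check names,
and the in-memory filesystem are all hypothetical stand-ins, not real Hadoop
or TCK APIs; a real kit would package the project's own functional tests. The
point it illustrates is that "certified" means passing 100% of the kit, not a
vendor-chosen subset.]

```python
# Toy "compatibility kit" runner in the spirit of the proposed HCK.
# All names here (hck_checks, certify, DictFS) are illustrative assumptions.

def check_create_then_exists(fs):
    # A created path must be visible afterwards.
    fs.create("/tmp/hck-probe")
    return fs.exists("/tmp/hck-probe")

def check_delete_removes_path(fs):
    # A deleted path must no longer be visible.
    fs.create("/tmp/hck-probe")
    fs.delete("/tmp/hck-probe")
    return not fs.exists("/tmp/hck-probe")

hck_checks = [check_create_then_exists, check_delete_removes_path]

def certify(fs):
    """Certification means passing 100% of the kit, not a subset."""
    results = {check.__name__: bool(check(fs)) for check in hck_checks}
    return all(results.values()), results

# A trivial in-memory "filesystem" standing in for a deployment under test.
class DictFS:
    def __init__(self):
        self.paths = set()
    def create(self, path):
        self.paths.add(path)
    def delete(self, path):
        self.paths.discard(path)
    def exists(self, path):
        return path in self.paths

ok, report = certify(DictFS())
```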

- milind
Milind Bhandarkar

On 5/11/11 2:24 PM, "Eric Baldeschwieler" <eric14@yahoo-inc.com> wrote:

>This is a really interesting topic!  I completely agree that we need to
>get ahead of this.
>I would be really interested in learning of any experience other Apache
>projects, such as httpd or Tomcat, have had with these issues.
>E14 - typing on glass
>On May 10, 2011, at 6:31 AM, "Steve Loughran" <stevel@apache.org> wrote:
>> Back in Jan 2011, I started a discussion about how to define Apache
>> Hadoop Compatibility:
>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet.
>> It claims that their implementations are 100% compatible, even though
>> the Enterprise edition uses a C filesystem. It also claims that both
>> their software releases contain "Certified Stacks", without defining
>> what "Certified" means or who does the certification - only that it is
>> an improvement.
>> I think we should revisit this issue before people with their own
>> agendas define what compatibility with Apache Hadoop means for us.
>> Licensing
>> -Use of the Hadoop codebase must follow the Apache License:
>> http://www.apache.org/licenses/LICENSE-2.0
>> -Plug-in components that are dynamically linked to (filesystems and
>> schedulers) don't appear to be derivative works, on my reading of this.
>> Naming
>>  -this is something for branding@apache; they will have their opinions.
>> The key one is that the name "Apache Hadoop" must be used, and it's
>> important to make clear that it is a derivative work.
>>  -I don't think you can claim to have a Distribution/Fork/Version of
>> Apache Hadoop if you swap out big chunks of it for alternate
>> filesystems, MR engines, etc. Some description of this is needed, e.g.
>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ".
>> Compatibility
>>  -the definition of the Hadoop interfaces and classes is the Apache
>> source tree.
>>  -the definition of the semantics of the Hadoop interfaces and classes
>> is the Apache source tree, including the test classes.
>>  -the verification that the actual semantics of an Apache Hadoop
>> release are compatible with the expected semantics is that current and
>> future tests pass.
>>  -bug reports can highlight incompatibility with the expectations of
>> community users, and once incorporated into tests they form part of the
>> compatibility testing.
>>  -vendors can claim and even certify their derivative works as
>> compatible with other versions of their derivative works, but cannot
>> claim compatibility with Apache Hadoop unless their code passes the
>> tests and is consistent with the bug reports marked as "by design".
>> Perhaps we should have tests that verify each of these "by design"
>> bug reports to make them more formal.
>> Certification
>>  -I have no idea what this means in EMC's case, they just say
>>  -As we don't do any certification ourselves, it would seem impossible
>> for us to certify that any derivative work is compatible.
>>  -It may be best to state that nobody can certify their derivative as
>> "compatible with Apache Hadoop" unless it passes all current test
>> suites.
>>  -And require that anyone who declares compatibility define what they
>> mean by this.
>> This is a good argument for getting more functional tests out there:
>> whoever has more functional tests needs to get them into a test module
>> that can be used to test real deployments.
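
[Editor's note: a hedged sketch of what "a test module that can be used to
test real deployments" might look like. The test names are illustrative, not
the actual Hadoop test suite, and the deployment under test is faked with a
local temporary directory (FS_ROOT); a real module would point at the
deployment's filesystem URI instead.]

```python
# Functional tests packaged as an importable module, runnable against a
# target filesystem. FS_ROOT is an assumption standing in for a real
# deployment's mount point or URI.
import os
import shutil
import tempfile
import unittest

FS_ROOT = tempfile.mkdtemp()  # in a real kit: the deployment under test


class FileSystemContractTest(unittest.TestCase):
    def test_create_then_read_back(self):
        # Data written to a path must be readable back unchanged.
        path = os.path.join(FS_ROOT, "probe.txt")
        with open(path, "w") as f:
            f.write("hck")
        with open(path) as f:
            self.assertEqual(f.read(), "hck")

    def test_delete_removes_file(self):
        # A deleted path must no longer exist.
        path = os.path.join(FS_ROOT, "doomed.txt")
        open(path, "w").close()
        os.remove(path)
        self.assertFalse(os.path.exists(path))


if __name__ == "__main__":
    try:
        unittest.main(exit=False)
    finally:
        shutil.rmtree(FS_ROOT, ignore_errors=True)
```

Because the checks live in an ordinary test class rather than inside the
build, anyone can run the same module against their own cluster and report
the pass rate.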
