hadoop-general mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Defining Hadoop Compatibility -revisiting-
Date Wed, 11 May 2011 21:24:47 GMT
This is a really interesting topic!  I completely agree that we need to get ahead of this.

I would be really interested in learning of any experience other Apache projects, such as the Apache HTTP Server or Tomcat, have with these issues.

E14 - typing on glass

On May 10, 2011, at 6:31 AM, "Steve Loughran" <stevel@apache.org> wrote:

> Back in Jan 2011, I started a discussion about how to define Apache 
> Hadoop Compatibility:
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
> I am now reading the EMC Greenplum HD "Enterprise Ready" Apache Hadoop 
> datasheet:
> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
> It claims that their implementations are 100% compatible, even though 
> the Enterprise edition uses a C filesystem. It also claims that both 
> their software releases contain "Certified Stacks", without defining 
> what Certified means or who does the certification -only that it is an 
> improvement.
> I think we should revisit this issue before people with their own 
> agendas define for us what compatibility with Apache Hadoop means.
> Licensing
>  -Use of the Hadoop codebase must follow the Apache License:
> http://www.apache.org/licenses/LICENSE-2.0
>  -plug-in components that are dynamically linked to (filesystems and 
> schedulers) don't appear to be derivative works, on my reading of the 
> license.
> Naming
>  -this is something for branding@apache; they will have their opinions. 
> The key one is that the name "Apache Hadoop" must be used, and it's 
> important to make clear that a product is a derivative work.
>  -I don't think you can claim to have a distribution/fork/version of 
> Apache Hadoop if you swap out big chunks of it for alternative 
> filesystems, MR engines, etc. Some description of this is needed, e.g. 
> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ".
> Compatibility
>  -the definition of the Hadoop interfaces and classes is the Apache 
> source tree.
>  -the definition of the semantics of the Hadoop interfaces and classes 
> is the Apache source tree, including the test classes.
>  -the verification that the actual semantics of an Apache Hadoop 
> release match the expected semantics is that current and 
> future tests pass.
>  -bug reports can highlight incompatibility with the expectations of 
> community users, and once incorporated into tests they form part of the 
> compatibility testing.
>  -vendors can claim and even certify their derivative works as 
> compatible with other versions of their derivative works, but cannot 
> claim compatibility with Apache Hadoop unless their code passes the 
> tests and is consistent with the bug reports marked "by design". 
> Perhaps we should have tests that verify each of these "by design" 
> bug reports, to make them more formal.
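The idea above -that the test suite, not prose, defines the semantics- can be sketched as a contract test. Everything below is hypothetical and is not the actual Hadoop test suite: the `FileStore` interface and `InMemoryStore` class stand in for a pluggable filesystem and one implementation of it. The point is that a "by design" behaviour (here, `rename` reporting failure via its return value instead of throwing) is pinned down by an executable check that any implementation claiming compatibility must pass.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a pluggable filesystem interface. */
interface FileStore {
    void create(String path, String data);
    boolean rename(String src, String dst); // returns false on failure rather than throwing
    String read(String path);
}

/** Trivial in-memory implementation, used here to exercise the contract. */
class InMemoryStore implements FileStore {
    private final Map<String, String> files = new HashMap<>();
    public void create(String path, String data) { files.put(path, data); }
    public boolean rename(String src, String dst) {
        // "By design" semantics pinned by the contract test below: renaming a
        // missing source, or onto an existing destination, returns false.
        if (!files.containsKey(src) || files.containsKey(dst)) return false;
        files.put(dst, files.remove(src));
        return true;
    }
    public String read(String path) { return files.get(path); }
}

/** Contract test: any implementation claiming compatibility must pass it. */
public class FileStoreContractTest {
    static void check(boolean ok, String msg) {
        if (!ok) throw new AssertionError(msg);
    }
    public static void main(String[] args) {
        FileStore fs = new InMemoryStore();
        fs.create("/a", "data");
        check(fs.rename("/a", "/b"), "rename to a fresh path succeeds");
        check("data".equals(fs.read("/b")), "data survives rename");
        check(!fs.rename("/missing", "/c"), "renaming a missing source returns false");
        System.out.println("contract tests passed");
    }
}
```

Run against a different `FileStore` implementation, the same `main` either passes or pinpoints exactly which documented behaviour diverges.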
> Certification
>  -I have no idea what this means in EMC's case; they just say "Certified".
>  -As we don't do any certification ourselves, it would seem impossible 
> for us to certify that any derivative work is compatible.
>  -It may be best to state that nobody can certify their derivative as 
> "compatible with Apache Hadoop" unless it passes all current test suites.
>  -And require that anyone who declares compatibility define what they 
> mean by this.
> This is a good argument for getting more functional tests out there: 
> whoever has more functional tests needs to get them into a test module 
> that can be used to test real deployments.
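Steve's closing point -packaging functional tests so they can run against a live deployment rather than only in-tree- could look something like the minimal runner below. All names here are hypothetical illustrations, not any existing Hadoop module: a real harness would point its checks at actual cluster endpoints taken from configuration, where this sketch just threads through a target string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical harness: registers named checks and runs them in order,
 * printing a PASS/FAIL line per check so a vendor or user can see exactly
 * which compatibility expectations a deployment meets.
 */
public class DeploymentChecks {
    private final Map<String, Supplier<Boolean>> checks = new LinkedHashMap<>();

    void register(String name, Supplier<Boolean> check) {
        checks.put(name, check);
    }

    /** Runs every check; a thrown exception counts as a failure. Returns true only if all pass. */
    boolean runAll() {
        boolean allOk = true;
        for (Map.Entry<String, Supplier<Boolean>> e : checks.entrySet()) {
            boolean ok;
            try { ok = e.getValue().get(); } catch (RuntimeException ex) { ok = false; }
            System.out.println((ok ? "PASS " : "FAIL ") + e.getKey());
            allOk &= ok;
        }
        return allOk;
    }

    public static void main(String[] args) {
        // Stand-in for a cluster URI that a real module would actually probe.
        String target = args.length > 0 ? args[0] : "local";
        DeploymentChecks suite = new DeploymentChecks();
        suite.register("target-configured", () -> !target.isEmpty());
        suite.register("placeholder-check", () -> 1 + 1 == 2); // a real check would call the deployment
        System.out.println(suite.runAll() ? "all checks passed" : "checks FAILED");
    }
}
```

The useful property is that the same jar of checks runs unchanged against any deployment, so "passes the functional test module" becomes a concrete, repeatable claim instead of a marketing one.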
