This is a really interesting topic! I completely agree that we need to get ahead of this.
I would be really interested in learning about any experience other Apache projects, such as
the Apache HTTP Server or Tomcat, have had with these issues.
---
E14 - typing on glass
On May 10, 2011, at 6:31 AM, "Steve Loughran" <stevel@apache.org> wrote:
>
> Back in Jan 2011, I started a discussion about how to define Apache
> Hadoop Compatibility:
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
>
> I am now reading the EMC Greenplum HD "Enterprise Ready" Apache Hadoop datasheet:
>
> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>
> It claims that their implementations are 100% compatible, even though
> the Enterprise edition uses a C filesystem. It also claims that both
> of their software releases contain "Certified Stacks", without defining
> what "Certified" means or who does the certification, only that it is an
> improvement.
>
>
> I think we should revisit this issue before people with their own
> agendas define for us what compatibility with Apache Hadoop means.
>
>
> Licensing
> -Use of the Hadoop codebase must follow the Apache License
> http://www.apache.org/licenses/LICENSE-2.0
> -Plug-in components that are dynamically linked (filesystems and
> schedulers) don't appear to be derivative works, on my reading of this.
>
> Naming
> -This is something for branding@apache; they will have their opinions.
> The key point is that the name "Apache Hadoop" must be used, and it is
> important to make clear that the product is a derivative work.
> -I don't think you can claim to have a Distribution/Fork/Version of
> Apache Hadoop if you swap out big chunks of it for alternate
> filesystems, MR engines, etc. Some description of this is needed, such as
> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ".
>
> Compatibility
> -The definition of the Hadoop interfaces and classes is the Apache
> source tree.
> -The definition of the semantics of the Hadoop interfaces and classes is
> the Apache source tree, including the test classes.
> -The verification that the actual semantics of an Apache Hadoop
> release are compatible with the expected semantics is that current and
> future tests pass.
> -Bug reports can highlight incompatibilities with the expectations of
> community users, and once incorporated into tests they form part of the
> compatibility testing.
> -Vendors can claim and even certify their derivative works as
> compatible with other versions of their derivative works, but cannot
> claim compatibility with Apache Hadoop unless their code passes the
> tests and is consistent with the bug reports marked as "by design".
> Perhaps we should have tests that verify each of these "by design"
> bug reports to make them more formal.
>
> Certification
> -I have no idea what this means in EMC's case; they just say "Certified".
> -As we don't do any certification ourselves, it would seem impossible
> for us to certify that any derivative work is compatible.
> -It may be best to state that nobody can certify their derivative as
> "compatible with Apache Hadoop" unless it passes all current test suites.
> -And require that anyone who declares compatibility define what they
> mean by it.
>
> This is a good argument for getting more functional tests out there:
> whoever has more functional tests needs to get them into a test module
> that can be used to test real deployments.
>