hadoop-general mailing list archives

From Aaron Kimball <akimbal...@gmail.com>
Subject Re: Defining Hadoop Compatibility -revisiting-
Date Wed, 11 May 2011 23:20:21 GMT
What does it mean to "implement" those interfaces? I'm +1 for a TCK-based
definition. Beyond statically implementing a set of interfaces, each
interface also implicitly carries a set of acceptable inputs and predicted
outputs (or ranges of outputs) for those inputs.
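
To make that concrete, here is a rough sketch of what one TCK-style check
could look like. Only the FileSystem/Path calls are real Hadoop API; the
class name and the setup hook are hypothetical:

import static org.junit.Assert.assertTrue;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public abstract class FileSystemContractCheck {
  // Each implementation claiming compatibility plugs in its own
  // FileSystem and must pass every check unmodified.
  protected abstract FileSystem createFileSystem() throws IOException;

  @Test
  public void mkdirsThenExists() throws IOException {
    FileSystem fs = createFileSystem();
    Path dir = new Path("/contract/mkdirs-test");
    // The signature alone says nothing about behaviour; the expected
    // outputs for these inputs are part of the contract too.
    assertTrue("mkdirs must report success", fs.mkdirs(dir));
    assertTrue("directory must exist afterwards", fs.exists(dir));
    assertTrue("must be a directory",
        fs.getFileStatus(dir).isDirectory());
  }
}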

- Aaron

On Wed, May 11, 2011 at 3:56 PM, Jacob R Rideout <apache@jacobrideout.net> wrote:

> What about defining compatibility as fully implementing all the
> public-stable annotated interfaces for a particular release?
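>
> For illustration, a sketch of the marking this would key off (the
> annotations are the real ones from the tree; the interface itself is
> made up):
>
> import org.apache.hadoop.classification.InterfaceAudience;
> import org.apache.hadoop.classification.InterfaceStability;
>
> // Under this definition, only types carrying both markers count
> // towards the compatibility surface for a release.
> @InterfaceAudience.Public
> @InterfaceStability.Stable
> public interface ExampleStableApi {
>   String getVersion();
> }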
>
> Jacob Rideout
>
> On Wed, May 11, 2011 at 4:42 PM, Ian Holsman <hadoop@holsman.net> wrote:
> > For Apache (httpd, I'm assuming you mean), we define compatibility as
> > adherence to the set of RFCs that define the HTTP protocol.
> >
> > I'm no expert in this (Roy is, though), but we could attempt to do
> > something similar when it comes to the HDFS/MapReduce protocols. I'm not
> > sure what benefit there would be in going to an RFC, as opposed to
> > documenting the API on our site.
> >
> >
> > On May 12, 2011, at 7:24 AM, Eric Baldeschwieler wrote:
> >
> >> This is a really interesting topic!  I completely agree that we need to
> >> get ahead of this.
> >>
> >> I would be really interested in learning of any experience other Apache
> >> projects, such as httpd or Tomcat, have had with these issues.
> >>
> >> ---
> >> E14 - typing on glass
> >>
> >> On May 10, 2011, at 6:31 AM, "Steve Loughran" <stevel@apache.org>
> >> wrote:
> >>
> >>>
> >>> Back in Jan 2011, I started a discussion about how to define Apache
> >>> Hadoop Compatibility:
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
> >>>
> >>> I am now reading EMC HD "Enterprise Ready" Apache Hadoop datasheet
> >>>
> >>>
> >>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
> >>>
> >>> It claims that their implementations are 100% compatible, even though
> >>> the Enterprise edition uses a C filesystem. It also claims that both
> >>> their software releases contain "Certified Stacks", without defining
> >>> what Certified means or who does the certification; only that it is an
> >>> improvement.
> >>>
> >>>
> >>> I think we should revisit this issue before people with their own
> >>> agendas define for us what compatibility with Apache Hadoop means.
> >>>
> >>>
> >>> Licensing
> >>> -Use of the Hadoop codebase must follow the Apache License:
> >>> http://www.apache.org/licenses/LICENSE-2.0
> >>> -Plug-in components that are dynamically linked to (filesystems and
> >>> schedulers) don't appear to be derivative works, on my reading of this.
> >>>
> >>> Naming
> >>> -This is something for branding@apache; they will have their opinions.
> >>> The key one is that the name "Apache Hadoop" must get used, and it's
> >>> important to make clear it is a derivative work.
> >>> -I don't think you can claim to have a Distribution/Fork/Version of
> >>> Apache Hadoop if you swap out big chunks of it for alternate
> >>> filesystems, MR engines, etc. Some description of this is needed:
> >>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ"
> >>>
> >>> Compatibility
> >>> -The definition of the Hadoop interfaces and classes is the Apache
> >>> source tree.
> >>> -The definition of the semantics of those interfaces and classes is
> >>> also the Apache source tree, including the test classes.
> >>> -The verification that the actual semantics of an Apache Hadoop
> >>> release are compatible with the expected semantics is that current and
> >>> future tests pass.
> >>> -Bug reports can highlight incompatibility with the expectations of
> >>> community users, and once incorporated into tests they form part of
> >>> the compatibility testing.
> >>> -Vendors can claim and even certify their derivative works as
> >>> compatible with other versions of their derivative works, but cannot
> >>> claim compatibility with Apache Hadoop unless their code passes the
> >>> tests and is consistent with the bug reports marked as "by design".
> >>> Perhaps we should have tests that verify each of these "by design"
> >>> bug reports to make them more formal; a sketch follows below.
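> >>>
> >>> As a sketch of what one such "by design" test could look like (the
> >>> expectation below is made up for illustration; the real ones would
> >>> come from the bug reports themselves):
> >>>
> >>> import static org.junit.Assert.assertFalse;
> >>>
> >>> import org.apache.hadoop.conf.Configuration;
> >>> import org.apache.hadoop.fs.FileSystem;
> >>> import org.apache.hadoop.fs.Path;
> >>> import org.junit.Test;
> >>>
> >>> public class RenameByDesignCheck {
> >>>   // Pins down a documented quirk: rename() signals failure by
> >>>   // returning false rather than throwing. Turning the bug report
> >>>   // into a test makes the expectation formal.
> >>>   @Test
> >>>   public void renameOfMissingSourceReturnsFalse() throws Exception {
> >>>     FileSystem fs = FileSystem.getLocal(new Configuration());
> >>>     assertFalse(fs.rename(new Path("/no/such/source"),
> >>>                           new Path("/no/such/dest")));
> >>>   }
> >>> }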
> >>>
> >>> Certification
> >>> -I have no idea what this means in EMC's case; they just say
> >>> "Certified".
> >>> -As we don't do any certification ourselves, it would seem impossible
> >>> for us to certify that any derivative work is compatible.
> >>> -It may be best to state that nobody can certify their derivative as
> >>> "compatible with Apache Hadoop" unless it passes all current test
> >>> suites.
> >>> -And require that anyone who declares compatibility define what they
> >>> mean by it.
> >>>
> >>> This is a good argument for getting more functional tests out there:
> >>> whoever has more functional tests needs to get them into a test module
> >>> that can be used to test real deployments, along the lines of the
> >>> sketch below.
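> >>>
> >>> Something along these lines, where the target filesystem is chosen
> >>> at run time (the "test.fs.uri" property name is an assumption, not
> >>> an existing convention):
> >>>
> >>> import static org.junit.Assert.assertTrue;
> >>>
> >>> import java.net.URI;
> >>>
> >>> import org.apache.hadoop.conf.Configuration;
> >>> import org.apache.hadoop.fs.FileSystem;
> >>> import org.apache.hadoop.fs.Path;
> >>> import org.junit.Test;
> >>>
> >>> public class DeploymentSmokeCheck {
> >>>   // The same test runs against any live cluster, e.g.
> >>>   //   mvn test -Dtest.fs.uri=hdfs://namenode:8020/
> >>>   @Test
> >>>   public void directoryRoundTrip() throws Exception {
> >>>     URI target = URI.create(
> >>>         System.getProperty("test.fs.uri", "file:///"));
> >>>     FileSystem fs = FileSystem.get(target, new Configuration());
> >>>     Path dir = new Path("/tmp/smoke-" + System.nanoTime());
> >>>     try {
> >>>       assertTrue(fs.mkdirs(dir));
> >>>       assertTrue(fs.getFileStatus(dir).isDirectory());
> >>>     } finally {
> >>>       fs.delete(dir, true);
> >>>     }
> >>>   }
> >>> }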
> >>>
