From: Konstantin Boudnik <cos@boudnik.org>
Date: Thu, 12 May 2011 23:24:51 -0700
Subject: Re: Defining Hadoop Compatibility -revisiting-
To: general@hadoop.apache.org

On Thu, May 12, 2011 at 20:40, Milind Bhandarkar wrote:
> Cos,
>
> Can you give me an example of a "system test" that is not a functional
> test? My assumption was that the functionality being tested is specific
> to a component, and that inter-component interactions (that's what you
> meant, right?) would be taken care of by the public interface and
> semantics of a component API.

Milind, kinda... However, to exercise inter-component interactions via
component APIs one needs tests that go beyond the functional or component
realm (i.e. system tests). At some point I was part of a team working on an
integration validation framework for Hadoop (FIT) which addressed
inter-component interaction validation, essentially guaranteeing the
components' compatibility. The components being Hadoop, Pig, Oozie, etc.,
the framework exercised the whole application stack and covered a lot of
use cases.

Having a framework like this, and a set of test cases, available to the
Hadoop community is a great benefit, because one can quickly make sure that
a Hadoop stack built from a given set of components is working properly.
Another use case is to run the same set of tests - versioned separately
from the product itself - against a previous and a next release, validating
their compatibility at the functional level (sorta what you have
mentioned).

This doesn't, by the way, decide whether we'd choose to work on the HCK or
not; however, the HCK might eventually be built on top of such a framework.
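For flavor, a FIT-style case boils down to something like the sketch below.
This is not actual FIT code, just a minimal illustration (the class name,
path, and row count are made up): data is written through Hadoop's public
FileSystem API and read back through Pig's public Java API, so the check
only passes when the two components actually agree.

import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigOnHadoopSmokeTest {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml etc. from the classpath, so the test
        // runs against whatever stack is configured.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a tiny input file through Hadoop's public API.
        Path input = new Path("/tmp/fit-smoke/input.txt");
        FSDataOutputStream out = fs.create(input, true);
        out.writeBytes("alpha\nbeta\ngamma\n");
        out.close();

        // Read it back through Pig's public API.
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery("A = LOAD '" + input + "' AS (line:chararray);");
        Iterator<Tuple> rows = pig.openIterator("A");

        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        if (count != 3) {
            throw new AssertionError("Pig saw " + count + " rows, expected 3");
        }
        System.out.println("Pig-on-Hadoop smoke test passed");
    }
}

Run against a full stack, a suite of such cases gives exactly the
inter-component coverage that per-component functional tests don't.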
Cos

> - milind
>
> --
> Milind Bhandarkar
> mbhandarkar@linkedin.com
> +1-650-776-3167
>
> On 5/12/11 3:30 PM, "Konstantin Boudnik" wrote:
>
>>On Thu, May 12, 2011 at 09:45, Milind Bhandarkar wrote:
>>> HCK and written specifications are not mutually exclusive. However,
>>> given the evolving nature of Hadoop APIs, functional tests need to
>>> evolve as
>>
>>I would actually expand that to "functional and system tests", because
>>the latter are capable of validating inter-component interactions not
>>coverable by functional tests.
>>
>>Cos
>>
>>> well, and having them tied to a "current stable" version is easier to
>>> do than it is to tie the written specifications.
>>>
>>> - milind
>>>
>>> --
>>> Milind Bhandarkar
>>> mbhandarkar@linkedin.com
>>> +1-650-776-3167
>>>
>>> On 5/11/11 7:26 PM, "M. C. Srivas" wrote:
>>>
>>>>While the HCK is a great idea to check quickly whether an
>>>>implementation is "compliant", we still need a written specification
>>>>to define what is meant by compliance - something akin to a set of
>>>>RFCs, or a set of docs like the IEEE POSIX specifications.
>>>>
>>>>For example, the POSIX.1c pthreads API has a written document that
>>>>specifies all the function calls, input params, return values, and
>>>>error codes. It clearly indicates what any POSIX-compliant threads
>>>>package needs to support, and which vendor-specific non-portable
>>>>extensions one can use at one's own risk.
>>>>
>>>>Currently we have 2 sets of APIs in the DFS and Map/Reduce layers, and
>>>>the specification is extracted only by looking at the code, or (where
>>>>the code is non-trivial) by writing really bizarre test programs to
>>>>examine corner cases. Further, the interaction between a mix of the
>>>>old and new APIs is not specified anywhere. Such specifications are
>>>>vitally important when implementing libraries like Cascading, Mahout,
>>>>etc. For example, an application might open a file using the new API,
>>>>and pass that stream into a library that manipulates the stream using
>>>>some of the old API ... what is then the expectation of the state of
>>>>the stream when the library call returns?
>>>>
>>>>Sanjay Radia @ Y! already started specifying some of the DFS APIs to
>>>>nail such things down. There's similar good effort in the Map/Reduce
>>>>and Avro spaces, but it seems to have stalled somewhat. We should
>>>>continue it.
>>>>
>>>>Doing such specs would be a great service to the community and the
>>>>users of Hadoop. It provides them
>>>>  (a) clear-cut docs on how to use the Hadoop APIs
>>>>  (b) a wider choice of Hadoop implementations, by freeing them from
>>>>      vendor lock-in.
>>>>
>>>>Once we have such specification, the HCK becomes meaningful (since the
>>>>HCK itself will be buggy initially).
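Srivas's stream example is worth making concrete. A hedged sketch (the path
and the "library" routine are invented; FileContext stands in for the new
DFS API, and FSDataInputStream is the stream type it shares with the old
one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class MixedApiStreamState {

    // A library routine written against the old API: it takes the shared
    // stream type and moves the stream position while doing its work.
    static long oldApiLibraryCall(FSDataInputStream in) throws Exception {
        in.seek(128);          // library repositions the stream...
        return in.readLong();  // ...and reads 8 bytes from there
    }

    public static void main(String[] args) throws Exception {
        // Open the file through the new API.
        FileContext fc = FileContext.getFileContext(new Configuration());
        FSDataInputStream in = fc.open(new Path("/data/example.bin"));

        long before = in.getPos();   // 0: freshly opened
        oldApiLibraryCall(in);
        long after = in.getPos();    // 136? Is the caller guaranteed this,
                                     // or is the position unspecified after
                                     // crossing the API boundary?
        System.out.println(before + " -> " + after);
        in.close();
    }
}

Nothing today says whether "after" must be 136 here; that is exactly the
kind of statement a written spec would pin down.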
>>>>
>>>>On Wed, May 11, 2011 at 2:46 PM, Milind Bhandarkar wrote:
>>>>
>>>>> I think it's time to separate out functional tests as a "Hadoop
>>>>> Compatibility Kit (HCK)", similar to the Sun TCK for Java, but under
>>>>> ASL 2.0. Then "certification" would mean "Passes 100% of the HCK
>>>>> testsuite."
>>>>>
>>>>> - milind
>>>>> --
>>>>> Milind Bhandarkar
>>>>> mbhandarkar@linkedin.com
>>>>>
>>>>> On 5/11/11 2:24 PM, "Eric Baldeschwieler" wrote:
>>>>>
>>>>> >This is a really interesting topic! I completely agree that we need
>>>>> >to get ahead of this.
>>>>> >
>>>>> >I would be really interested in learning of any experience other
>>>>> >Apache projects, such as httpd or Tomcat, have with these issues.
>>>>> >
>>>>> >---
>>>>> >E14 - typing on glass
>>>>> >
>>>>> >On May 10, 2011, at 6:31 AM, "Steve Loughran" wrote:
>>>>> >
>>>>> >>
>>>>> >> Back in Jan 2011, I started a discussion about how to define
>>>>> >> Apache Hadoop Compatibility:
>>>>> >>
>>>>> >> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D46B6AD.2020802@apache.org%3E
>>>>> >>
>>>>> >> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop
>>>>> >> datasheet:
>>>>> >>
>>>>> >> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>>>> >>
>>>>> >> It claims that their implementations are 100% compatible, even
>>>>> >> though the Enterprise edition uses a C filesystem. It also claims
>>>>> >> that both their software releases contain "Certified Stacks",
>>>>> >> without defining what Certified means, or who does the
>>>>> >> certification - only that it is an improvement.
>>>>> >>
>>>>> >> I think we should revisit this issue before people with their own
>>>>> >> agendas define what compatibility with Apache Hadoop means for us.
>>>>> >>
>>>>> >> Licensing
>>>>> >>  -Use of the Hadoop codebase must follow the Apache License
>>>>> >>   http://www.apache.org/licenses/LICENSE-2.0
>>>>> >>  -Plug-in components that are dynamically linked to (filesystems
>>>>> >>   and schedulers) don't appear to be derivative works on my
>>>>> >>   reading of this.
>>>>> >>
>>>>> >> Naming
>>>>> >>  -This is something for branding@apache; they will have their
>>>>> >>   opinions. The key one is that the name "Apache Hadoop" must get
>>>>> >>   used, and it's important to make clear it is a derivative work.
>>>>> >>  -I don't think you can claim to have a Distribution/Fork/Version
>>>>> >>   of Apache Hadoop if you swap out big chunks of it for alternate
>>>>> >>   filesystems, MR engines, etc. Some description of this is
>>>>> >>   needed: "Supports the Apache Hadoop MapReduce engine on top of
>>>>> >>   Filesystem XYZ".
>>>>> >>
>>>>> >> Compatibility
>>>>> >>  -The definition of the Hadoop interfaces and classes is the
>>>>> >>   Apache source tree.
>>>>> >>  -The definition of the semantics of the Hadoop interfaces and
>>>>> >>   classes is the Apache source tree, including the test classes.
>>>>> >>  -The verification that the actual semantics of an Apache Hadoop
>>>>> >>   release are compatible with the expected semantics is that
>>>>> >>   current and future tests pass.
>>>>> >>  -Bug reports can highlight incompatibility with the expectations
>>>>> >>   of community users, and once incorporated into tests they form
>>>>> >>   part of the compatibility testing.
>>>>> >>  -Vendors can claim and even certify their derivative works as
>>>>> >>   compatible with other versions of their derivative works, but
>>>>> >>   cannot claim compatibility with Apache Hadoop unless their code
>>>>> >>   passes the tests and is consistent with the bug reports marked
>>>>> >>   as "by design". Perhaps we should have tests that verify each of
>>>>> >>   these "by design" bugreps to make them more formal.
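A "by design" bugrep can be turned into an executable assertion along these
lines. This is only a sketch; the behavior pinned down here (delete() of a
missing path returns false rather than throwing) was chosen purely as an
example of such a quirk:

import static org.junit.Assert.assertFalse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class ByDesignBehaviorTest {

    @Test
    public void deleteOfMissingPathReturnsFalseByDesign() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path missing = new Path("/compat-test/no-such-file");

        // The implementation must not throw here; the miss is signalled
        // through the return value. A stack that raises an exception
        // instead diverges from the Apache Hadoop semantics this test
        // formalizes.
        assertFalse(fs.delete(missing, /* recursive */ true));
    }
}

A derivative work that throws instead would fail this test, and with it any
claim of compatibility on this point.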
>>>>> >>
>>>>> >> Certification
>>>>> >>  -I have no idea what this means in EMC's case; they just say
>>>>> >>   "Certified".
>>>>> >>  -As we don't do any certification ourselves, it would seem
>>>>> >>   impossible for us to certify that any derivative work is
>>>>> >>   compatible.
>>>>> >>  -It may be best to state that nobody can certify their derivative
>>>>> >>   as "compatible with Apache Hadoop" unless it passes all current
>>>>> >>   test suites.
>>>>> >>  -And require that anyone who declares compatibility define what
>>>>> >>   they mean by it.
>>>>> >>
>>>>> >> This is a good argument for getting more functional tests out
>>>>> >> there - whoever has more functional tests needs to get them into a
>>>>> >> test module that can be used to test real deployments.
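Such a test module could stay deployment-agnostic by taking the target
cluster as configuration. A sketch, with the property name test.fs.uri and
the paths invented for illustration:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeploymentFunctionalTest {
    public static void main(String[] args) throws Exception {
        // Defaults to the local filesystem; point it at a real cluster
        // (or a vendor's stack) to test an actual deployment.
        String fsUri = System.getProperty("test.fs.uri", "file:///");
        FileSystem fs = FileSystem.get(URI.create(fsUri), new Configuration());

        // Minimal round-trip check through the public API only.
        Path p = new Path("/tmp/compat-check/roundtrip.txt");
        FSDataOutputStream out = fs.create(p, true);
        out.writeUTF("round-trip");
        out.close();

        FSDataInputStream in = fs.open(p);
        String read = in.readUTF();
        in.close();

        if (!"round-trip".equals(read)) {
            throw new AssertionError("read back '" + read + "'");
        }
        System.out.println("functional round-trip passed against " + fsUri);
    }
}

Running it with -Dtest.fs.uri=hdfs://namenode:8020/ then exercises a real
cluster with the exact same check as a local run.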