Message-ID: <4D46DC4A.7040700@apache.org>
Date: Mon, 31 Jan 2011 15:59:06 +0000
From: Steve Loughran
To: general@hadoop.apache.org
Subject: Re: Defining Compatibility
References: <4D46B6AD.2020802@apache.org>

On 31/01/11 14:32, Chris Douglas wrote:
> Steve-
>
> It's hard to answer without more concrete criteria. Is this a
> trademark question affecting the marketing of a product? A
> cross-compatibility taxonomy for users? The minimum criteria to
> publish a paper/release a product without eye-rolling? The particular
> compatibility claims made by a system will be nuanced and specific; a
> runtime that executes MapReduce jobs as they would run in Hadoop can
> simply make that claim, whether it uses parts of MapReduce, HDFS, or
> neither.

No, I'm thinking more about what large-scale tests need to be run
against the codebase before you can say "it works", and then how to
say that, after some set of changes, it still works.

>
> For the various distributions "Powered by Apache Hadoop," one would
> assume that compatibility will vary depending on the featureset and
> the audience. A distribution that runs MapReduce applications
> as-written for Apache Hadoop may be incompatible with a user's
> deployed metrics/monitoring system. Some random script to scrape the
> UI may not work. The product may only scale to 20 nodes.
> Whether these are "compatible with Apache Hadoop" is awkward to
> answer generally, unless we want to define the semantics of that
> phrase by policy.
>
> To put it bluntly, why would we bother to define such a policy? One
> could assert that a fully-compatible system would implement all the
> public/stable APIs as defined in HADOOP-5073, but who would that help?
> And though interoperability is certainly relevant to systems built on
> top of Hadoop, is there a reason the Apache project needs to be
> involved in defining the standards for compatibility among them?

Agreed, I'm just thinking about naming and definitions. Even with the
stable/unstable and internal/external splits, there's still the
question of what the semantics of operations are, both explicit (this
operation does X) and implicit (and it takes less than Y seconds to do
it). It's those implicit things that always catch you out (indeed,
they are the argument points in things like the Java and Java EE
compatibility test kits).
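
To make the explicit/implicit distinction concrete, here is a minimal
sketch of a check that tests both at once. It is an illustration only,
not anything from Hadoop or from a real compatibility kit: the names
check_operation, CheckResult, fast_word_count and slow_word_count are
hypothetical, and the 0.1 second budget is an arbitrary stand-in for
"Y seconds".

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    explicit_ok: bool       # the operation returned the expected value
    implicit_ok: bool       # the operation finished within the time budget
    elapsed_seconds: float  # measured wall-clock time of the call


def check_operation(op: Callable[[], object], expected: object,
                    max_seconds: float) -> CheckResult:
    """Run op once; check its result (explicit) and its duration (implicit)."""
    start = time.perf_counter()
    actual = op()
    elapsed = time.perf_counter() - start
    return CheckResult(actual == expected, elapsed <= max_seconds, elapsed)


# Two hypothetical implementations of the same operation. Both return the
# same answer, so both satisfy the explicit contract.
def fast_word_count() -> int:
    return len("a b c".split())


def slow_word_count() -> int:
    time.sleep(0.3)  # simulate an implementation that is correct but slow
    return len("a b c".split())


fast = check_operation(fast_word_count, expected=3, max_seconds=0.1)
slow = check_operation(slow_word_count, expected=3, max_seconds=0.1)

print("fast:", fast.explicit_ok, fast.implicit_ok)
print("slow:", slow.explicit_ok, slow.implicit_ok)
```

Both implementations pass a test that looks only at the return value;
only the timing check tells them apart, which is the kind of implicit
expectation that an API signature alone does not capture.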