Subject: Re: [DISCUSS] BOM and supported platforms for Bigtop 0.4.0
From: Steve Loughran
Date: Sat, 5 May 2012 12:18:22 -0700
To: general@incubator.apache.org

On 4 May 2012 16:34, Alan Gates wrote:

> > * In that case there might still be a role for BigTop to provide a
> > central repository for such easily consumable upstream releases. This
> > would be somewhat similar to the discussions that took place a few
> > years ago about whether and how the ASF could host something like the
> > central Maven repository.
>
> Do you know what list that discussion took place on and a general time
> frame? Reading through that would be very helpful for my thinking on this
> topic.

That probably dates back a decade. The outcome was that maven.org became the place where any JAR could go, irrespective of origin, authentication or quality of pom.xml metadata. The ASF artifacts only go there after validation of the release process.

The nice thing about JAR files is that they are platform-independent, so a single global repository can share stuff without problems, and the artifacts remain valid forever. Once you get into platform binaries and OS-specific packaging, you get platform and version problems.

My thoughts:

- It's up to the OS teams and vendors to run their own repositories. It's their business model, and I'm happy to use Ubuntu's and Red Hat's repositories of signed artifacts.
- Similarly, it's up to them to compile native binaries for their platforms and qualify them on clusters.
- What they do need is a set of programs that are designed for their platforms, putting stuff in the right places and working with the OS rather than against it (a permanent issue in Java-land).
- The metadata to create the OS-specific installation bundles (RPM, Deb, whatever else) is something that helps them. It also helps ensure that downstream distributors don't come up with their own rules and layouts that don't work or that cause support issues for everyone.
- What is useful for them are better tests to qualify the entire stack. That way a nightly Ubuntu or Red Hat stack can test the stable Hadoop stack to catch OS/JDK regressions, other people can qualify their entire cluster with more than just terasort, etc. These tests should be designed to qualify a cluster independently of how it was bundled, installed or deployed.
- The ASF has never acted as an RPM/Deb source itself, and has never pushed out virtual hard disk images containing entire Linux VMs.

Owen's unhappy that HUE is going in. Even though it has a license that works with Apache products, it's not part of the in-Apache Hadoop community codebase. Its aim is to provide a front end/management system for the product. While some such product is invaluable, the issue that is arising is this: should the ASF be dictating which external tool should be managing your Hadoop cluster? Should it do so by declaring that the Bigtop artifacts depend upon that tool, thereby giving what is effectively one product a de facto seal of approval? I don't think it should, unless there's some consensus within all the projects contributing the code that yes, they are happy for the ASF integration project to do this, and that there is no alternative way to do it.

The thing is, there is no reason to say apache-bigtop-x.y depends on HUE.

- The ASF RPM/Deb metadata and any sandbox artifacts can include the ASF code and any dependent artifacts that they have no choice but to rely on.
- Redistributors of any kind are free to make the production artifacts they choose.
- They are also free to pull in any other dependencies they want. That includes Hue, Ganglia, Nagios and SmartFrog (which does currently do its own Hadoop RPM and would gladly not do so).

There's a big difference between saying "here is the metadata to create the Apache artifacts" and "here is the metadata to create the Apache artifacts and some others we want everyone downstream to take up". The latter is more of a decision for the downstream people.
There's a separate issue: should the ASF be providing its own OS-specific repositories of artifacts? That's a step beyond sticking JAR files up online. I think it's good to have the artifacts on repositories for apt-get and yum, but there's no reason why the responsibility for production releases should not be pushed out to Red Hat, Canonical, etc. Anything hosted by the ASF should be restricted to nightly build/snapshot releases, so that people testing the distribution process have access to those releases. Formal releases, though, ought to be pushed out to whoever wants to take the source code, build it and qualify the packages so generated. Otherwise you can't say "we are just doing the metadata".

Finally, there are the VMs. I've never been personally fond of downloading them, as they never seem to come with a keyboard setting that works in my locale, but they do give people a great way to get up and running. At the same time, they not only stick in the binary stack, they also include the Linux image and take up a fairly large amount of bandwidth. I think the tactic there is to work with someone like VirtualBoxImages, who stick the VMs up on SourceForge. Their CentOS VMs, for example, are listed at http://virtualboxes.org/images/centos/

They get to build up the VMs with their choices, and they take on the cost of keeping those images secure. That is the great pain of VM images: every backup OS image you have ages at the rate of one critical Adobe patch a fortnight. (**) Do that packaging off-ASF and you get to pull in whatever extra layers you want on top, and infrastructure stops expressing so much concern.
-Steve

(who does now work at Hortonworks, but is actually working on the SmartFrog source tree this weekend, using it to bring up a Hadoop pseudo-cluster on the Linux laptop so that I can submit Groovy MR jobs to it from my new laptop and so get my Berlin Buzzwords talk finished)

(**) The solution here is actually to provide not the VM image, but the metadata to create the entire VM image, and hand off the work to LinuxCOE, http://linuxcoe.sourceforge.net/ , which you can see live at http://www.pro.instalinux.com/cgi-bin/coe_bootimage.cgi

"Creating VMs by hand is like statically linking C++ binaries yourself."