incubator-general mailing list archives

From Steve Loughran <steve.lough...@gmail.com>
Subject Re: [DISCUSS] BOM and supported platforms for Bigtop 0.4.0
Date Sat, 05 May 2012 19:18:22 GMT
On 4 May 2012 16:34, Alan Gates <gates@hortonworks.com> wrote:

>
> > * In that case there might still be a role for BigTop to provide a
> > central repository for such easily consumable upstream releases. This
> > would be somewhat similar to the discussions that took place a few
> > years ago about whether and how the ASF could host something like the
> > central Maven repository.
>
>
> Do you know what list that discussion took place on and a general time
> frame?  Reading through that would be very helpful for my thinking on this
> topic.
>
>
That probably dates back a decade - the outcome was that maven.org became
the place where any JAR could go, irrespective of origin, authentication or
quality of pom.xml metadata. The ASF artifacts only go there after
validation of the release process.

The nice thing about JAR files is that they are platform-independent, so a
single global repository can share them without problems; the artifacts
remain valid forever.

Once you get into platform binaries and OS-specific packaging, you get
platform and version problems.

My thoughts:

   - It's up to the OS teams & vendors to run their own repositories. It's
   their business model, and I'm happy to use ubuntu and redhat's repositories
   of signed artifacts.
   - Similarly, it's up to them to compile native binaries for their
   platforms and qualify them on clusters.
   - What they do need is a set of programs that are designed for their
   platforms, putting stuff in the right places and working with the OS,
   rather than against it (a permanent issue in java-land).
   - The metadata to create the OS-specific installation bundles -RPM, Deb,
   whatever else- is something that helps them, and helps ensure that the
   downstream distributor doesn't come up with their own rules and layout
   that don't work or cause support issues for everyone.
   - What is useful for them are better tests to qualify the entire stack,
   so a nightly ubuntu or redhat stack can test the stable hadoop stack to
   catch OS/JDK regressions, other people can qualify their entire cluster
   with more than just terasort, etc. These tests should be designed to
   qualify a cluster independently of how it was bundled, installed or
   deployed.
   - The ASF hasn't ever been in the role of an RPM/Deb source itself, and
   has never pushed out virtual hard disk images containing entire Linux VMs.

Owen's unhappy that HUE is going in: even though it has a license that
works with Apache products, it's not part of the in-Apache Hadoop community
codebase. Its aim is to provide a front end/management system for the
product, and while some such product is invaluable, the issue that is
arising is this:

-should the ASF be dictating which external tool should manage your
Hadoop cluster, and doing so by declaring that the bigtop artifacts depend
upon it -so giving what is effectively one product a de facto seal of
approval?

I don't think it should unless there's some consensus within all the
projects contributing the code that yes, they are happy for the ASF
integration project to do this, and that there is no alternative way to do
it.


The thing is, there is no reason to say apache-bigtop-x.y depends on HUE.
 - The ASF RPM/Deb metadata and any sandbox artifacts can include the ASF
code and any dependent artifacts that they have no choice but to rely on.
 - Redistributors of any kind are free to make the production artifacts they
choose.
 - They are also free to pull in any other dependencies they want. That
includes Hue, ganglia, Nagios, SmartFrog (which currently does its own
Hadoop RPM and would gladly not do so).

There's a big difference between saying "here is the metadata to create the
apache artifacts" and "here is the metadata to create the apache artifacts
and some others we want everyone downstream to take up". That's more of a
decision for the downstream people.

There's a separate issue which is:

Should the ASF be providing its own OS specific repositories of artifacts?

That's a step beyond sticking JAR files up online. I think it's good to have
the artifacts in repositories for apt-get and yum, but there's no reason
why the responsibility for production releases should not be pushed out to
Redhat, Canonical, etc. Anything hosted by the ASF should be restricted to
nightly build/snapshot releases, so that people testing the distribution
process have access to those releases; formal releases ought to be
pushed out to whoever wants to take the source code and build and qualify
the packages so generated. Otherwise you can't say "we are just doing the
metadata".
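A nightly-only ASF-hosted repository could be consumed with something like
the following yum configuration. The repo id and baseurl are made-up
placeholders for illustration, not a real ASF endpoint:

```
# Hypothetical /etc/yum.repos.d/bigtop-nightly.repo -- the baseurl is
# an illustrative placeholder, not a real repository.
[bigtop-nightly]
name=Apache Bigtop nightly snapshot builds (unsupported)
baseurl=http://example.apache.org/bigtop/nightly/centos6/
enabled=0
gpgcheck=1
```

Shipping it with enabled=0 makes the testing-only intent explicit: users opt
in per-command rather than pulling snapshots into production by default.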

Finally, there's the VMs. I'm not personally fond of downloading them, as
they never seem to come with a keyboard setting that works in my locale,
but they do give people a great way to get up and running. At the same time
they not only bake in the binary stack, they also include the linux image
and take up a fairly large amount of bandwidth.

I think the tactic there is to work with someone like VirtualBoxImages, who
stick the VMs up on SourceForge, as they do for the CentOS VMs:
 http://virtualboxes.org/images/centos/
They get to build the VMs with their choices, and they take on the cost
of keeping those images secure, that being the great pain of VM images
-every backup OS image you have ages at the rate of one critical Adobe
patch a fortnight. (**)

Do that packaging off-ASF and you get to pull in whatever extra layers on
top you want, and infrastructure stops expressing so much concern.

-Steve


(who does now work at Hortonworks, but is actually working on the SmartFrog
source tree this w/end, using it to bring up a Hadoop pseudo cluster on the
linux laptop so that I can submit groovy MR jobs to it from my new laptop
and so get my berlin buzzwords talk finished)


(**) The solution here is actually to provide not the VM image, but the
metadata to create the entire VM image, and hand off the work to LinuxCOE,
http://linuxcoe.sourceforge.net/ , which you can see live at
http://www.pro.instalinux.com/cgi-bin/coe_bootimage.cgi
"creating VMs by hand is like statically linking C++ binaries yourself"
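In that spirit, the "metadata, not image" recipe can be a short kickstart
file that a tool like LinuxCOE turns into a full install. Everything below
is an illustrative sketch (mirror URL, password hash and package set are
invented):

```
# Hypothetical kickstart fragment: a few hundred bytes of recipe,
# versus gigabytes for the VM image it regenerates.
install
url --url=http://mirror.example.org/centos/6/os/x86_64/
lang en_US.UTF-8
keyboard uk                    # the keyboard layout you actually want
rootpw --iscrypted $1$examplehash
%packages
@core
java-1.6.0-openjdk
%end
```

Regenerating the VM from the recipe also sidesteps the ageing problem: every
rebuild picks up the current packages instead of freezing a snapshot.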
