hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: debian package of hadoop
Date Mon, 04 Jan 2010 12:37:48 GMT
Jordà Polo wrote:
> On Wed, Dec 30, 2009 at 07:53:43PM +0100, Thomas Koch wrote:
>> Today I tried to run the Cloudera Debian dist on a 4-machine cluster. I still
>> have some itches, see my list below. Some of them may require a fix in the
>> packaging.
>> Therefore I thought that it may be time to start an official Debian package of
>> hadoop with a public GIT repository so that everybody can participate.
>> Would cloudera support this? I'd package hadoop 0.20 and apply all the
>> cloudera patches (managed with topgit[1]).
>> At this point I'd like to have your opinion whether it would be wise to have
>> versioned binary packages like hadoop-18, hadoop-20 or just plain hadoop for
>> the Debian package?
> Hi Thomas,
> I have been thinking about an official Hadoop Debian package for a while
> too.

If you want "official" as in something that can say "Apache Hadoop" on it, 
then it will need to be managed and released as an Apache project. That 
means it lives somewhere in ASF SVN. If you want to cut your own, please 
give it a different name to avoid problems later.

> The main issue that prevents the inclusion of the current Cloudera
> package into Debian is that it depends on Sun's Java. I think it would
> be interesting, at least for an official Debian package, to depend on
> OpenJDK in order to make it possible to distribute it in "main" instead
> of "contrib".

+1 to more work on packaging; I'd go so far as to push for a separate 
"deployment" subproject which would sit downstream of everything, 
including HBase and the other layers.

I view .deb and .rpm releases as the things you push out to clusters, 
maybe with custom config files for everything else. The ability to 
create your own packages on demand seems to be something people need 
(disclaimer: I create my own RPMs).

I would have the package not declare which Java it depends on, as that 
lets you run on any JVM, JRockit included. Or drive the .deb creation 
process so that you can decide at release time what the options are for 
any specific target cluster.
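As a sketch of that first option for the Debian side (a hypothetical debian/control fragment; the package and dependency names are illustrative, not from this thread), listing a free JVM first with alternatives would satisfy Debian's "main" rules while still letting any other runtime fulfil the dependency:

```control
# Hypothetical debian/control fragment -- names are examples only.
# OpenJDK first keeps the package in "main"; the alternatives let
# any other installed JVM satisfy the dependency at install time.
Package: hadoop
Depends: openjdk-6-jre-headless | java6-runtime-headless | java5-runtime-headless
```

Dropping the Depends line entirely, as suggested above, is the other extreme: the package installs anywhere, and picking a working JVM becomes the operator's problem.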

> Also, note that in order to fit into Debian's package autobuilding
> system, some scripts will probably require some tweaking. For instance,
> by default Hadoop downloads dependencies at build time using ivy, but
> Debian packages should use already existing packages. Incidentally,
> Hadoop depends on some libraries that aren't available in Debian yet,
> such as xmlenc, so there is even more work to do.

Well, we'll just have to ignore the Debian autobuilding process then, 
won't we?

There are some hooks in Ivy and Ant to give local machine artifacts 
priority over other stuff, but it's not ideal. Let's just say there are 
differences of opinion between some of the Linux packaging people and 
others as to the correct way to manage dependencies. I'm in the 
"everything is specified under SCM" camp; others are in the "build 
against what you find" world.
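One of those Ivy hooks can be sketched as an ivysettings.xml fragment (paths and resolver names here are illustrative assumptions, not Hadoop's actual settings): a filesystem resolver placed first in the chain, with returnFirst set, so locally provided JARs win over anything fetched from the public repositories:

```xml
<!-- Hypothetical ivysettings.xml sketch: check a local directory of
     JARs before falling back to the public Maven repository.
     Directory layout and names are examples only. -->
<ivysettings>
  <settings defaultResolver="default"/>
  <resolvers>
    <chain name="default" returnFirst="true">
      <!-- distro-provided or locally built JARs take priority -->
      <filesystem name="local">
        <artifact pattern="${ivy.settings.dir}/local-jars/[artifact]-[revision].[ext]"/>
      </filesystem>
      <!-- fallback: fetch from the network -->
      <ibiblio name="maven2" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
```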

I cut my RPMs by
* pushing the .rpm template through a <copy> with property expansion; 
this creates an RPM with all the version markers set up right, driven 
by my build's properties files.
* not declaring dependencies on anything, Java or any other JAR such as 
Log4J. This ensures my code runs with the JARs I told it to use, not 
anything else. It also gives me the option to sign all the JARs, which 
the normal Linux packaging doesn't like.
* releasing the tar of everything needed to sign the JARs and create the 
RPMs as a redistributable. This gives anyone else the option to create 
their own RPMs too. You don't need to move the entire build/release 
process to source RPMs or .debs for this, any more than the Ant or Log4J 
packages get built/released that way.
* <scp>-ing the packages to a VMWare or VirtualBox image of each 
supported platform, sshing in and exec'ing the rpm uninstall/install 
commands, then walking the scripts through their lifecycle. You were 
planning on testing the upgrade process, weren't you?
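That last step can be sketched in shell (the hostname, package name, and RPM path are made-up examples; the function only builds the command strings, so the sequence can be reviewed before being pointed at a real VM image):

```shell
#!/usr/bin/env bash
# Sketch of the push-to-VM upgrade test cycle described above.  Host,
# package and RPM path are illustrative; real use would run against a
# VMWare or VirtualBox image of each supported platform.
set -euo pipefail

# Print (rather than execute) the remote command sequence, so it can be
# inspected, or piped to sh once you trust it.
build_upgrade_cmds() {
  local host=$1 pkg=$2 rpmfile=$3
  echo "scp ${rpmfile} root@${host}:/tmp/"
  echo "ssh root@${host} rpm -e ${pkg}"               # uninstall old version
  echo "ssh root@${host} rpm -i /tmp/${rpmfile##*/}"  # install the new one
  echo "ssh root@${host} service ${pkg} start"        # walk the lifecycle
}

build_upgrade_cmds vm-centos5 hadoop build/hadoop-0.20.1-1.noarch.rpm
```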


> (Anyway, I'm interested in the package, so let me know if you need some
> help and want to set up a group on alioth or something.)

A lot of the fun here is not going to be setting up the package files (
