hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: debian package of hadoop
Date Mon, 04 Jan 2010 14:46:55 GMT
Thomas Koch wrote:
> Hi Jordà,
> 
>> The main issue that prevents the inclusion of the current Cloudera
>> package into Debian is that it depends on Sun's Java. I think it would
>> be interesting, at least for an official Debian package, to depend on
>> OpenJDK in order to make it possible to distribute it in "main" instead
>> of "contrib".
> The build-depends line can easily be changed as long as hadoop will build with 
> openjdk. The binary will depend on java5-runtime-headless which is provided by 
> any java runtime. So the user of the package is free to choose either Sun or 
> openjdk.


Java 6+ only. Hadoop will build on OpenJDK or JRockit; the Hadoop team 
merely chooses to ignore any bug report that can't be recreated on the 
official JDKs. You are still free to fix such bugs yourself. You should 
also be aware that your JVM hasn't been tested at scale, unless you have 
the scale to compare with the big datacentres.

What use cases are you thinking of here?

1) developer coding against the hadoop Java and C APIs
2) Someone setting up a small 1-5 machine cluster
3) large production datacentre of hundreds of worker nodes
4) transient virtualised worker nodes


For (3) and (4) the challenge is getting the right configuration out 
there, where configuration means:
  hadoop XML files
  log4j settings
  rack awareness scripts
  and such like
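
To make "rack awareness scripts" concrete, here is a minimal sketch of one in Python. Hadoop runs the configured script with one or more host names or IP addresses as arguments and reads one rack path per argument from stdout. The addresses and rack names in RACKS are made up for illustration; a real site would generate the mapping from its own inventory.

```python
#!/usr/bin/env python3
"""Sketch of a rack awareness (topology) script for Hadoop."""
import sys

# Hypothetical host-to-rack mapping, for illustration only.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Rack path reported for hosts that are not in the mapping.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host, racks=RACKS):
    """Return the rack path for a host, or the default when unknown."""
    return racks.get(host, DEFAULT_RACK)


def main(argv):
    # Print one rack path per argument, in the order the arguments came in.
    for host in argv[1:]:
        print(resolve_rack(host))


if __name__ == "__main__":
    main(sys.argv)
```

This file is exactly the sort of thing that has to be identical on every node that resolves topology, which is why it belongs in the list above.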

For virtualised clusters you set up one node then ask the infrastructure 
for 100 instances; for physical ones you just need to get the right 
files out everywhere. Packaging them up and pushing them out as a .deb 
or RPM is one option (the Cloudera one) and is better than trying to do 
it by hand, but it is only one option.
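
As an illustration of one non-packaging option, the sketch below mirrors a configuration directory onto a list of hosts with rsync over ssh. It is only a sketch under assumptions: the function names, host names and paths are hypothetical, rsync and passwordless ssh are assumed to be available on every machine, and nothing here restarts the daemons afterwards.

```python
import subprocess


def rsync_command(conf_dir, host, dest_dir):
    """Build the rsync argument list that mirrors conf_dir onto one host."""
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = conf_dir.rstrip("/") + "/"
    # -a preserves permissions and times; --delete removes stale files.
    return ["rsync", "-a", "--delete", source, f"{host}:{dest_dir}"]


def push_config(conf_dir, hosts, dest_dir):
    """Push conf_dir to every host; return the hosts whose copy failed."""
    failed = []
    for host in hosts:
        result = subprocess.run(rsync_command(conf_dir, host, dest_dir))
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling push_config("/etc/hadoop/conf", ["worker01", "worker02"], "/etc/hadoop/conf") would return the list of hosts that need another attempt. A package has the advantage that the installed version can be queried on each node; a plain copy like this one leaves no such record.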
