hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Running Hadoop/Hbase in a OSGi container
Date Fri, 12 Jun 2009 10:26:37 GMT
Ninad Raut wrote:
> OSGi provides navigability to your components and creates a life cycle for
> each of those components, viz. install, start, stop, undeploy, etc.
> This is the reason why we are thinking of creating components using OSGi.
> The problem we are facing is that our components use MapReduce and HDFS,
> and the OSGi container cannot detect the Hadoop mapred engine or HDFS.
> I have searched the net, and it looks like people are working on, or have
> achieved success in, running Hadoop in an OSGi container....
> Ninad

1. I am working on a simple lifecycle for the services (start/stop/ping), 
which is not OSGi; OSGi worries a lot about classloading and versioning. 
Check out HADOOP-3628 for this.

2. You can run it under OSGi systems, such as the OSGi branch of 
SmartFrog : 
or under non-OSGi tools. Either way, these tools are left dealing with 
classloading and the like.

3. Any container is going to have to deal with the fact that bits of all 
the services call System.exit(). The usual approach is to run them under a 
security manager that traps the call, raises an exception instead, etc.

4. Any container then has to deal with the fact that from 0.20 onwards, 
Hadoop does things with security policy that are incompatible with normal 
Java security managers: whatever security manager you use for trapping 
system exits can't extend the default one.

5. Any container also has to deal with the fact that every service 
(namenode, job tracker, etc.) makes a lot of assumptions about singletons, 
such as having exclusive use of filesystem objects retrieved through 
FileSystem.get(), and the like. While OSGi can handle that with its 
classloading work, it's still fairly complex.

6. There are also lots of JVM memory/thread management issues; see the 
various Hadoop bugs.
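The start/stop/ping lifecycle from point 1 can be illustrated with a minimal Java sketch. It assumes a simple state machine in which ping() acts as a liveness check; the names LifecycleService, ServiceState and the inner* hooks are hypothetical and are not the API from HADOOP-3628.

```java
/** Hypothetical lifecycle states; not the HADOOP-3628 API. */
enum ServiceState { CREATED, STARTED, STOPPED, FAILED }

/**
 * Minimal start/stop/ping lifecycle for a hosted service (illustrative sketch).
 * Subclasses supply the inner* hooks; the base class enforces legal transitions.
 */
abstract class LifecycleService {
    private ServiceState state = ServiceState.CREATED;

    /** Moves CREATED -> STARTED, or to FAILED if the start hook throws. */
    public final synchronized void start() {
        if (state != ServiceState.CREATED) {
            throw new IllegalStateException("Cannot start from state " + state);
        }
        try {
            innerStart();
            state = ServiceState.STARTED;
        } catch (Exception e) {
            state = ServiceState.FAILED;
            throw new IllegalStateException("Start failed", e);
        }
    }

    /** Liveness check: throws unless the service is STARTED and its own check passes. */
    public final synchronized void ping() {
        if (state != ServiceState.STARTED) {
            throw new IllegalStateException("Service not live, state " + state);
        }
        innerPing();
    }

    /** Idempotent: a second stop() is a no-op. */
    public final synchronized void stop() {
        if (state == ServiceState.STOPPED) {
            return;
        }
        try {
            innerStop();
        } finally {
            state = ServiceState.STOPPED;
        }
    }

    public final synchronized ServiceState getState() {
        return state;
    }

    protected abstract void innerStart() throws Exception;

    /** Default ping does nothing extra; override to probe ports, threads, etc. */
    protected void innerPing() {
    }

    protected abstract void innerStop();
}
```

Keeping the transitions in a final base class means a container can drive any hosted service the same way, whatever that service does internally.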
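Points 3 and 4 can be illustrated with a small Java sketch of a container-side security manager that turns System.exit() into an exception. The class names ExitTrappingSecurityManager and ExitTrappedException are hypothetical. One reading of point 4 is that the trap cannot build on the default permission checks, so this sketch overrides checkPermission() to allow everything rather than calling the superclass. Note that SecurityManager has been deprecated for removal since Java 17, and since Java 18 installing one at runtime is disallowed unless the JVM is started with -Djava.security.manager=allow, so this reflects the approach of the Java 6 era.

```java
import java.security.Permission;

/** Thrown instead of letting a hosted service terminate the whole JVM. */
class ExitTrappedException extends SecurityException {
    private final int status;

    ExitTrappedException(int status) {
        super("System.exit(" + status + ") trapped by container");
        this.status = status;
    }

    int getStatus() {
        return status;
    }
}

/**
 * Hypothetical container-side security manager: converts System.exit() into an
 * exception and otherwise permits everything. It deliberately does not call
 * super.checkPermission(), so it never falls back to the default policy checks.
 */
@SuppressWarnings("removal") // SecurityManager is deprecated for removal since Java 17
class ExitTrappingSecurityManager extends SecurityManager {
    @Override
    public void checkExit(int status) {
        throw new ExitTrappedException(status);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Allow all other operations.
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        // Allow all other operations.
    }
}
```

A container would install it with System.setSecurityManager(new ExitTrappingSecurityManager()) before starting a hosted service, then catch ExitTrappedException around that service's entry point and treat it as the service stopping rather than the JVM exiting.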
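To see why the singleton assumption in point 5 bites, here is a toy JVM-wide cache, loosely modelled on the way FileSystem.get() hands out shared, cached instances. FsCache and FsHandle are hypothetical stand-ins, not Hadoop classes: two services in one JVM asking for the same URI get the same object, so one service closing it breaks the other.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a filesystem client handle. */
class FsHandle {
    private final String uri;
    private boolean closed;

    FsHandle(String uri) {
        this.uri = uri;
    }

    String getUri() {
        return uri;
    }

    synchronized void close() {
        closed = true;
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

/** Toy JVM-wide cache: every caller asking for the same URI shares one handle. */
final class FsCache {
    private static final Map<String, FsHandle> CACHE = new HashMap<>();

    private FsCache() {
    }

    static synchronized FsHandle get(String uri) {
        return CACHE.computeIfAbsent(uri, FsHandle::new);
    }
}
```

Running each service in its own process, or loading each service's classes through its own classloader as OSGi does, gives every service a separate copy of that static cache.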

If you look at the slides of what I've been up to, you can see that it 
can be done, but:

  * You really need to run every service in its own process, for memory 
and reliability reasons alone.
  * It's pretty leading edge.
  * You will have to invest the time and effort to get it working.
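Running every service in its own process can be sketched with the standard ProcessBuilder API: the container launches a fresh JVM per service, each with its own heap limit, so a leak or crash in one service cannot take the others down. ServiceLauncher and its methods are hypothetical, and the main class, heap size and arguments are whatever your service needs.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that runs one service per child JVM. */
final class ServiceLauncher {
    private ServiceLauncher() {
    }

    /** Builds "java -Xmx<heap> -cp <current classpath> <mainClass> <args...>". */
    static List<String> buildCommand(String mainClass, String heap, String... args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + heap); // per-service heap limit
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Starts the service in a separate JVM that shares this process's stdout/stderr. */
    static Process launch(String mainClass, String heap, String... args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(mainClass, heap, args));
        pb.inheritIO();
        return pb.start();
    }
}
```

Calling launch() once per service (namenode, job tracker, and so on) gives each its own heap and failure domain; the container can then watch the returned Process handles.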

If you want to do the work, start with what I've been doing and bring it 
up under the OSGi container of your choice. You can come and play with our 
tooling: I'm cutting a release today of this week's Hadoop trunk merged 
with my branch. It is of course experimental, as even the trunk is a bit 
up-and-down on feature stability.

