nutch-dev mailing list archives

From Kirby Bohling <>
Subject OSGi progress
Date Mon, 03 Aug 2009 04:00:50 GMT

I have pushed out a version of Nutch to github:


I replaced as many of the libraries as I could with OSGi bundles
available on the web.  All of those ended up in lib/bundle/.  The
MANIFEST.txt file should document the URLs they were downloaded from.
I picked alternative versions of a couple of libraries to avoid having
to repackage them.

The "README.osgi.txt" should explain the current state.  This doesn't
actually run any useful Nutch code yet, but it does get 95% of the
libraries up and running inside an OSGi environment.  I skipped a
handful of plugins that I have never used.

I would like to reduce the non-OSGi libraries to just Hadoop, the
OSGi Framework, and whatever support libraries are required, if
possible.  I propose three directories:

lib/bootstrap - The libraries required to bootstrap the embedded OSGi framework.
lib/bundles - The libraries that are distributed as OSGi bundles.
lib/jar - The libraries that must be translated from Jars into OSGi bundles.

This way, everything is kept as originally distributed.  For now, I am
just using the "bnd" tool to translate anything that isn't an OSGi
bundle into one.
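
As a rough illustration of how jars could be sorted between
lib/bundles and lib/jar, here is a minimal sketch.  It assumes that a
jar counts as an OSGi bundle when its manifest carries a
Bundle-SymbolicName header.  The class name BundleSorter and its
methods are hypothetical; they are not part of Nutch or of bnd.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only; not part of Nutch or bnd.
class BundleSorter {

    /** Returns true if the manifest has a non-empty Bundle-SymbolicName header. */
    static boolean isOsgiBundle(Manifest manifest) {
        if (manifest == null) {
            return false; // a jar without a manifest cannot be a bundle
        }
        Attributes main = manifest.getMainAttributes();
        String symbolicName = main.getValue("Bundle-SymbolicName");
        return symbolicName != null && !symbolicName.trim().isEmpty();
    }

    /** Picks the directory from the proposed layout for a given manifest. */
    static String targetDirectory(Manifest manifest) {
        return isOsgiBundle(manifest) ? "lib/bundles" : "lib/jar";
    }

    /** Prints the suggested directory for each jar path given on the command line. */
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                System.out.println(path + " -> " + targetDirectory(jar.getManifest()));
            }
        }
    }
}
```

Anything that lands in lib/jar under this check would then be a
candidate for wrapping with bnd.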

I'm going to look into how to use OSGi features to hook the plugins
into the core configuration.  I am trying to minimize the amount of
change to Nutch code.  In that vein, I am contemplating adding one
service implementation to each plugin.  The idea is to create a
"RegisterExtensionService" with a single method that allows each
plugin to register whatever factory-type objects the core needs to
instantiate the objects required to assemble a runtime environment.
If a plugin has multiple extensions, we can either register them all
in one service or register each as a separate service.
Unit tests are the next hurdle to overcome after that.  Any comments
on the approach would be much appreciated.

