stanbol-dev mailing list archives

From Rupert Westenthaler <>
Subject Redesigning the Entityhub Configuration
Date Tue, 01 Mar 2011 14:45:51 GMT
Based on my own experience and feedback from some (very) friendly early
adopters, the main issue with the Stanbol Entityhub is its complex
configuration. This is fine for trying it out in the lab, but it is
becoming more and more of a show stopper because it keeps people from
using it.
In short, the goal must be to make the Entityhub as easy to install as
the Stanbol Enhancer. That means providing a runnable jar that can be
used without any additional configuration steps.
The main focus of my work in the coming weeks will be to achieve exactly
that for the Stanbol Entityhub.

Current State:

Currently, configuring the Stanbol Entityhub is a very complex task
because it requires configuring a lot of OSGi components and connecting
them to each other. This has the following consequences:
 - Users need to know how to configure components (e.g. to use the
SPARQL Endpoint for the SPARQL Dereferencer and the LOD Endpoint for
the Cool URI Dereferencer when configuring a Referenced Site)
 - Users need to remember the IDs of the components to correctly
relate them to each other
 - Users need to understand all the components and their roles to
even know that they have to connect them with each other
 - Users need to keep track of dependent components when changing
the configuration of a component
 - The Apache Felix Web Console is a cool interface for configuring
single components, but does not really support the management of such
interconnected components
 - Some components (e.g. the SolrYard) also require a connection to an
external service or need to point to specific files on the local hard disk.
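To make the ID-linking problem concrete, the following Java sketch models each component configuration as a flat property map in which links to other components are plain ID strings. The class `ComponentLinks`, the reference keys and the component IDs are hypothetical and not part of the Entityhub API; the sketch only shows the bookkeeping users currently have to do in their heads: finding the components affected by a change, and spotting references to IDs that do not exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the current manual setup: every component is a
 * flat property map, and links to other components are plain ID strings
 * stored under one of the given reference keys.
 */
final class ComponentLinks {

    private ComponentLinks() {
    }

    /** IDs of all components that point to {@code changedId}, i.e. what to re-check after a change. */
    static List<String> dependentsOf(String changedId,
                                     Map<String, Map<String, String>> configs,
                                     List<String> referenceKeys) {
        List<String> dependents = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : configs.entrySet()) {
            for (String key : referenceKeys) {
                if (changedId.equals(e.getValue().get(key))) {
                    dependents.add(e.getKey());
                    break; // one matching reference is enough
                }
            }
        }
        return dependents;
    }

    /** "component.key -> id" for every reference to an ID that is not configured (typo, rename, deletion). */
    static List<String> danglingReferences(Map<String, Map<String, String>> configs,
                                           List<String> referenceKeys) {
        List<String> dangling = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : configs.entrySet()) {
            for (String key : referenceKeys) {
                String target = e.getValue().get(key);
                if (target != null && !configs.containsKey(target)) {
                    dangling.add(e.getKey() + "." + key + " -> " + target);
                }
            }
        }
        return dangling;
    }
}
```

With `yardId` and `cacheId` as reference keys, changing a SolrYard reports the Cache that points to it, and a typo in a ReferencedSite's cache ID shows up as a dangling reference.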

To give an example, here are the steps needed to set up a
ReferencedSite that uses a local index instead of the remote SPARQL
endpoint:
 1) download or create a local index of this site
 2) set up a SolrServer or provide the Solr index + configuration
needed to run an EmbeddedSolrServer (described by [1])
 3) configure a SolrYard instance that points to the SolrServer (also
described by [1])
 4) configure a Cache instance and connect it to the configured
SolrYard (no documentation)
 5) create a ReferencedSite instance that uses the cache and also an
EntityDereferencer as fallback if the cache is not available
(described by [2]).
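Steps 3 to 5 can be sketched as three configurations created in dependency order. This is a hypothetical Java model, not the actual Stanbol configuration: the type names follow the components listed above, but the property keys (`solrLocation`, `yardId`, `cacheId`, `dereferencer`, `accessUri`) and the ID naming scheme are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One component configuration: component type, its ID and its properties. */
final class SiteComponent {
    final String type;
    final String id;
    final Map<String, String> properties;

    SiteComponent(String type, String id, Map<String, String> properties) {
        this.type = type;
        this.id = id;
        this.properties = properties;
    }
}

final class LocalIndexSetup {

    private LocalIndexSetup() {
    }

    /**
     * Steps 3-5 for one site. Steps 1 and 2 (getting the index, running Solr)
     * are assumed to be done; solrLocation points at the result.
     * Returned in creation order: each component refers to the previous one by ID.
     */
    static List<SiteComponent> referencedSiteWithLocalIndex(
            String siteName, String solrLocation, String sparqlEndpoint) {
        String yardId = siteName + "Yard";
        String cacheId = siteName + "Cache";

        Map<String, String> yard = new LinkedHashMap<>();
        yard.put("solrLocation", solrLocation);      // step 3: SolrYard -> SolrServer

        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("yardId", yardId);                 // step 4: Cache -> SolrYard

        Map<String, String> site = new LinkedHashMap<>();
        site.put("cacheId", cacheId);                // step 5: ReferencedSite -> Cache
        site.put("dereferencer", "SPARQL");          // fallback if the cache is unavailable
        site.put("accessUri", sparqlEndpoint);

        List<SiteComponent> ordered = new ArrayList<>();
        ordered.add(new SiteComponent("SolrYard", yardId, yard));
        ordered.add(new SiteComponent("Cache", cacheId, cache));
        ordered.add(new SiteComponent("ReferencedSite", siteName, site));
        return ordered;
    }
}
```

Every `yardId` and `cacheId` value must match an ID chosen for another component; renaming one component silently breaks the chain, which is exactly the kind of bookkeeping that makes the manual setup error-prone.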

Even to set up the Entityhub with a minimal configuration, one needs to
complete the following three steps (as described by [3]):
 1) create and configure a Yard used by the Entityhub to store its data
 2) configure the Entityhub (especially linking it to the Yard
created in step 1)
 3) configure at least one ReferencedSite (because without any
referenced site there will be no Entities to work with)

Getting this right might take the average user several hours, even
with very good documentation, which is way too much for typical users
who plan to try out a new technology.

Planned Changes:

(1) Automatic configuration of the Core Framework

This includes the configuration of the Entityhub, the Yard used by the
Entityhub to store its data and the Jersey Endpoint.

The Entityhub will come with a default configuration. In case no
configuration is present, the default will be set via the OSGi
ConfigAdmin Service or by using default values for all required
properties. The Yard instance required by the Entityhub also needs to be
instantiated if not available. Here the plan is to use the Yard
implementation with the highest service rank. If initialization
based on this implementation fails, the implementation with the next
highest rank will be chosen, and so on until one succeeds.
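The planned Yard fallback is essentially "sort by rank, then try each in turn". A minimal Java sketch, assuming each candidate implementation is represented by a name, a rank (in OSGi this would correspond to the `service.ranking` property, where higher values are preferred) and an initializer that throws on failure; `RankedFallback` and `Candidate` are hypothetical names, not Entityhub classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

final class RankedFallback {

    private RankedFallback() {
    }

    /** One candidate implementation: a name, its rank and an initializer that throws on failure. */
    static final class Candidate<T> {
        final String name;
        final int rank;
        final Supplier<T> init;

        Candidate(String name, int rank, Supplier<T> init) {
            this.name = name;
            this.rank = rank;
            this.init = init;
        }
    }

    /** Initializes candidates in order of decreasing rank and returns the first one that succeeds. */
    static <T> T initFirstWorking(List<Candidate<T>> candidates) {
        List<Candidate<T>> byRank = new ArrayList<>(candidates);
        byRank.sort(Comparator.comparingInt((Candidate<T> c) -> c.rank).reversed());

        IllegalStateException failure =
                new IllegalStateException("no implementation could be initialized");
        for (Candidate<T> c : byRank) {
            try {
                return c.init.get();
            } catch (RuntimeException e) {
                // keep the cause for diagnostics and fall back to the next rank
                failure.addSuppressed(new IllegalStateException(c.name + " failed", e));
            }
        }
        throw failure;
    }
}
```

Failures of higher-ranked candidates are attached as suppressed exceptions, so if no implementation works the final error still explains why each one failed. Candidates with equal rank are tried in the order given, because `List.sort` is stable.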

(2) Configuration of ReferencedSites

Configuring ReferencedSites is tricky because, depending on the actual
configuration, it requires configuring a lot of different components
and linking them together.
The current idea is to support two options for configuring Referenced Sites:
 a) A configuration file: This should be the best option in cases where
no local cache with preloaded information is required.
 b) A bundle (archive) that contains not only the configuration but
also a local index.
In both cases Felix FileInstall can be used to dynamically load
(and initialize) the configuration as soon as the user copies or
updates it within a special directory.
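Felix FileInstall provides the directory watching itself, so nothing like this would need to be written for the Entityhub. The Java sketch below only illustrates the kind of change detection a watched directory relies on: take a snapshot of file names and last-modified times, and compare it with the previous snapshot to find configurations that were copied in, updated or removed. `DropInScanner` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DropInScanner {

    private DropInScanner() {
    }

    /** File name -> last-modified time (ms) of every regular file in {@code dir}. */
    static Map<String, Long> snapshot(Path dir) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    result.put(p.getFileName().toString(),
                            Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return result;
    }

    /** Files that are new or whose timestamp changed since the previous snapshot (load or reload them). */
    static List<String> addedOrUpdated(Map<String, Long> previous, Map<String, Long> current) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long before = previous.get(e.getKey());
            if (before == null || !before.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    /** Files that disappeared since the previous snapshot (their configuration could be removed). */
    static List<String> removed(Map<String, Long> previous, Map<String, Long> current) {
        List<String> gone = new ArrayList<>();
        for (String name : previous.keySet()) {
            if (!current.containsKey(name)) {
                gone.add(name);
            }
        }
        return gone;
    }
}
```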

As far as I know, with (a) one can only provide the configuration for a
single component per config file. In that case one would need to define a
new component (e.g. "ReferencedSiteConfig") that is responsible for
creating and configuring all the necessary components (ReferencedSite,
EntityDereferencer, EntitySearcher, Cache and Yard).
For (b) I am thinking about using a BundleActivator that first initializes
the files for the local index and then loads the configuration from within
the bundle and passes it to the ConfigAdmin. From that point the
initialization would be the same as for (a).
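A hypothetical "ReferencedSiteConfig" for option (a) could read one flat configuration and derive the configurations of all five components from it, generating the linking IDs itself so that users never have to deal with them. The Java sketch below shows only that expansion step; all property keys (`site.id`, `site.accessUri`, `site.queryUri`) and the generated ID scheme are invented for illustration, and registering the results with the ConfigAdmin is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class ReferencedSiteConfigSketch {

    private ReferencedSiteConfigSketch() {
    }

    /** Expands one site configuration into component ID -> component properties. */
    static Map<String, Map<String, String>> expand(Properties siteConfig) {
        String site = require(siteConfig, "site.id");
        String accessUri = require(siteConfig, "site.accessUri");
        String queryUri = siteConfig.getProperty("site.queryUri", accessUri);

        // IDs are derived from the site ID, so the user never has to choose or remember them
        String yardId = site + ".yard";
        String cacheId = site + ".cache";
        String dereferencerId = site + ".dereferencer";
        String searcherId = site + ".searcher";

        Map<String, Map<String, String>> components = new LinkedHashMap<>();
        components.put(yardId, props("type", "Yard"));
        components.put(cacheId, props("type", "Cache", "yardId", yardId));
        components.put(dereferencerId, props("type", "EntityDereferencer", "accessUri", accessUri));
        components.put(searcherId, props("type", "EntitySearcher", "queryUri", queryUri));
        components.put(site, props("type", "ReferencedSite", "cacheId", cacheId,
                "dereferencerId", dereferencerId, "searcherId", searcherId));
        return components;
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required property " + key);
        }
        return value;
    }

    /** Builds an ordered map from alternating key/value arguments. */
    private static Map<String, String> props(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }
}
```

The `Properties` object could be filled with `Properties.load(Reader)` from the file that FileInstall picks up; if `site.queryUri` is missing, the searcher falls back to the access URI.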

(3) Updates of local caches/indexes

The update of local indexes is another important configuration task
that users need to perform. Currently the idea is to use the same
files as described for (2b). However, in that case the initialization
would need to detect existing configurations so as not to override them
(only the index data should be updated, not any changes the user may
have made to the configurations of the components).
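The update rule described above (keep existing component configurations, only add missing ones, and replace just the index data) can be expressed as a small merge. A Java sketch using a hypothetical map-of-maps model (component ID -> properties); `IndexUpdate` is not an existing class, and the index-data replacement itself is not shown:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexUpdate {

    private IndexUpdate() {
    }

    /**
     * Merges the configurations shipped with an updated index bundle into the
     * existing ones: a component that is already configured keeps its current
     * configuration (the user may have changed it in the Web Console); only
     * components that do not exist yet are added.
     */
    static Map<String, Map<String, String>> mergeConfigs(
            Map<String, Map<String, String>> existing,
            Map<String, Map<String, String>> shipped) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, Map<String, String>> e : shipped.entrySet()) {
            result.putIfAbsent(e.getKey(), e.getValue()); // never override an existing config
        }
        return result;
    }
}
```

Keeping whole configurations rather than merging individual properties means that a value a user changed in the Web Console is never silently reset by an index update; the trade-off is that new properties shipped with an updated bundle are not picked up for components that already exist.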


When all this is finished, it should be possible to double-click a
runnable jar containing the Entityhub and immediately start using it.
Adding new sites will be possible by downloading prepared configurations
and simply copying them into a configuration directory. For changing the
configuration of already installed ReferencedSites, the Apache Felix
Web Console is used.

So that's the plan for now. If anyone has comments, tips,
experiences in implementing functionality like that, nice code
examples ... I would be very thankful! Most of this stuff is put
together based on examples I found on sites like and I am still wondering if that is the way
to go.

Rupert Westenthaler


