stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Apache Stanbol
Date Tue, 09 Aug 2011 14:45:34 GMT
Hello Mr Srecko Joksimovic

Two initial notes:

* I am also sending this to the stanbol-dev list, because I am on
vacation until the end of August and will only read/answer mails from
time to time until then. Others in the Stanbol community may be able
to answer questions more quickly than I can.
* I am sending this reply via my gmail account, because somehow I am
not able to connect to the SMTP server of salzburgresearch.at.

- - -

Actually this is possible by using the TaxonomyLinkingEngine that is
currently in development.

Basically there are two steps:

(1) upload your Ontology to the Entityhub
(2) configure an instance of the TaxonomyLinkingEngine to use your
ontology (as stored by the Entityhub) to enhance your documents


For (1) there are two possibilities:

(1a) upload your ontology directly to the "/entityhub"

If you have a Stanbol instance running at "http://localhost:8080/", the
following curl command can be used to upload an RDF graph (such as your
ontology):

curl -X POST -H "Content-Type: application/rdf+xml" \
     --data "@{rdfXmlFile}" http://localhost:8080/entityhub/entity

This assumes that your ontology is encoded as "application/rdf+xml".
{rdfXmlFile} denotes the path to the file on the local file system.
Please also have a look at [1] for a more detailed description of how
to upload RDF data to the "/entityhub" endpoint.
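
The upload command above can be wrapped in a small shell function. This
is only a minimal sketch: STANBOL_URL and DRY_RUN are hypothetical helper
variables introduced here for illustration, not Stanbol settings. It
checks that the file is readable before sending it, and with DRY_RUN set
it only prints the command.

```shell
#!/bin/sh
# Sketch: upload an RDF/XML file to the Entityhub.
# STANBOL_URL and DRY_RUN are hypothetical helpers, not Stanbol settings.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"

upload_ontology() {
    rdf_file="$1"
    # Stop early if the file is missing or unreadable.
    if [ ! -r "$rdf_file" ]; then
        echo "cannot read $rdf_file" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "curl -X POST -H 'Content-Type: application/rdf+xml' --data @$rdf_file $STANBOL_URL/entityhub/entity"
        return 0
    fi
    # Same request as the curl command given in the mail.
    curl -X POST -H "Content-Type: application/rdf+xml" \
        --data "@$rdf_file" "$STANBOL_URL/entityhub/entity"
}
```

For example, "DRY_RUN=1 upload_ontology my-ontology.rdf" (the file name
is a placeholder) shows what would be sent without contacting the server.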

(1b) manage the RDF data as its own ReferencedSite

The Entityhub also supports the configuration of so-called
ReferencedSites. This lets you manage different RDF datasets (e.g.
dbpedia.org, geonames.org, IPTC thesaurus [2], your ontology, ...).
Such sites are read-only and accessible under
http://localhost:8080/entityhub/site/{siteId}
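
As a small illustration of the URL pattern above, the address of a
ReferencedSite can be built from its siteId. This is a sketch only:
STANBOL_URL is a hypothetical helper variable, and "mysite" in the usage
note is a made-up siteId.

```shell
#!/bin/sh
# Sketch: build the URL of a ReferencedSite from its siteId.
# STANBOL_URL is a hypothetical helper variable, not a Stanbol setting.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"

site_url() {
    # $1 is the siteId chosen when the site was configured.
    printf '%s/entityhub/site/%s\n' "$STANBOL_URL" "$1"
}
```

Something like curl -I "$(site_url mysite)" could then be used to check
whether the site responds; what exactly is returned depends on your
Stanbol version.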

Stanbol also includes an indexing tool that helps you in creating such
"referencedSites" for local datasets (such as your Ontology). A
detailed description of this process can be found at [3]. [2] is a
specific configuration of [3] for the IPTC thesaurus.

In general, for testing I would suggest using (1a) because it is much
easier to start with. However, (1a) requires loading your ontology
into memory, so it will not work for big datasets. In addition,
(1b) allows you to optimize your ontology (by defining mappings)
during the indexing process, and it gives you the possibility to use
different ontologies for enhancing your content. Therefore, for more
complex usage scenarios option (1b) is typically the better solution.

(2) Configure the TaxonomyLinkingEngine

This engine is included by default in the Full launcher of Stanbol. If
you prefer the stable launcher, you will need to install it manually
(e.g. by using the Apache Felix Web Console, accessible under
http://localhost:8080/system/console; default user: admin, pwd: admin).
The bundle to install can be found at
"{stanbol-trunk}/enhancer/engines/taxonomylinking/target/org.apache.stanbol.enhancer.engine.taxonomy-0.9.0-incubating-SNAPSHOT.jar"

Assuming a running Stanbol instance that includes the
TaxonomyLinkingEngine, the following steps are required for the
configuration:
1. Go to the "Configuration" tab (http://localhost:8080/system/console/configMgr).
2. Search for "Apache Stanbol Enhancement Engine for Taxonomy linking".
3. Press the [+] button at the end of this line (this will open the
dialog to configure a new instance of this engine).
4. Configure the source. If you used (1a), enter "entityhub"; in case of
(1b), enter the siteId of the referenced site.
5. Configure the property used to search for labels of the concepts in
your ontology. The default is rdfs:label (typically used for labeling
concepts within ontologies), but you might want to use a different
one depending on your ontology.

I would not recommend changing any other properties, because this
engine is currently under development and changes to these values might
not be implemented or, even worse, might break the engine.
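
Once the engine is configured, a document can be sent to the enhancer.
The following is only a sketch under assumptions that are not part of
this mail: it assumes the enhancer is reachable under "/engines" on your
instance and that it can return "application/rdf+xml"; please verify both
against your Stanbol version. STANBOL_URL and DRY_RUN are hypothetical
helper variables.

```shell
#!/bin/sh
# Sketch: send a plain-text file to the enhancer and ask for RDF/XML back.
# ASSUMPTION: the "/engines" path and the Accept type are not from this
# mail; check them against your Stanbol instance.
# STANBOL_URL and DRY_RUN are hypothetical helpers, not Stanbol settings.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080}"

enhance_text() {
    text_file="$1"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "curl -X POST -H 'Content-Type: text/plain' -H 'Accept: application/rdf+xml' --data-binary @$text_file $STANBOL_URL/engines"
        return 0
    fi
    # --data-binary sends the file unchanged, including line breaks.
    curl -X POST -H "Content-Type: text/plain" \
        -H "Accept: application/rdf+xml" \
        --data-binary "@$text_file" "$STANBOL_URL/engines"
}
```

Running "DRY_RUN=1 enhance_text document.txt" (a placeholder file name)
prints the request so it can be checked before it is actually sent.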



best
Rupert Westenthaler

[1] http://markmail.org/message/plertstj6fx4xutj
[2] http://markmail.org/message/rgwug74s3u6olrby
[3] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/genericrdf/README.md

On 08.08.2011, at 22:53, srecko joksimovic
<sreckojoksimovic@gmail.com> wrote:
>
> Hello Mr Westenthaler,
>
> Mr Pereira gave me your contact, in order to ask you a few questions about
> Apache Stanbol. I suppose that you have received Mr Pereira's email which he
> sent to me, and you maybe already know what the problem is.
>
> Mr Pereira suggested that I use Apache Stanbol. The idea is to load my
> ontology, and then to call a method that annotates the provided text
> based on the loaded ontology. Could you please explain to me how to
> implement the scenario I described using Apache Stanbol? If you could
> provide me with a code example, I would be very grateful.
>
> Thank you very much.
>
> Best,
> Srecko Joksimovic

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
