incubator-stanbol-dev mailing list archives

From Suat Gonul <suatgo...@gmail.com>
Subject Re: RdfPathLanguage: Next steps (was Re: Updates to LMF/Stanbol integration)
Date Fri, 28 Oct 2011 08:52:57 GMT
Hi,

On 10/28/2011 11:26 AM, Rupert Westenthaler wrote:
> Hi Sebastian, Jakob, Stanbol team
>
> Based on Anil's positive feedback about participating in this, I decided to start a separate thread to plan the next steps.
>
> Next steps:
>
> The first step will be to define a Java API that allows different implementations to be provided. I think the idea was to create a separate project (should we use GitHub or Google Code? MIT/BSD/Apache licensed?) that focuses only on the specification of the language [1] and the Java API. Sebastian needs to take the lead on this. If I remember correctly, his plan was to start next week.
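
To make the idea of one API with several implementations concrete, here is a minimal sketch. It assumes a single backend interface that each store (Kiwi, Clerezza, Entityhub, ...) would implement. All names here (`RdfBackend`, `MapBackend`, `PathEvaluator`, `BackendSketch`) are hypothetical and not taken from the specification, which does not exist yet.

```java
import java.util.*;

// Hypothetical demo; none of these names come from an actual specification.
class BackendSketch {
    public static void main(String[] args) {
        MapBackend backend = new MapBackend();
        backend.add("ex:Paris", "rdfs:label", "Paris");
        backend.add("ex:Paris", "ex:country", "ex:France");
        backend.add("ex:France", "rdfs:label", "France");

        // The evaluator only sees the interface, so the store is interchangeable.
        PathEvaluator<String> evaluator = new PathEvaluator<>(backend);
        System.out.println(evaluator.follow("ex:Paris", List.of("ex:country", "rdfs:label")));
    }
}

// Assumed abstraction: the one operation a path evaluator needs from a store.
interface RdfBackend<N> {
    Collection<N> listObjects(N subject, N property);
}

// Toy in-memory implementation (subject -> property -> objects).
class MapBackend implements RdfBackend<String> {
    private final Map<String, Map<String, List<String>>> triples = new HashMap<>();

    void add(String subject, String property, String object) {
        triples.computeIfAbsent(subject, k -> new HashMap<>())
               .computeIfAbsent(property, k -> new ArrayList<>())
               .add(object);
    }

    @Override
    public Collection<String> listObjects(String subject, String property) {
        return triples.getOrDefault(subject, Map.of()).getOrDefault(property, List.of());
    }
}

// Follows a chain of properties, independent of the concrete backend.
class PathEvaluator<N> {
    private final RdfBackend<N> backend;

    PathEvaluator(RdfBackend<N> backend) {
        this.backend = backend;
    }

    List<N> follow(N start, List<N> properties) {
        List<N> current = List.of(start);
        for (N property : properties) {
            List<N> next = new ArrayList<>();
            for (N node : current) {
                next.addAll(backend.listObjects(node, property));
            }
            current = next;
        }
        return current;
    }
}
```

With a design along these lines, a SPARQL-based or Clerezza-based backend would only need to supply its own `listObjects`.
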
>
> As soon as a first version of this specification is available, we can start to work on implementations.
>
> * Kiwi TripleStore: I assume Sebastian and Jakob will work on that
> * Clerezza: Anil could you take the lead for that?

Anil has agreed to take the lead on this.

> * Entityhub: This will be my responsibility
> * SPARQL-based implementation: I think that would be interesting. Is anyone interested in working on that?
> * CMS Adapter: Suat, could you follow this effort and check for possible usage scenarios?

Currently, the CMS Adapter can generate RDF from JCR/CMIS content
repositories. This RDF conforms to the simple specification that we
created while implementing the bidirectional mapping feature. As a first
attempt, this generated RDF can be queried with RDFPathLanguage.

For the time being, I don't have a concrete use case for directly
querying the CMS with RDFPathLanguage. I would be glad to hear any use
cases. I think CMS developers might have more ideas at this point.

Best,
Suat

> * Fact Store: This could also be interesting, but as with the CMS Adapter we first need to check usage scenarios.
>
> best
> Rupert
>
>
>
> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>
>> Dear Rupert,
>>
>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>> * The LMF semantic search component overlaps greatly with the "contenthub/search/engines/solr" component recently contributed by Anil. Related to this, it would be great if Anil could have a look at [2] and check for similarities/differences and possible integration paths.
>>>>>
>>>> I had a look at the semantic search component of LMF. As you pointed out, LMF semantic search provides a convenient way to index any part of a document with the help of the RDFPath Language. I think that we can make use of this feature in contenthub. As I described in my previous e-mail, contenthub currently indexes a number of semantic fields based on DBPedia relations. These relations are hardcoded. The RDFPath language can be used to indicate specific semantic fields to be indexed along with the content itself. Let me describe what we have in mind with a scenario:
>>>>
>>>> A user provides a domain ontology (e.g. for the music domain) and submits it to the Entityhub to be used in the enhancement process. Suppose the domain ontology includes a vast amount of information about artists, their albums, etc. I assume that this ontology does not include conceptual definitions (it only includes ABox definitions). The user writes an RDF Path Program (in LMF terminology) to indicate the fields to be indexed when a content item has an enhancement related to any path in that program. Suppose the user submits a content item along with the RDF Path Program(s) to be used to determine the fields to be indexed. The enhancement engines find an entity (or many entities). Now we execute the selected RDF Path Program(s) and embed the results into the Solr representation of the content item.
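
The scenario above can be sketched roughly as follows. This is a simplified illustration under loud assumptions: a "path program" is modelled as a plain map from field name to a chain of properties (the real RDF Path syntax is richer), the Entityhub data is a toy in-memory graph, and the Solr document is a plain field map. `IndexingSketch` and its methods are hypothetical names, not Stanbol, LMF, or Solr APIs.

```java
import java.util.*;

// Hypothetical sketch: none of these names are Stanbol, LMF, or Solr APIs.
class IndexingSketch {
    // Toy triple store (subject -> property -> objects), standing in for Entityhub data.
    private final Map<String, Map<String, List<String>>> graph = new HashMap<>();

    void addTriple(String subject, String property, String object) {
        graph.computeIfAbsent(subject, k -> new HashMap<>())
             .computeIfAbsent(property, k -> new ArrayList<>())
             .add(object);
    }

    // Follows a chain of properties from one start node and collects the end values.
    List<String> evaluate(String start, List<String> path) {
        List<String> current = List.of(start);
        for (String property : path) {
            List<String> next = new ArrayList<>();
            for (String node : current) {
                next.addAll(graph.getOrDefault(node, Map.of()).getOrDefault(property, List.of()));
            }
            current = next;
        }
        return current;
    }

    // Builds the field map for one content item: the raw content plus one field
    // per program rule (field name -> property path) that yields any values.
    Map<String, List<String>> buildDocument(String content, Collection<String> entities,
                                            Map<String, List<String>> program) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("content", List.of(content));
        for (Map.Entry<String, List<String>> rule : program.entrySet()) {
            List<String> values = new ArrayList<>();
            for (String entity : entities) {
                values.addAll(evaluate(entity, rule.getValue()));
            }
            if (!values.isEmpty()) {
                doc.put(rule.getKey(), values);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        IndexingSketch sketch = new IndexingSketch();
        sketch.addTriple("ex:ArtistA", "rdfs:label", "Artist A");
        sketch.addTriple("ex:ArtistA", "ex:album", "ex:AlbumOne");
        sketch.addTriple("ex:AlbumOne", "rdfs:label", "Album One");

        Map<String, List<String>> program = new LinkedHashMap<>();
        program.put("artist_name", List.of("rdfs:label"));
        program.put("album_title", List.of("ex:album", "rdfs:label"));

        // "ex:ArtistA" plays the role of an entity found by the enhancement engines.
        System.out.println(sketch.buildDocument("A review of Album One",
                List.of("ex:ArtistA"), program));
    }
}
```

In this sketch, rules that produce no values are left out of the document, so a content item only gets the fields that actually apply to its entities.
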
>>>>
>>>> If you have any other suggestions, please let me know so that we can discuss in detail (in SRDC) before the meeting.
>>>>
>>> This is exactly what I was thinking about. Let me only add that such additional knowledge to be included in the Semantic Index might come not only from the Entityhub, but also from other sources (like the CMS via the CMS Adapter).
>>>
>>> If you would like to help me with an implementation of the RdfPathLanguage (e.g. the Clerezza-based implementation, or maybe a Jena-based implementation), please let me know. Help would be very welcome, because I already have a lot of things on my TODO list before the meeting in November (such as defining a proposal for the Stanbol Enhancement Structure).
>>>
>> We would like to get involved in the implementation of RDFPathLanguage for Stanbol. We plan to start working on this next week. I think you and the LMF team already have a design in mind. I would appreciate it if you could share your thoughts with us.
>>
>>>>> * The Semantic Search Interface: The Contenthub currently defines its own query API (supports keyword-based search as well as "field -> value" like constraints, supports facets). The LMF directly exposes the RESTful API of the semantic Solr index. I strongly prefer the approach of the LMF, because of the two points already described above.
>>>> We think that we do not have to choose here. We can keep a simple wrapper around the Solr interface (contenthub's own query API) while providing the Solr RESTful API as is. IMO a wrapper around the Solr interface would be beneficial. On the other hand, in this interface we try to make use of an ontology in the OntologyResourceSearchEngine. This might help to find new keywords based on the subsumption hierarchy inside the ontology. However, I think this may lead to performance issues and may not be useful at all. We can decide on this later.
>>> You forgot to mention one additional advantage of using the Solr RESTful API: if we do that, one could create the Semantic Index and then copy it over to some other SolrServer without the need to run Stanbol directly on the production infrastructure.
>>>
>>> In general I would suggest first focusing the discussion on the unique features we would like to provide with the Semantic Search component. I already included three features I would like to have in my first mail (Query preprocessing, Entity Facets, Semantic Facets). As you now mention, the OntologyResourceSearchEngine is very relevant to such features.
>>> However, adding such features does not necessarily mean creating a separate query language. One could also try to add such features directly to Solr by implementing some Solr extensions.
>>>
>> Let me briefly comment on your suggestions about the semantic search.
>>
>>>>>   But I am also of the opinion that a semantic search interface should at least provide the following three additional features:
>>>>>      1. Query preprocessing: e.g. substitute "Paris" in the query with "http://dbpedia.org/resource/Paris";
>>>>>      2. Entity Facets: if a keyword matches an Entity (e.g. "Paris" -> "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton") then provide a Facet to the user over such possible matches;
>> As far as we understand, the first and second features will be handled by querying the Entityhub with the query keyword (Paris), i.e. the first entity obtained from the Entityhub will help us to recognize its type and the other entities will be served as facet values of the Paris facet.
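
The lookup described above can be sketched as follows. This is a minimal sketch, assuming an in-memory label index in place of a real Entityhub query. `QueryPreprocessorSketch`, `register`, `entityFacets`, and `preprocess` are hypothetical names, and only single-word keywords are handled.

```java
import java.util.*;

// Hypothetical sketch; the label index stands in for a real Entityhub lookup.
class QueryPreprocessorSketch {
    // Lower-cased label -> candidate entity identifiers, in ranking order.
    private final Map<String, List<String>> labelIndex = new HashMap<>();

    void register(String label, String entity) {
        labelIndex.computeIfAbsent(label.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                  .add(entity);
    }

    // Entity Facets: every candidate entity whose label matches the keyword.
    List<String> entityFacets(String keyword) {
        return labelIndex.getOrDefault(keyword.toLowerCase(Locale.ROOT), List.of());
    }

    // Query preprocessing: replace each matching keyword with its first candidate.
    String preprocess(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            List<String> candidates = entityFacets(token);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(candidates.isEmpty() ? token : candidates.get(0));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        QueryPreprocessorSketch sketch = new QueryPreprocessorSketch();
        sketch.register("Paris", "dbpedia:Paris");
        sketch.register("Paris", "dbpedia:Paris_Texas");
        sketch.register("Paris", "dbpedia:Paris_Hilton");

        System.out.println(sketch.preprocess("hotels in Paris")); // hotels in dbpedia:Paris
        System.out.println(sketch.entityFacets("paris"));
    }
}
```

The first candidate drives the substitution and the full candidate list becomes the facet values, which mirrors the split between features 1 and 2.
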
>>
>>>>>      3. Semantic Facets: if a user uses an instance of an ontology type (e.g. a Place, Person, Organization) in a query, then provide facets over semantic relations for such types (e.g. friends for Persons, products/services for Organizations, nearby Points-Of-Interest for Places, Participants for Events, …). To implement features like that we need components that provide query preprocessing capabilities based on data available in the Entityhub, Ontonet, …. To me it seems that the contenthub/search/engines/ontologyresource component already provides some functionality related to this, so this might be a good starting point.
>> Currently, we are trying to integrate an exploration mechanism like the one you described above. It is also based on the DBPedia ontology. OntologyResourceEngine can be used for this purpose for the user-registered ontologies. The current implementation of this engine only computes closures by exploiting the hierarchy in the ontology. RDFPath Programs can also be an option at this point. With an RDF Path Program, the user may specify the relations to be used in the exploration process. But I think this means the user decides beforehand which fields should be presented to him as exploration fields. I think this is open to discussion.
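
As an illustration of computing a closure over the ontology hierarchy, here is a minimal sketch. It assumes the hierarchy is available as a simple parent-to-children map and does a breadth-first traversal. `HierarchyClosureSketch` is a hypothetical name and this is not the actual OntologyResourceEngine code.

```java
import java.util.*;

// Hypothetical sketch of a subclass closure; not the actual engine implementation.
class HierarchyClosureSketch {
    // Direct subclass relations: parent type -> child types.
    private final Map<String, List<String>> subclasses = new HashMap<>();

    void addSubclass(String parent, String child) {
        subclasses.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Returns the type itself plus all of its direct and indirect subclasses.
    // The visited set also protects against cycles in the hierarchy.
    Set<String> closure(String type) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(type);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (seen.add(current)) {
                queue.addAll(subclasses.getOrDefault(current, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        HierarchyClosureSketch sketch = new HierarchyClosureSketch();
        sketch.addSubclass("Place", "City");
        sketch.addSubclass("Place", "Country");
        sketch.addSubclass("City", "Capital");

        // Candidate keywords for expanding a query about "Place".
        System.out.println(sketch.closure("Place")); // [Place, City, Country, Capital]
    }
}
```

An RDF Path Program would differ from this in that the user names the relations to follow, instead of the engine always following the subclass hierarchy.
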
>>
>>> best
>>> Rupert
>>>
>> Regards,
>> Anil.
>

