stanbol-dev mailing list archives

From <arthi.ven...@wipro.com>
Subject RE: Working with large RDF data
Date Wed, 18 Sep 2013 04:32:03 GMT
Thanks a lot Rupert,
Regards,
Arthi

-----Original Message-----
From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com] 
Sent: Tuesday, September 17, 2013 9:06 PM
To: dev@stanbol.apache.org
Subject: Re: Working with large RDF data

Hi Arthi,

On Tue, Sep 17, 2013 at 3:30 PM,  <arthi.venkat@wipro.com> wrote:
> Thanks Rupert,
> I have few clarifications / queries which I have asked inline below.
>
> "It should be possible to reason over the enhancement results and store all triples (including
> the deduced ones) in Jena TDB. After that you can use SPARQL on the Jena TDB, as suggested by
> Reto. However, note that any change in the ontology will not be reflected in the Jena TDB,
> as there is no truth maintenance."
>
> I am currently storing RDF in a separate SDB outside Stanbol. Is there a way this
> TDB can be stored and managed as part of Stanbol?
> For the problem of stale triples I could refresh the entire store on a change of the ontology.

Storing enhancement results in a Clerezza triple store (by default Jena TDB) is one part of
what the Contenthub does. You can also write a simple component that receives an enhancement
request, calls the Enhancer API, and stores the results in a Clerezza store.
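The storage step of such a component can be sketched with plain Python stand-ins. This is a minimal sketch, not Stanbol code: the `call_enhancer` stub, the `TripleStore` class, and the example triple are all hypothetical; in Stanbol the enhancements would come from the Enhancer API (or its REST endpoint), and the store would be a Clerezza graph rather than a Python set.

```python
def call_enhancer(content):
    """Stand-in for a real Enhancer call: in Stanbol this would send the
    content to the Enhancer and parse the returned enhancement RDF."""
    # Hypothetical result: one enhancement triple linking the enhancement
    # to the content item it was extracted from.
    return {("urn:enhancement-1", "fise:extracted-from", "urn:content-item:1")}

class TripleStore:
    """Minimal stand-in for a Clerezza triple store."""
    def __init__(self):
        self.triples = set()

    def add_all(self, triples):
        self.triples |= set(triples)

store = TripleStore()
store.add_all(call_enhancer("Some plain text to enhance"))
# The store now holds the enhancement results for later queries.
```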

>
> "If the data fits into memory you can just store the plain RDF data and load it into a
> reasoning session to get the results. After that you can store the results in another RDF
> store (e.g. Jena TDB) for later queries."
> How do you load RDF data into a session? I could see a way to load an ontology into
> a session, but not RDF instances.

Loading an ontology and loading instance data are no different: both are RDF triples. The
problem is that instance data is typically much bigger and will not fit into a session. If
you have a quad store (such as Jena TDB) you could store the triples of each ContentItem in
its own context (you can use the URI of the ContentItem as the context). This would allow
you to perform reasoning sessions per content item (context) when the ontology changes.
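The per-context idea can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration only: the store layout (a dict mapping a ContentItem URI to its triples), the example URIs, and the toy RDFS subclass reasoner; a real setup would use Jena TDB contexts and the Stanbol Reasoning component.

```python
from itertools import product

def subclass_closure(ontology):
    """Transitive closure of the rdfs:subClassOf pairs in the ontology."""
    closure = {(s, o) for (s, p, o) in ontology if p == "rdfs:subClassOf"}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in list(product(closure, closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def reason_context(context_triples, ontology):
    """Deduce extra rdf:type triples for one context (RDFS subclass rule)."""
    closure = subclass_closure(ontology)
    deduced = set()
    for (s, p, o) in context_triples:
        if p == "rdf:type":
            deduced |= {(s, "rdf:type", sup) for (sub, sup) in closure if sub == o}
    return deduced

# One context per ContentItem URI (hypothetical example data):
store = {
    "urn:content-item:1": {("ex:obama", "rdf:type", "ex:Politician")},
}
ontology = {("ex:Politician", "rdfs:subClassOf", "ex:Person"),
            ("ex:Person", "rdfs:subClassOf", "ex:Agent")}

# When the ontology changes, re-run this loop per context:
for ctx, triples in store.items():
    store[ctx] = triples | reason_context(triples, ontology)
```

Because each ContentItem has its own context, a stale context can be rebuilt from its base triples without touching the rest of the store.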

NOTE that the default Clerezza TDB storage provider does not scale to a high number of contexts,
so you would need to use the scalable TcProvider (CLEREZZA-736). Clerezza does not support
quads; however, you can get a TripleCollection for each context via the TcManager.

I cannot tell you how to load RDF data into a session, but I hope the documentation of the
OntologyManager, Reasoning and Rules components provides such information.

best
Rupert

>
>
> "IMO if you need reasoning support over the whole knowledge base you should use a system
> that natively supports it. While the above workflows would allow you to mimic such functionality,
> it would become impractical as the amount of data grows."
> I will evaluate some other stores to be used along with Stanbol, say Virtuoso etc., to
> see if this limitation can be overcome.
>
>
> Thanking you and Regards,
> Arthi
>
> -----Original Message-----
> From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com]
> Sent: Tuesday, September 17, 2013 12:46 PM
> To: dev@stanbol.apache.org
> Subject: Re: Working with large RDF data
>
> Hi
>
> It should be possible to reason over the enhancement results and store all triples (including
> the deduced ones) in Jena TDB. After that you can use SPARQL on the Jena TDB, as suggested by
> Reto. However, note that any change in the ontology will not be reflected in the Jena TDB,
> as there is no truth maintenance.
>
> If the data fits into memory you can just store the plain RDF data and load it into a
> reasoning session to get the results. After that you can store the results in another RDF
> store (e.g. Jena TDB) for later queries.
>
> IMO if you need reasoning support over the whole knowledge base you should use a system
> that natively supports it. While the above workflows would allow you to mimic such functionality,
> it would become impractical as the amount of data grows.
>
> best
> Rupert
>
>
>
>
> On Mon, Sep 16, 2013 at 3:29 PM, Reto Bachmann-Gmür <reto@wymiwyg.com> wrote:
>> Why in memory? The TDB-based Clerezza store is quite efficient, so why
>> not add the data to such a graph?
>>
>> reto
>>
>>
>> On Sat, Sep 14, 2013 at 9:14 AM, <arthi.venkat@wipro.com> wrote:
>>
>>> Thanks a lot Rupert
>>> If the RDF data is smaller (can fit into memory), is there a way we
>>> can import it into Stanbol and do a joint search across the
>>> enhancements from unstructured text as well as the imported RDF data?
>>> If yes, would this import be permanent or would it need to be repeated each time?
>>>
>>>
>>> Thanks and Rgds,
>>> Arthi
>>>
>>>
>>> -----Original Message-----
>>> From: Rupert Westenthaler [mailto:rupert.westenthaler@gmail.com]
>>> Sent: Saturday, September 14, 2013 12:40 PM
>>> To: dev@stanbol.apache.org
>>> Subject: Re: Working with large RDF data
>>>
>>> Hi Arthi
>>>
>>> AFAIK the reasoning and rule components of Apache Stanbol are
>>> intended to be used in "sessions". They are not intended to be used
>>> on a whole knowledge base. A typical use case could be validating
>>> RDF data retrieved from a remote server (e.g. Linked Data) against some
>>> validation rules, or rewriting RDF generated by the Enhancer (Refactor
>>> Engine).
>>>
>>> Applying rules and reasoning on a whole knowledge base (RDF data
>>> that does not fit in-memory) is not a typical use case.
>>>
>>> Based on your problem description you might want to have a look at
>>>
>>> * Apache Marmotta and the KiWi triple store
>>> (http://marmotta.incubator.apache.org/kiwi/introduction.html): this
>>> is a Sesame Sail implementation that supports reasoning.
>>> * OWLIM (http://www.ontotext.com/owlim): a commercial product that also
>>> implements reasoning on top of the Sesame API.
>>>
>>> But I am not an expert in those topics, so there might be additional
>>> options I am not aware of.
>>>
>>> hope this helps
>>> best
>>> Rupert
>>>
>>>
>>> On Fri, Sep 13, 2013 at 1:48 PM,  <arthi.venkat@wipro.com> wrote:
>>> > Hi,
>>> >
>>> >   I have large RDF data.
>>> >
>>> > The requirement is to be able to reason / run rules on this data and to
>>> > search this data along with any other unstructured data which I have
>>> > enhanced using Stanbol.
>>> >
>>> >
>>> >
>>> > Any pointers on how I can achieve this?
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Thanking you and Rgds,
>>> >
>>> > Arthi
>>> >
>>> >
>>> >
>>> >
>>> > Please do not print this email unless it is absolutely necessary.
>>> >
>>> > The information contained in this electronic message and any 
>>> > attachments
>>> to this message are intended for the exclusive use of the
>>> addressee(s) and may contain proprietary, confidential or privileged 
>>> information. If you are not the intended recipient, you should not 
>>> disseminate, distribute or copy this e-mail. Please notify the 
>>> sender immediately and destroy all copies of this message and any attachments.
>>> >
>>> > WARNING: Computer viruses can be transmitted via email. The 
>>> > recipient
>>> should check this email and any attachments for the presence of viruses.
>>> The company accepts no liability for any damage caused by any virus 
>>> transmitted by this email.
>>> >
>>> > www.wipro.com
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>>
>
>
>
>



