incubator-clerezza-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Composite Resource Indexing Service: ready for review.
Date Sun, 12 Jun 2011 18:49:52 GMT
Hello all,
after reviewing the CRIS module I am +1 for committing it.
Regarding LuceneTools, it looks good to me. My only doubt is the
call to optimize() just before the IndexWriter gets closed: with large
indexes that can be a long operation, which one might prefer to
postpone (the IndexWriter is exposed, so explicit calls are allowed). I'd
suggest adding a parameter that decides whether optimize() is called
on close, set to true by default. A possible enhancement would be an
OptimizeThread with a timeInterval constructor parameter that optimizes the
index every timeInterval seconds, provided there is not "much" activity
(e.g. by checking isLocked, InfoStream, etc.).
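
To make the two suggestions concrete, here is a minimal sketch. It is not the
LuceneTools code: IndexHandle is a hypothetical stand-in for the optimize() and
close() calls on IndexWriter, ManagedIndex and PeriodicOptimizer are invented
names, and "not much activity" is approximated by a last-write timestamp rather
than by inspecting isLocked or InfoStream.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class OptimizeOnCloseSketch {

    /** Hypothetical stand-in for the two IndexWriter calls discussed above. */
    interface IndexHandle {
        void optimize();
        void close();
    }

    /** Wraps an index handle; optimizes on close unless configured otherwise. */
    static final class ManagedIndex {
        private final IndexHandle index;
        private final boolean optimizeOnClose;
        private final AtomicLong lastActivityNanos = new AtomicLong(System.nanoTime());

        ManagedIndex(IndexHandle index) {
            this(index, true); // proposed default: optimize on close
        }

        ManagedIndex(IndexHandle index, boolean optimizeOnClose) {
            this.index = index;
            this.optimizeOnClose = optimizeOnClose;
        }

        /** Writers call this so the background optimizer can detect idleness. */
        void recordActivity() {
            lastActivityNanos.set(System.nanoTime());
        }

        /** Milliseconds elapsed since the last recorded write. */
        long idleMillis() {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastActivityNanos.get());
        }

        void optimize() {
            index.optimize();
        }

        void close() {
            if (optimizeOnClose) {
                index.optimize(); // potentially slow on large indexes
            }
            index.close();
        }
    }

    /** Optimizes at a fixed interval, but only when the index has been idle. */
    static final class PeriodicOptimizer {
        private final ManagedIndex index;
        private final long minIdleMillis;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        PeriodicOptimizer(ManagedIndex index, long minIdleMillis) {
            this.index = index;
            this.minIdleMillis = minIdleMillis;
        }

        /** One scheduled run; returns true if optimize() was actually called. */
        boolean tick() {
            if (index.idleMillis() >= minIdleMillis) {
                index.optimize();
                return true;
            }
            return false; // recent writes: skip this round
        }

        void start(long timeIntervalSeconds) {
            scheduler.scheduleWithFixedDelay(this::tick,
                    timeIntervalSeconds, timeIntervalSeconds, TimeUnit.SECONDS);
        }

        void stop() {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        IndexHandle demo = new IndexHandle() {
            public void optimize() { System.out.println("optimize"); }
            public void close() { System.out.println("close"); }
        };
        new ManagedIndex(demo, false).close(); // caller postpones optimization
        new ManagedIndex(demo).close();        // default: optimize, then close
    }
}
```

A caller that wants to postpone the expensive step passes false and either
calls optimize() explicitly later or starts a PeriodicOptimizer with a
suitable interval and idle threshold.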
Great work guys :)
Tommaso


2011/6/3 Tommaso Teofili <tommaso.teofili@gmail.com>

>
>
> 2011/6/2 Tsuyoshi Ito <tsuy.ito@trialox.org>
>
>> Dear all
>>
>> Composite Resource Indexing Service is now ready for review (issue
>> CLEREZZA-501). JUnit tests and documentation are available (install
>> rdf.cris/core on clerezza and search for Composite Resource Indexing
>> Service under /documentation)
>>
>> excerpt:
>>
>> CRIS is based on Apache Lucene and provides means to index RDF
>> resources. It works by indexing the values of properties on a
>> resource. This makes it possible to search for property values using CRIS.
>> The results that CRIS delivers are the corresponding RDF resources.
>>
>> GraphIndexer
>> The core of CRIS is the GraphIndexer class. Note that GraphIndexer is
>> not an OSGi service, but it has to be instantiated by the user to
>> provide an index. The GraphIndexer needs two graphs to work with. One
>> graph contains the IndexDefinitions, that is the specification of
>> which resources and properties to index (see IndexDefinitionManager).
>> The other graph is the graph that contains the resources to index.
>> Note that CRIS indexes RDF resources based on their rdf:type and that
>> the indexing works on a per-property basis. That means, not all
>> properties on a resource are indexed by default. The user has to
>> specify which properties to index.
>> GraphIndexer also provides the interface to search for resources using
>> the findResources method. The search is specified using Conditions and
>> optionally a SortSpecification and FacetCollectors. The findResources
>> method is overloaded with methods that allow the specification of the
>> resource type and search query directly.
>>
>>
>> IndexDefinitionManager
>> The IndexDefinitionManager helps to manage indexing specifications
>> using the CRIS ontology in the index definition graph (see
>> GraphIndexer). Indexing is enabled for resources according to their
>> rdf:type. Additionally the index definitions specify the properties of
>> the resource that are indexed.
>>
>> One can think of an index definition as specifying the keys
>> (properties) that are mapped to the value (the resource URI) in the
>> index.
>>
>> ....
>>
>>
>> Note:
>> - GraphIndexer is quite complex and has many responsibilities.
>> - No other clerezza project depends on Composite Resource Indexing
>> Service.
>> - GraphIndexer is available as Platform CRIS Service in project
>> platform.cris (for the contentgraph incl. additions)
>>
>> @Tommaso
>> Lucene is used in LuceneTools.java in rdf.cris/core. Feedback
>> appreciated - I have little experience with Lucene, so feel free to
>> improve it. In particular, I am not sure when to call optimize() (see
>> comment in LuceneTools)
>>
>
> Uber cool Tsuy and others, I'll definitely have a deep look there, thanks
> for this awesome work!
> Tommaso
>
>
>>
>> Thanks to Reto, Daniel and Hasan for the work! We already use it in a
>> monitoring tool - the performance is outstanding compared to the
>> available alternatives in clerezza (filter or SPARQL)
>>
>> Cheers
>> Tsuy
>>
>
>
