incubator-clerezza-dev mailing list archives

From Tsuyoshi Ito <tsuy....@trialox.org>
Subject Composite Resource Indexing Service: ready for review.
Date Thu, 02 Jun 2011 16:10:03 GMT
Dear all

Composite Resource Indexing Service (CRIS) is now ready for review
(issue CLEREZZA-501). JUnit tests and documentation are available:
install rdf.cris/core on Clerezza and look for Composite Resource
Indexing Service under /documentation.

excerpt:

CRIS is based on Apache Lucene and provides means to index RDF
resources. It works by indexing the values of properties on a
resource. This makes it possible to search by property value using CRIS.
The results that CRIS delivers are the corresponding RDF resources.

GraphIndexer
The core of CRIS is the GraphIndexer class. Note that GraphIndexer is
not an OSGi service, but it has to be instantiated by the user to
provide an index. The GraphIndexer needs two graphs to work with. One
graph contains the IndexDefinitions, that is the specification of
which resources and properties to index (see IndexDefinitionManager).
The other graph is the graph that contains the resources to index.
Note that CRIS indexes RDF resources based on their rdf:type and that
the indexing works on a per-property basis. That means not all
properties on a resource are indexed by default; the user has to
specify which properties to index.
GraphIndexer also provides the interface to search for resources using
the findResources method. The search is specified using Conditions and
optionally a SortSpecification and FacetCollectors. Overloads of
findResources allow the resource type and search query to be
specified directly.
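
To make the model above concrete, here is a minimal in-memory sketch of
the idea: one input holds the definitions (which properties to index
per rdf:type), the other holds the data, and a lookup by property value
returns resource URIs. This is not the CRIS API. ToyGraphIndexer,
Triple and the three-argument findResources below are hypothetical
names for illustration, and a plain map stands in for the Lucene index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for the indexing model; hypothetical, not the CRIS API. */
class ToyGraphIndexer {

    /** A minimal (subject, predicate, object) statement with string values. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    static final String RDF_TYPE = "rdf:type";

    // "type|property" -> lower-cased property value -> resource URIs
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    /**
     * @param definitions rdf:type -> properties to index (the "definition graph")
     * @param data        the statements to index (the "data graph")
     */
    ToyGraphIndexer(Map<String, Set<String>> definitions, List<Triple> data) {
        // First pass: collect the rdf:type values of each resource.
        Map<String, Set<String>> typesOf = new HashMap<>();
        for (Triple t : data) {
            if (RDF_TYPE.equals(t.predicate)) {
                typesOf.computeIfAbsent(t.subject, k -> new HashSet<>()).add(t.object);
            }
        }
        // Second pass: index only properties listed for one of the resource's types.
        for (Triple t : data) {
            for (String type : typesOf.getOrDefault(t.subject, Set.of())) {
                if (definitions.getOrDefault(type, Set.of()).contains(t.predicate)) {
                    index.computeIfAbsent(key(type, t.predicate), k -> new HashMap<>())
                         .computeIfAbsent(t.object.toLowerCase(), k -> new TreeSet<>())
                         .add(t.subject);
                }
            }
        }
    }

    private static String key(String type, String property) {
        return type + "|" + property;
    }

    /** Returns the URIs of resources of the given type whose property equals value (case-insensitive). */
    Set<String> findResources(String type, String property, String value) {
        return index.getOrDefault(key(type, property), Map.of())
                    .getOrDefault(value.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> definitions = Map.of("foaf:Person", Set.of("foaf:name"));
        List<Triple> data = List.of(
                new Triple("urn:ex:alice", RDF_TYPE, "foaf:Person"),
                new Triple("urn:ex:alice", "foaf:name", "Alice"),
                new Triple("urn:ex:alice", "foaf:nick", "ally"));
        ToyGraphIndexer indexer = new ToyGraphIndexer(definitions, data);
        // foaf:name is indexed, foaf:nick is not.
        System.out.println(indexer.findResources("foaf:Person", "foaf:name", "alice"));
        System.out.println(indexer.findResources("foaf:Person", "foaf:nick", "ally"));
    }
}
```

The exact-match lookup is a simplification; the real service delegates
matching to Lucene and expresses searches through Conditions.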


IndexDefinitionManager
The IndexDefinitionManager helps to manage indexing specifications
using the CRIS ontology in the index definition graph (see
GraphIndexer). Indexing is enabled for resources according to their
rdf:type. In addition, each index definition specifies which
properties of the resource are indexed.

One can think of an index definition as specifying the keys
(properties) that are mapped to the value (the resource URI) in the
index.
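
The key/value picture can be sketched as follows. This is only an
illustration under assumptions: ToyIndexDefinition and keysFor are
hypothetical names, properties are simplified to single string values,
and the real definitions live as RDF in the index definition graph
rather than as Java objects.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one index definition; not the CRIS API. */
class ToyIndexDefinition {
    final String rdfType;
    final Set<String> indexedProperties;

    ToyIndexDefinition(String rdfType, List<String> indexedProperties) {
        this.rdfType = rdfType;
        this.indexedProperties = new LinkedHashSet<>(indexedProperties);
    }

    /**
     * Projects a resource onto its index entries: every indexed property that
     * the resource actually has becomes a key, and each key maps to the
     * resource URI, which is what a search would return.
     */
    Map<String, String> keysFor(String resourceUri, Map<String, String> properties) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (String property : indexedProperties) {
            String value = properties.get(property);
            if (value != null) {
                keys.put(property + "=" + value, resourceUri);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        ToyIndexDefinition definition =
                new ToyIndexDefinition("foaf:Person", List.of("foaf:name", "foaf:mbox"));
        Map<String, String> alice = Map.of("foaf:name", "Alice", "foaf:nick", "ally");
        // Only foaf:name yields a key: foaf:nick is not indexed, foaf:mbox is absent.
        System.out.println(definition.keysFor("urn:ex:alice", alice));
    }
}
```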

....


Note:
- GraphIndexer is quite complex and has many responsibilities.
- No other Clerezza project depends on Composite Resource Indexing Service.
- GraphIndexer is available as Platform CRIS Service in project
platform.cris (for the content graph including additions)

@Tommaso
Lucene is used in LuceneTools.java in rdf.cris/core. Feedback is
appreciated - I have little experience with Lucene, so feel free to
improve it. In particular, I am not sure when to call optimize (see
the comment in LuceneTools).

Thanks to Reto, Daniel and Hasan for the work! We already use it in a
monitoring tool - the performance is outstanding compared to the
alternatives available in Clerezza (filtering or SPARQL).

Cheers
Tsuy
