jackrabbit-dev mailing list archives

From Bertrand Delacretaz <bdelacre...@apache.org>
Subject Re: [jr3] Search index in content
Date Thu, 18 Feb 2010 09:48:41 GMT

On Thu, Feb 18, 2010 at 8:39 AM, Thomas Müller <thomas.mueller@day.com> wrote:
> The property/value indexes are for property values, node names, paths,
> node references, and so on. ....
> ...Currently, we use Apache Lucene for this, but I wouldn't. I
> would keep those indexes within the repository.
> ...The fulltext index is (potentially) slow, specially fulltext
> extraction. Therefore, fulltext index should be done asynchronously if
> it takes too long....

I love this idea of separating the two kinds of indexes. Having the
fulltext "eventually indexed" might be good enough, provided there are
interfaces to find out about the status of the indexing queue.
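
To make "eventually indexed" with a queryable status concrete, here is a
minimal sketch in Java. Everything in it is an assumption for illustration:
the class FulltextIndexingQueue and its methods are hypothetical names, not
an existing Jackrabbit API, and the actual text extraction is left as a
comment.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch: node paths waiting for fulltext indexing. */
class FulltextIndexingQueue {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Set<String> indexed = new HashSet<>();

    /** Called at save time: only records the path, so the save stays fast. */
    synchronized void enqueue(String path) {
        indexed.remove(path);            // content changed, old entry is stale
        if (!pending.contains(path)) {
            pending.add(path);           // avoid queuing the same path twice
        }
    }

    /** Called by a background worker; returns false when nothing is queued. */
    synchronized boolean indexNext() {
        String path = pending.poll();
        if (path == null) {
            return false;
        }
        // Assumption: text extraction and the index update would happen here.
        // A real implementation would not hold the lock while extracting.
        indexed.add(path);
        return true;
    }

    /** Status interface: how many items are still waiting. */
    synchronized int pendingCount() {
        return pending.size();
    }

    /** Status interface: is the fulltext index current for this path? */
    synchronized boolean isIndexed(String path) {
        return indexed.contains(path);
    }
}
```

A client that needs up-to-date fulltext results could poll pendingCount()
or isIndexed(path) before running a query, while everything else proceeds
without waiting for extraction.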

I am involved in the IKS project (http://iks-project.eu/) where we
envision new types of indexing/search for content-based applications,
and in this perspective it might make sense to be able to add more
indexing methods.

So as we're dreaming aloud, my ideal view of Jackrabbit
indexing/search would be:

1. The "structural index" (your first type) is managed by Jackrabbit,
doesn't require configuration, behaves like a database index
(synchronous, transactional, stored in repository, etc.)

2. The "standard fulltext index" uses Lucene. Large items are queued
for eventual indexing, and indexing can be delegated to a separate
cluster. It is configurable as to what to index and what not, can be
disabled, etc. Ideally stored in repository.

3. Additional "custom external indexes" can be configured. They work
like the Lucene index but use external components (a la Solr for
example, or RESTful indexing engines). Not sure how the JCR query
syntax can address those, but that's a different problem.
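
The three kinds of indexes above could sit behind one small plug-in
contract. The following Java sketch is only an illustration under stated
assumptions: ContentIndex, RecordingIndex and IndexRegistry are
hypothetical names, not Jackrabbit classes, and the indexes merely record
what they were asked to index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract shared by structural, fulltext and external indexes. */
interface ContentIndex {
    String name();
    boolean synchronous();               // true: updated as part of the save
    void update(String path, String value);
}

/** Stand-in index that only records the paths it was asked to index. */
class RecordingIndex implements ContentIndex {
    private final String name;
    private final boolean synchronous;
    final List<String> updates = new ArrayList<>();

    RecordingIndex(String name, boolean synchronous) {
        this.name = name;
        this.synchronous = synchronous;
    }

    public String name() { return name; }
    public boolean synchronous() { return synchronous; }
    public void update(String path, String value) { updates.add(path); }
}

/** Routes each saved item to every registered index. */
class IndexRegistry {
    private final List<ContentIndex> indexes = new ArrayList<>();
    private final List<Runnable> deferred = new ArrayList<>();

    void register(ContentIndex index) {
        indexes.add(index);
    }

    /** Synchronous indexes are updated inline, the others are queued. */
    void onSave(String path, String value) {
        for (ContentIndex index : indexes) {
            if (index.synchronous()) {
                index.update(path, value);
            } else {
                deferred.add(() -> index.update(path, value));
            }
        }
    }

    /** Runs the queued updates, e.g. from a background thread or another node. */
    int runDeferred() {
        List<Runnable> batch = new ArrayList<>(deferred);
        deferred.clear();
        for (Runnable task : batch) {
            task.run();
        }
        return batch.size();
    }
}
```

In this sketch the structural index would be registered as synchronous,
and the Lucene and external indexes as deferred. How a JCR query would
select one of the external indexes is left open, as noted above.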

