jackrabbit-dev mailing list archives

From Thomas Müller <thomas.muel...@day.com>
Subject Re: [jr3] Search index in content
Date Thu, 18 Feb 2010 07:39:56 GMT
Hi,

For me, there are two kinds of indexes: the property/value indexes,
and the fulltext index.

The property/value indexes are for property values, node names, paths,
node references, and so on. Such indexes (or "indices") are relatively
small and fast. In relational databases, those are the secondary
indexes (non-primary-key indexes). Those index updates should be done
synchronously as part of the transaction (maybe even in the transient
space). Currently, we use Apache Lucene for this, but I wouldn't. I
would keep those indexes within the repository.
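
To make the synchronous part concrete, here is a minimal sketch of a
property/value index that is updated in the same step as the commit.
PropertyIndex, Change, commit and find are names I made up for this
example; none of this is existing Jackrabbit API, and a real
implementation would store the index in the repository rather than in
memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory property/value index; not a Jackrabbit API. */
class PropertyIndex {
    // property name -> (value -> paths of the nodes holding that value)
    private final Map<String, TreeMap<String, Set<String>>> index = new HashMap<>();

    /** One property change within a transaction; oldValue or newValue may be null. */
    static final class Change {
        final String path;
        final String property;
        final String oldValue;
        final String newValue;

        Change(String path, String property, String oldValue, String newValue) {
            this.path = path;
            this.property = property;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    /** Applies all changes under one lock, so readers never see a stale index. */
    synchronized void commit(Iterable<Change> changes) {
        for (Change c : changes) {
            TreeMap<String, Set<String>> values =
                    index.computeIfAbsent(c.property, k -> new TreeMap<>());
            if (c.oldValue != null) {
                // Remove the entry for the previous value, if any.
                Set<String> paths = values.get(c.oldValue);
                if (paths != null) {
                    paths.remove(c.path);
                    if (paths.isEmpty()) {
                        values.remove(c.oldValue);
                    }
                }
            }
            if (c.newValue != null) {
                values.computeIfAbsent(c.newValue, k -> new TreeSet<>()).add(c.path);
            }
        }
    }

    /** Returns a copy of the paths whose property currently has the given value. */
    synchronized Set<String> find(String property, String value) {
        TreeMap<String, Set<String>> values = index.get(property);
        if (values == null) {
            return Collections.emptySet();
        }
        Set<String> paths = values.get(value);
        if (paths == null) {
            return Collections.emptySet();
        }
        return new TreeSet<>(paths);
    }
}
```

Because commit returns only after the index is updated, a query that
runs right after the save already sees the new value.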

The fulltext index is (potentially) slow, especially text
extraction. Therefore, fulltext indexing should be done asynchronously
if it takes too long. Also, in a clustered environment, at least text
extraction should be done on only one cluster node. I would still use
Apache Tika and Apache Lucene for this.
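
A rough sketch of the asynchronous part, assuming a single background
thread per repository. AsyncFulltextIndexer, nodeSaved and search are
hypothetical names, the extractor function only stands in for Apache
Tika, and the in-memory term map only stands in for Apache Lucene.
Electing the one cluster node that runs the extraction is not covered
here.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical asynchronous fulltext indexer; not a Jackrabbit API. */
class AsyncFulltextIndexer {
    // A single worker keeps slow text extraction off the committing thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Placeholder for the text extractor (Apache Tika in a real setup).
    private final Function<byte[], String> extractor;
    // Placeholder for the fulltext index (Apache Lucene in a real setup).
    private final Map<String, Set<String>> termToPaths = new ConcurrentHashMap<>();

    AsyncFulltextIndexer(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    /** Called after a save; returns immediately and indexes in the background. */
    void nodeSaved(String path, byte[] content) {
        worker.submit(() -> {
            String text = extractor.apply(content);
            // Simplification: terms of an older version of the node are not removed.
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    termToPaths.computeIfAbsent(term, k -> ConcurrentHashMap.newKeySet())
                            .add(path);
                }
            }
        });
    }

    /** May miss recently saved nodes that are still waiting in the queue. */
    Set<String> search(String term) {
        return termToPaths.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySet());
    }

    /** Stops accepting work and waits up to ten seconds for queued nodes. */
    void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that a fulltext query can briefly lag behind the
content, which seems acceptable for fulltext search but would not be
for the property/value indexes.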

Regards,
Thomas
