jackrabbit-users mailing list archives

From Dennis van der Laan <d.g.van.der.l...@rug.nl>
Subject Re: Lucene consistency in clustered environment
Date Wed, 29 Sep 2010 10:33:22 GMT
Thanks for the reply. We did set up all machines as a Jackrabbit cluster
from the start. We don't have a complete journal going back to the
first operation ever done on the repository. One node runs the
janitor process, so once all current nodes are in sync, the old journal
entries are removed.
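
To make the consequence of that pruning concrete, here is a toy model in
Python. It is not Jackrabbit code: the Journal class, the replay function
and the revision numbering are all hypothetical, and it assumes that the
janitor only drops entries that every currently known node has already
replayed.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy append-only journal mapping revision number -> change."""
    entries: dict = field(default_factory=dict)
    next_revision: int = 1

    def append(self, change):
        """Record a change and return the revision assigned to it."""
        revision = self.next_revision
        self.entries[revision] = change
        self.next_revision += 1
        return revision

    def prune(self, local_revisions):
        """Janitor step (assumed behaviour): drop every entry that all
        known nodes have already replayed, and return the dropped revisions."""
        if not local_revisions:
            return []
        safe = min(local_revisions.values())
        removed = [r for r in sorted(self.entries) if r <= safe]
        for r in removed:
            del self.entries[r]
        return removed


def replay(journal, node_index, local_revision):
    """Apply all journal entries newer than local_revision to node_index
    (a set standing in for a local Lucene index); return the new revision."""
    for revision in sorted(journal.entries):
        if revision > local_revision:
            node_index.add(journal.entries[revision])
            local_revision = revision
    return local_revision
```

In this model an existing node that is slightly behind can still catch up
after pruning, but an empty index replayed from revision 0 only sees the
entries that survived, so it cannot be rebuilt from the journal alone.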

We currently rebuild the Lucene indexes by shutting down the
repositories one by one, removing the 'version' and 'workspaces' folders
from the repository root folder and then starting the repositories
again. Everything works well after that, but it is an ugly workaround.
Rebuilding the indexes currently takes about 2.5 hours per machine
(approx. 300,000 documents in the repository).
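
The folder-removal step could be scripted roughly as follows. This is a
sketch, not a Jackrabbit tool: the function name remove_index_folders and
its dry_run parameter are made up for illustration, and it assumes the
repository on that machine is already shut down. It also assumes that
these folders hold only data that can be rebuilt, as in our setup; if a
workspace keeps its content on the local file system, deleting
'workspaces' would destroy that content.

```python
import shutil
from pathlib import Path


def remove_index_folders(repo_home, folder_names=("version", "workspaces"), dry_run=True):
    """Return the folders under repo_home that match folder_names.

    With dry_run=True (the default) nothing is deleted, so the returned
    list can be inspected first. With dry_run=False each listed folder is
    removed recursively.
    """
    root = Path(repo_home)
    targets = [root / name for name in folder_names if (root / name).is_dir()]
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

Calling it once with the default dry_run=True and checking the returned
paths before calling it again with dry_run=False keeps the destructive
step explicit.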

The only problem we see right now is that some documents don't seem to
get indexed on one machine, but do get indexed on another. From your
reply I understand that this is not normal behaviour for Lucene. Is that
correct?
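
One way to pin down which documents are affected is to run the same query
on every machine and compare the result paths. The sketch below only does
the comparison; how the paths are collected from each node is left out,
and the function name, node names and paths are hypothetical.

```python
def index_discrepancies(results_by_node):
    """Map each result path to the sorted list of nodes that did not return it.

    results_by_node maps a node name to the collection of paths returned
    by the same query on that node. Paths returned by every node are
    omitted from the result.
    """
    all_paths = set().union(*results_by_node.values())
    missing = {}
    for path in sorted(all_paths):
        absent = sorted(
            node for node, paths in results_by_node.items() if path not in paths
        )
        if absent:
            missing[path] = absent
    return missing


# Example with made-up data: node2 does not return /docs/b.
example = index_discrepancies({
    "node1": {"/docs/a", "/docs/b"},
    "node2": {"/docs/a"},
    "node3": {"/docs/a", "/docs/b"},
})
```

An empty result means all machines returned the same set of paths for
that query; it says nothing about queries that were not run.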


On 28-9-2010 14:19, Ian Boston wrote:
> Did you set up the machines as a Jackrabbit cluster as per [1] right
> from the start? And do you have a complete Journal right from the
> first operation in the JCR (or local state snapshots)?
> The local state of the Lucene index is dependent on every node in the
> cluster replaying every event from the Journal to ensure that they
> all contain the same content. The Journal is also used for sharedItem
> state invalidation, which will ensure that stale items do not get
> into responses.
> Ian
> [1] http://wiki.apache.org/jackrabbit/Clustering
> On 28 Sep 2010, at 12:11, Dennis van der Laan wrote:
>> Hi all,
>> We are using Jackrabbit 1.6.1 in a production environment. It is
>> clustered across 6 machines, which all store the Lucene indexes on a
>> local disk. After using this setup for a couple of months, we are seeing
>> the Lucene indexes differ per machine. Not only the size of the indexes,
>> but some documents seem to be indexed on one machine, but not on another
>> machine. So doing a full-text search (XPath with a 'contains' clause)
>> returns different results depending on the machine the query is run on.
>> I am a complete Lucene novice, so my question is: is this
>> non-deterministic behaviour a characteristic of Lucene (and should we
>> rebuild all indexes on a regular basis to keep them in sync) or is
>> something going wrong here?
>> Thanks for any help!
>> Best regards,
>> Dennis
>> -- 
>> Dennis van der Laan, MSc
>> Centre for Information Technology
>> University of Groningen

Dennis van der Laan
