From: Dennis van der Laan
Organization: Rijksuniversiteit Groningen / Centrum voor Informatietechnologie
Date: Wed, 29 Sep 2010 12:33:22 +0200
To: users@jackrabbit.apache.org
Subject: Re: Lucene consistency in clustered environment
Thanks for the reply.

We did set up all machines as a Jackrabbit cluster from the start. We don't have a complete journal going back to the first operation ever performed on the repository: one node runs the janitor process, so once all current nodes are in sync, the old journal entries are removed.

We currently rebuild the Lucene indexes by shutting down the repositories one by one, removing the 'version' and 'workspaces' folders from the repository root folder, and then starting the repositories again. Everything works well after that, but it is an ugly workaround. Rebuilding the indexes currently takes about 2.5 hours per machine (approx. 300,000 documents in the repository).

The only thing we see right now is that some documents don't seem to get indexed on one machine, but do get indexed on another. From your reply I understand that this should not happen with Lucene; is that correct?

TIA,
Dennis

On 28-9-2010 14:19, Ian Boston wrote:
> Did you set up the machines as a Jackrabbit cluster as per [1] right from the start?
> And do you have a complete Journal right from the first operation in the JCR (or local state snapshots)?
>
> The local state of the Lucene index depends on every node in the cluster replaying every event from the Journal, to ensure that they all contain the same content. The Journal is also used for shared item state invalidation, which ensures that stale items do not get into responses.
>
> HTH
> Ian
>
> [1] http://wiki.apache.org/jackrabbit/Clustering
>
> On 28 Sep 2010, at 12:11, Dennis van der Laan wrote:
>
>> Hi all,
>>
>> We are using Jackrabbit 1.6.1 in a production environment.
>> It is clustered across 6 machines, which all store the Lucene indexes on a
>> local disk. After using this setup for a couple of months, we are seeing
>> the Lucene indexes differ per machine. Not only does the size of the indexes
>> differ, but some documents seem to be indexed on one machine and not on
>> another. So a full-text search (XPath with a 'contains' clause) returns
>> different results depending on the machine the query is run on.
>> I am a complete Lucene novice, so my question is: is this
>> non-deterministic behaviour a characteristic of Lucene (and should we
>> rebuild all indexes on a regular basis to keep them in sync), or is
>> something going wrong here?
>>
>> Thanks for any help!
>> Best regards,
>> Dennis
>>
>> --
>> Dennis van der Laan, MSc
>> Centre for Information Technology
>> University of Groningen

-- 
Dennis van der Laan
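The journal-replay mechanism Ian describes can be illustrated with a simplified, hypothetical model. The classes ClusterNode and Journal and their methods are illustrative assumptions, not Jackrabbit's actual API. In this model, each node records the last journal revision it has applied and replays newer entries into its local index, and the janitor only deletes entries that every node passed to it has already applied. A node whose index was built without replaying the full history (here, node c joins with an empty index after older entries were pruned) ends up with a different index from the others. This is a sketch of the principle only, not a diagnosis of the problem described in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    """Hypothetical cluster node: a local index plus the last revision applied."""
    name: str
    revision: int = 0                             # last journal revision applied
    index: set[str] = field(default_factory=set)  # ids of indexed documents


class Journal:
    """Hypothetical shared journal mapping revision -> (operation, document id)."""

    def __init__(self) -> None:
        self.entries: dict[int, tuple[str, str]] = {}
        self.next_revision = 1

    def append(self, op: str, doc_id: str) -> None:
        self.entries[self.next_revision] = (op, doc_id)
        self.next_revision += 1

    def sync(self, node: ClusterNode) -> None:
        """Replay, in order, every entry newer than the node's revision."""
        for rev in sorted(r for r in self.entries if r > node.revision):
            op, doc_id = self.entries[rev]
            if op == "add":
                node.index.add(doc_id)
            else:  # in this model, any other operation is a removal
                node.index.discard(doc_id)
            node.revision = rev

    def janitor(self, nodes: list[ClusterNode]) -> None:
        """Delete only the entries that every given node has already applied."""
        safe = min(n.revision for n in nodes)
        for rev in [r for r in self.entries if r <= safe]:
            del self.entries[rev]


journal = Journal()
a, b = ClusterNode("a"), ClusterNode("b")
for doc in ("doc1", "doc2", "doc3"):
    journal.append("add", doc)
for n in (a, b):
    journal.sync(n)
journal.janitor([a, b])  # a and b are at revision 3, so entries 1-3 are pruned

c = ClusterNode("c")  # joins later with an empty index and revision 0
journal.append("add", "doc4")
for n in (a, b, c):
    journal.sync(n)

print(sorted(a.index))  # ['doc1', 'doc2', 'doc3', 'doc4']
print(sorted(c.index))  # ['doc4']
```

In this model the janitor never deletes an entry that a known node still needs; the divergence comes only from node c, whose index never saw the pruned entries.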
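To pin down which documents are missing on which machine, one option is to run the same full-text query on every machine and compare the returned node paths. The helper missing_per_node below is a hypothetical sketch, assuming the result paths from each machine have already been collected into lists; the node names and paths in the example are made up.

```python
def missing_per_node(results_by_node: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each node, list the paths returned elsewhere but not by that node."""
    # Union of every path returned by any node.
    all_paths = set().union(*results_by_node.values())
    return {
        node: sorted(all_paths - set(paths))
        for node, paths in results_by_node.items()
    }


# Made-up example: the same 'contains' query run on three cluster nodes.
results = {
    "node1": ["/docs/a", "/docs/b", "/docs/c"],
    "node2": ["/docs/a", "/docs/c"],
    "node3": ["/docs/a", "/docs/b"],
}
print(missing_per_node(results))
# {'node1': [], 'node2': ['/docs/b'], 'node3': ['/docs/c']}
```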