jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: consistency guarantees of Jackrabbit/Lucene indexes
Date Thu, 14 May 2009 21:38:14 GMT
We have been running Jackrabbit 1.4 in production in an 8-way cluster
for about a year now.
Not with a massive number of users (15K), although many actions cause
JCR writes, as the portal layout state of every user is stored in JCR,
and we often see about 1000 concurrent.
We have had no problems with indexes on local disk, bodies on
hard-mounted NFS on a solid network, and MySQL as the DB backend.
However, as a precaution we run some Perl-based rsyncs of the local
space in Jackrabbit to take snapshots of the indexes. These run
periodically, rsyncing locally until no change is seen between syncs,
and then the image is saved centrally. At the same time the script
grabs the relevant information from the DB and checks local file
states (yes, Java binary files can be read in Perl... it's bytes after
all).
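
A minimal sketch of the "rsync until no change is seen" loop described
above, written in Python rather than the Perl used in production. The
function names, the pass limit and the paths in the usage note are
illustrative assumptions, not the actual scripts. It relies on rsync
printing one line per changed item when run with --itemize-changes, so
empty output is taken to mean the copy is stable.

```python
import subprocess
from typing import Callable


def rsync_pass(src: str, dst: str) -> bool:
    """Run one rsync pass; return True if rsync reported any change."""
    result = subprocess.run(
        ["rsync", "-a", "--delete", "--itemize-changes", src, dst],
        check=True,
        capture_output=True,
        text=True,
    )
    # With --itemize-changes and no -v, rsync prints only changed items.
    return bool(result.stdout.strip())


def sync_until_stable(sync_once: Callable[[], bool], max_passes: int = 20) -> int:
    """Call sync_once repeatedly until it reports no change.

    Returns the number of passes that were needed. Raises RuntimeError
    if the source is still changing after max_passes (assumed limit).
    """
    for passes in range(1, max_passes + 1):
        if not sync_once():
            return passes
    raise RuntimeError(f"no stable snapshot after {max_passes} passes")
```

Hypothetical usage: sync_until_stable(lambda: rsync_pass("/jackrabbit/workspaces/", "/snapshots/current/")),
followed by copying the stable local image to central storage. A quiet
pass only shows that nothing changed between two syncs; it does not by
itself prove the index files are internally consistent.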

We have tested recovery procedures, including pulling power plugs, and
recovery by restoring the last snapshot state and then bringing the
node up appears to work. It's not ideal, but it avoids a 24h+ index
rebuild when things go really wrong.

Now this is a hack, but it works for us and uses relatively simple and
reliable Unix commands.

On 14 May 2009, at 22:10, Alexander Klimetschek wrote:

> On Wed, May 13, 2009 at 6:36 AM, Johannes Boneschanscher
> <jackrabbit@boneschanscher.net> wrote:
>> Hi All,
>> I have searched the Internet and codebase of Jackrabbit about
>> recoverability of the Lucene Indexing in a cluster scenario, however
>> I'm not certain whether it is really recoverable. I hope someone can
>> enlighten me.
>> To make failover of the Jackrabbit machine possible we have our files
>> for indexes of each node and of the FileDataStore on a network share.
>> We use JNDIDatabaseJournal for clustering two nodes on the same
>> machine. The version of Jackrabbit is Fri Jan 11 14:41:29 EET 2008
>> version=1.4 (according to the pom.properties inside Jackrabbit-Core)
>> As far as I understand from JCR-204
>> (http://issues.apache.org/jira/browse/JCR-204), which is still open,
>> some measures have been taken to make indexes recoverable.
>> Also JCR-905 (closed) and JCR-778 (closed) seem related.
>> In the past we have had issues with Jackrabbit that the connection to
>> the network share was unstable and the index became corrupted, we try
>> to avoid that (by moving it to a SAN with iSCSI), but as reindexing
>> the entire repository takes a lot of time, as we also index the
>> content with almost all text extractors (See:
>> http://jackrabbit.apache.org/api/1.4/org/apache/jackrabbit/extractor/package-summary.html)
>> we would like to know whether Jackrabbit can completely recover from
>> this kind of situation. (BTW: We solve this by restarting the
>> AppServer Jackrabbit is running on, and then the auto recover kicks
>> in, I guess this one:
>> http://svn.apache.org/viewvc/jackrabbit/branches/1.3/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/Recovery.java?view=log&pathrev=544247#rev544247
>> )
>> If it can recover, why is JCR-204 still open? If it cannot recover, we
>> would have to use a local disk and we cannot cluster the machine
>> anymore, and (if I can find time) I'll try and fix the issue.
> Many things have improved in Jackrabbit 1.5, but I wouldn't generally
> rule out that the index might get broken when the underlying IO is
> dysfunctional. So I'd choose the safe way.
> Regards,
> Alex
> -- 
> Alexander Klimetschek
> alexander.klimetschek@day.com
