From: Jamie
Date: Thu, 08 May 2008 10:06:12 +0200
To: java-user@lucene.apache.org
Subject: Re: Serious Index Corruption Error - FileNotFoundException

Hi Michael

I had in fact preempted you and moved the delete-lock code to a startup
function. However, I found a nice little optimization that seems to force
the writer to close when the process is manually killed. I added a JVM
shutdown hook (i.e., using Runtime.getRuntime().addShutdownHook(this))
that closes the writer. This seems to reduce the number of instances where
the lock file is left lying around. Perhaps it is something that could be
included in the base Lucene code.

Also, are you aware of any chkdsk-like utility for Lucene, i.e. something
to run in the event that an index is corrupted through disk error or
otherwise?

Many thanks!

Jamie

Michael McCandless wrote:
>
> OK, that sounds like a legitimate reason to forcibly remove the write
> lock, but it would be better to do that only on startup of your
> process rather than in every openIndex() call.
>
> If ever you hit LockObtainFailedException in openIndex, even after
> having deleted the write lock on startup, then that means there's a
> bug (i.e., two writers are in fact trying to open on the same index, one
> of the two cases below).
>
> Mike
>
> Jamie wrote:
>> Hi Mike
>>
>> Thanks for the suggestions.
>> I've implemented all of them. The main reason why I manually deleted
>> the lock file was that sometimes users kill the server process
>> manually or there is a hard reboot without any warning. In such
>> circumstances, Lucene leaves a lock file lying around because it was
>> busy writing to the index. Now, I understand that one shouldn't simply
>> delete the lock file, but what do you suggest my users do? The server
>> must continue running, and the only way I can see to do that is to
>> delete the lock file, unless there is an equivalent of chkdsk for
>> Lucene indexes that I could run.
>>
>> Regards,
>>
>> Jamie
>>
>>
>> Michael McCandless wrote:
>>>
>>> On quickly looking through the code, I think there are some serious
>>> hazards that could lead to this exception.
>>>
>>> First, in your openIndex code, if you hit a
>>> LockObtainFailedException in trying to open your writer, you are
>>> forcefully removing the write lock and then retrying. Yet, you also
>>> open an IndexReader to delete documents, which acquires the write
>>> lock. If ever you have this IndexReader open, and then you
>>> forcefully remove the write lock and open the writer, that would
>>> cause this exception.
>>>
>>> Second, you have a deleteIndex method, which first tries to use the
>>> writer with create=true (good) but then falls back to manually
>>> removing the files. Why is that fallback necessary? If, for
>>> example, you are also hitting a LockObtainFailedException, then
>>> forcefully removing files while an IndexReader or IndexWriter holds
>>> the write lock is also dangerous and would lead to this exception.
>>>
>>> In general it's very dangerous to forcibly remove, or ignore,
>>> Lucene's write lock. It really should only be necessary when
>>> something catastrophic occurred (JVM crashed).
>>>
>>> Also, note that IndexWriter can now delete documents. This would
>>> simplify your code and possibly fix these two hazards.
>>>
>>> Do you see any of the errors/warnings that you send to your logger?
>>> (They would be corroborating evidence here.)
>>>
>>> Mike
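The lock discipline discussed in this thread can be sketched in plain
Java with no Lucene dependency. It removes a stale write lock exactly once
at process startup (Mike's suggestion) and never again afterwards, and it
closes the writer from a JVM shutdown hook (Jamie's approach). The class
IndexLifecycle, its methods, and the lock file name "write.lock" are
hypothetical and chosen for illustration. The real lock file name and
location depend on the LockFactory in use. Runtime.addShutdownHook takes
an unstarted Thread, so passing `this` only works if the enclosing class
extends Thread. The sketch creates a dedicated thread instead. A shutdown
hook runs on normal exit and on signals such as SIGTERM, but not on
kill -9 or a power loss, which is why the startup cleanup is still needed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper illustrating the lock discipline; not a Lucene API.
final class IndexLifecycle {
    // ASSUMPTION: the write lock is a file with this name inside the index
    // directory; the actual name/location depend on the LockFactory in use.
    static final String LOCK_NAME = "write.lock";

    private final Path indexDir;
    private final AtomicBoolean startupCleanupDone = new AtomicBoolean(false);

    IndexLifecycle(Path indexDir) {
        this.indexDir = indexDir;
    }

    // Call once at process startup, before any IndexWriter or IndexReader is
    // opened, and only when no other process can be using this index.
    // Returns true only if a stale lock file was actually deleted.
    boolean clearStaleLockAtStartup() throws IOException {
        if (!startupCleanupDone.compareAndSet(false, true)) {
            // Already ran in this process: a lock seen now belongs to a live
            // writer or reader, so it must never be force-removed here.
            // openIndex() should let LockObtainFailedException propagate.
            return false;
        }
        return Files.deleteIfExists(indexDir.resolve(LOCK_NAME));
    }

    // Registers a shutdown hook that closes the writer on normal exit or
    // SIGTERM. It should be the writer's last user; it does not run on
    // kill -9 or power loss.
    static Thread closeOnShutdown(Closeable writer) {
        Thread hook = new Thread(() -> {
            try {
                writer.close();
            } catch (IOException e) {
                System.err.println("failed to close index writer: " + e);
            }
        }, "index-writer-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

In this sketch, the server's startup path calls clearStaleLockAtStartup()
once, then opens the writer and passes a Closeable wrapping the writer's
close() call to closeOnShutdown(). The wrapping is an assumption about how
the application holds its writer. Keeping the unlock out of openIndex()
means a lock conflict later on surfaces as an error instead of silently
producing two writers.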