Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 49127 invoked from network); 6 Oct 2007 09:07:49 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 6 Oct 2007 09:07:49 -0000 Received: (qmail 86401 invoked by uid 500); 6 Oct 2007 09:07:31 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 86357 invoked by uid 500); 6 Oct 2007 09:07:31 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 86346 invoked by uid 99); 6 Oct 2007 09:07:31 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 06 Oct 2007 02:07:31 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [66.111.4.25] (HELO out1.smtp.messagingengine.com) (66.111.4.25) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 06 Oct 2007 09:07:34 +0000 Received: from compute2.internal (compute2.internal [10.202.2.42]) by out1.messagingengine.com (Postfix) with ESMTP id CBA282F4D0 for ; Sat, 6 Oct 2007 05:04:03 -0400 (EDT) Received: from web8.messagingengine.com ([10.202.2.217]) by compute2.internal (MEProxy); Sat, 06 Oct 2007 05:04:03 -0400 Received: by web8.messagingengine.com (Postfix, from userid 99) id A98CADA1C; Sat, 6 Oct 2007 05:04:03 -0400 (EDT) Message-Id: <1191661443.14232.1214452223@webmail.messagingengine.com> X-Sasl-Enc: KSVaiwvW+HNFpZPIdE9eGUPdnA7rJrzWmfA9IneRiH1V 1191661443 From: "Michael McCandless" To: java-user@lucene.apache.org Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="ISO-8859-1" MIME-Version: 1.0 X-Mailer: MessagingEngine.com Webmail Interface References: <1191585112.19609.1214311329@webmail.messagingengine.com> Subject: Re: Help with Lucene Indexer crash recovery In-Reply-To: Date: Sat, 06 Oct 2007 05:04:03 -0400 X-Virus-Checked: Checked by ClamAV on apache.org "vivek sar" wrote: > Sorry, I'm using Lucene 2.2. We are using Lucene to index our database > (Oracle) into documents for full-text search feature. Here is the > process of indexing, > > 1) Have two IndexWriters which run in two different threads and write > to two different directories (temporary indexes). They both read from > the same queue (db resultset queue) and then right to the index. Close > the indexwriters once done. > 2) Once the IndexWriters are done we start the MasterIndex, which is > another IndexWriter. This merges the indexes in those two temporary > indexes. > 3) Once the writer.addIndexes is done I call writer.optimize() and > then writer.close(). > 4) Our IndexSearcher reads only from the MasterIndex This process sounds fine, though as Karl pointed out you could let the reader before you start the optimize. You could also consider skipping the optimize entirely, unless the search latency is in fact too high (or throughput too low) without it. > Once in a while we kill the running application using "kill -9". I > think if the IndexWriter is in process of merging and we kill it we > run into this problem. It has already happened few times in last one > week. I do clean up the lock if there is a write.lock at the startup > of the system. I can not recreate the index as it may take hours to > re-index. As Hoss pointed out, "kill -9" really should be a means of last resort. That said, it should never in fact cause index corruption, as far as I know. Lucene is "semi-transactional": at any & all moments you should be able to destroy the JVM and the index will be unharmed. I would really like to get to the bottom of why this is not the case here. So you've noticed that if kill -9 is sent while the addIndexes is happening then that can lead to this corruption? If possible, could you use IndexWriter.setInfoStream(...) during at least that step to get verbose details about what the writer is doing, and then capture that output & post it the next time you get this error to happen? That would go a long ways to getting to the root cause here. Which OS and file system are you using? Are all these steps happening on a single machine & JVM? > I don't have any shutdown hook right now, but I'm thinking of adding > one for graceful index closing. We use following merge parameters, > > mergeFactor=100 > maxMergeDocs=99999 > maxBufferedDocs=1000 Seems OK. > I can try out your tool, is it something that can be integrated into > the application itself? So, basically I'm looking to catch the > "FileNotFoundException" and take some action to recover from it. Well, once the tool has been tested and shown to be bug-free then you could in theory use this as a live recovery inside the application. But for starters I would run it from the command line without the -check. Be very careful: this is totally new code and it could make your situation even worse, if it has any bugs. And remember when the tool works, it will have removed a whole segment from your index which means possibly a great many documents are now gone. Also, it would be far better to get to the root cause & fix it, instead of having to use this tool perpetually. Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org