From: "Michael McCandless"
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Message-Id: <1183465207.11743.1198277029@webmail.messagingengine.com>
References: <1183461674.2220.1198268681@webmail.messagingengine.com> <4f9b23800707030432i1067a67auc8fc0f83abeeea0@mail.gmail.com>
Subject: Re: Lucene 2.2, NFS, Lock obtain timed out
In-Reply-To: <4f9b23800707030432i1067a67auc8fc0f83abeeea0@mail.gmail.com>
Date: Tue, 03 Jul 2007 08:20:07 -0400

OK, I opened issue LUCENE-948 and attached a patch and a new 2.2.0 JAR.
Please make sure you use the "take2" versions (they have added
instrumentation to help us debug):

    https://issues.apache.org/jira/browse/LUCENE-948

Patrick, could you please test the above "take2" JAR?  Could you also
call IndexWriter.setDefaultInfoStream(...) and capture all output from
both machines?  (It will produce quite a bit of output.)

However, I'm now concerned about another potential impact of stale
directory listing caches: the writer on the 2nd machine may not see
the current segments_N file written by the first machine, and may then
incorrectly remove the newly created files.

I think the "take2" JAR should at least resolve this
FileNotFoundException, but I think you are likely about to hit this
new issue.

Mike

"Patrick Kimber" wrote:

> Hi Michael
>
> I am really pleased we have a potential fix.  I will look out for the
> patch.
>
> Thanks for your help.
>
> Patrick
>
> On 03/07/07, Michael McCandless wrote:
> >
> > "Patrick Kimber" wrote:
> >
> > > I am using the NativeFSLockFactory.  I was hoping this would have
> > > stopped these errors.
> >
> > I believe this is not a locking issue and NativeFSLockFactory should
> > be working correctly over NFS.
> >
> > > Here is the whole of the stack trace:
> > >
> > > Caused by: java.io.FileNotFoundException:
> > > /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
> > > file or directory)
> > >     at java.io.RandomAccessFile.open(Native Method)
> > >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
> > >     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> > >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> > >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
> > >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
> > >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
> > >     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
> > >     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:626)
> > >     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
> > >     at com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
> > >     at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
> > >     at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
> > >     ... 13 more
> >
> > OK, indeed the exception is inside IndexFileDeleter's initialization
> > (this is what I had guessed might be happening).
> >
> > > I have added more logging to my test application.  I have two servers
> > > writing to a shared Lucene index on an NFS partition...
> > >
> > > Here is the logging from one server...
> > >
> > > [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
> > > [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
> > > [segments_n]
> > >
> > > and the other server (at the same time):
> > >
> > > [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
> > > [10:49:18] [DEBUG] IndexAccessProvider getWriter()
> > > [10:49:18] [ERROR] DocumentCollection update(DocumentData)
> > > com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
> > > document to the index.
> > > [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
> > > such file or directory)]
> > >     at
> > > com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
> > >
> > > I think the exception is being thrown when the IndexWriter is created:
> > >     new IndexWriter(directory, false, analyzer, false, deletionPolicy);
> > >
> > > I am confused... segments_n should not have been touched for 3 minutes
> > > so why would a new IndexWriter want to read it?
> >
> > Whenever a writer is opened, it initializes the deleter
> > (IndexFileDeleter).  During that initialization, we list all files in
> > the index directory, and for every segments_N file we find, we open it
> > and "incref" all index files that it's using.  We then call the
> > deletion policy's "onInit" to give it a chance to remove any of these
> > commit points.
> >
> > What's happening here is that the NFS directory listing is "stale" and
> > is reporting that segments_n exists when in fact it doesn't.  This is
> > almost certainly due to the NFS client's caching (directory listing
> > caches are in general not coherent for NFS clients, i.e., they can
> > "lie" for a short period of time, especially in cases like this).
> >
> > I think this fix is fairly simple: we should catch the
> > FileNotFoundException and handle that as if the file did not exist.  I
> > will open a Jira issue & get a patch.
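
To make the idea quoted above concrete, here is a minimal, stdlib-only
sketch.  It is not the LUCENE-948 patch and not Lucene code: the class
name StaleListingScan and the method readableCommits are hypothetical,
and the directory listing is passed in as a parameter to stand in for a
possibly stale NFS listing.  It only shows the general shape of "catch
FileNotFoundException and treat the file as absent".

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not from Lucene.
final class StaleListingScan {

    /**
     * Returns the segments_* names from the (possibly stale) listing
     * that can actually be opened.  A name that is listed but missing
     * on disk is skipped instead of aborting the whole scan.
     */
    static List<String> readableCommits(File indexDir, String[] listing)
            throws IOException {
        List<String> commits = new ArrayList<>();
        for (String name : listing) {
            if (!name.startsWith("segments_")) {
                continue; // not a commit point file
            }
            try (RandomAccessFile in =
                     new RandomAccessFile(new File(indexDir, name), "r")) {
                // The file really exists and is readable: keep it.
                commits.add(name);
            } catch (FileNotFoundException e) {
                // Listed but gone: assume a stale directory cache and
                // treat this commit point as if it were never listed.
            }
        }
        return commits;
    }
}
```

Other IOExceptions are deliberately left to propagate in this sketch, so
that only the "listed but missing" case is swallowed.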
> >
> > Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org