From: "Patrick Kimber"
To: java-user@lucene.apache.org
Date: Tue, 3 Jul 2007 12:32:43 +0100
Subject: Re: Lucene 2.2, NFS, Lock obtain timed out
Message-ID: <4f9b23800707030432i1067a67auc8fc0f83abeeea0@mail.gmail.com>
In-Reply-To: <1183461674.2220.1198268681@webmail.messagingengine.com>

Hi Michael

I am really pleased we have a potential fix. I will look out for the
patch.

Thanks for your help.

Patrick

On 03/07/07, Michael McCandless wrote:
>
> "Patrick Kimber" wrote:
>
> > I am using the NativeFSLockFactory. I was hoping this would have
> > stopped these errors.
>
> I believe this is not a locking issue and NativeFSLockFactory should
> be working correctly over NFS.
>
> > Here is the whole of the stack trace:
> >
> > Caused by: java.io.FileNotFoundException:
> > /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
> > file or directory)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
> >     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
> >     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:626)
> >     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
> >     at com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
> >     at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
> >     at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
> >     ... 13 more
>
> OK, indeed the exception is inside IndexFileDeleter's initialization
> (this is what I had guessed might be happening).
>
> > I have added more logging to my test application. I have two servers
> > writing to a shared Lucene index on an NFS partition...
> >
> > Here is the logging from one server...
> >
> > [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
> > [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
> > [segments_n]
> >
> > and the other server (at the same time):
> >
> > [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
> > [10:49:18] [DEBUG] IndexAccessProvider getWriter()
> > [10:49:18] [ERROR] DocumentCollection update(DocumentData)
> > com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
> > document to the index.
> > [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
> > such file or directory)]
> >     at
> > com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
> >
> > I think the exception is being thrown when the IndexWriter is created:
> > new IndexWriter(directory, false, analyzer, false, deletionPolicy);
> >
> > I am confused... segments_n should not have been touched for 3 minutes,
> > so why would a new IndexWriter want to read it?
>
> Whenever a writer is opened, it initializes the deleter
> (IndexFileDeleter). During that initialization, we list all files in
> the index directory, and for every segments_N file we find, we open it
> and "incref" all index files that it's using. We then call the
> deletion policy's "onInit" to give it a chance to remove any of these
> commit points.
>
> What's happening here is that the NFS directory listing is "stale" and
> is reporting that segments_n exists when in fact it doesn't. This is
> almost certainly due to the NFS client's caching (directory listing
> caches are in general not coherent for NFS clients, i.e., they can "lie"
> for a short period of time, especially in cases like this).
>
> I think the fix is fairly simple: we should catch the
> FileNotFoundException and handle it as if the file did not exist. I
> will open a Jira issue & get a patch.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
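The fix Mike proposes (treat a segments_N file that the directory listing reports, but that cannot actually be opened, as if it did not exist) can be sketched in plain Java. This is a minimal, hypothetical illustration using only java.io, not Lucene's IndexFileDeleter code or the eventual patch; the class StaleListingScan and the method readableSegmentsFiles are invented for the example, and the "stale listing" is simulated by passing in a name whose file is absent.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Lucene's IndexFileDeleter: it only shows the
// "catch FileNotFoundException and treat the file as absent" idea.
public class StaleListingScan {

    // Returns the segments_N names from the listing that could really be opened.
    // A name reported by a (possibly stale, e.g. NFS-cached) listing whose open
    // fails with FileNotFoundException is skipped instead of aborting the scan.
    public static List<String> readableSegmentsFiles(File indexDir, String[] listing)
            throws IOException {
        List<String> result = new ArrayList<>();
        for (String name : listing) {
            if (!name.startsWith("segments_")) {
                continue; // only commit points (segments_N files) are of interest here
            }
            try (RandomAccessFile in = new RandomAccessFile(new File(indexDir, name), "r")) {
                // A real deleter would read the commit point here and
                // incref every index file it references.
                result.add(name);
            } catch (FileNotFoundException e) {
                // The listing claimed the file exists, but it is gone:
                // behave as if it had never been listed.
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("idx").toFile();
        new File(dir, "segments_m").createNewFile();
        // Simulated stale listing that still reports the already-deleted segments_n.
        String[] staleListing = {"segments_m", "segments_n", "_0.cfs"};
        System.out.println(readableSegmentsFiles(dir, staleListing)); // prints [segments_m]
    }
}
```

Skipping the missing file, rather than retrying, matches the reasoning in the thread: if the commit point really was deleted by the other server's deletion policy, there is nothing left to incref.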