lucene-java-user mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: Lucene 2.2, NFS, Lock obtain timed out
Date Tue, 03 Jul 2007 12:20:07 GMT
OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR.
Please make sure you use the "take2" versions (they have added
instrumentation to help us debug):

    https://issues.apache.org/jira/browse/LUCENE-948

Patrick, could you please test the above "take2" JAR?  Could you also call
IndexWriter.setDefaultInfoStream(...) and capture all output from both
machines?  (It will produce quite a bit of output.)

However: I'm now concerned about another potential impact of stale
directory listing caches, specifically that the writer on the 2nd
machine will not see the current segments_N file written by the first
machine and will incorrectly remove the newly created files.

I think the "take2" JAR should at least resolve this
FileNotFoundException, but you are likely about to hit this new
issue.
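
For reference, the idea behind the FileNotFoundException fix (treat a listed
segments_N file that cannot be opened as if it did not exist) can be sketched
roughly as follows.  This is not the LUCENE-948 patch itself: the class and
method names are hypothetical, it uses plain java.io rather than Lucene's
Directory API, and it uses newer Java syntax than Lucene 2.2 targets.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual patch.
public class TolerantCommitScan {

    /**
     * Try to open every segments_N name from a (possibly stale) directory
     * listing and return only the ones that could really be opened.
     */
    public static List<String> readableCommits(File dir, String[] listedNames)
            throws IOException {
        List<String> readable = new ArrayList<>();
        for (String name : listedNames) {
            if (!name.startsWith("segments_")) {
                continue; // not a commit point
            }
            try (RandomAccessFile in = new RandomAccessFile(new File(dir, name), "r")) {
                readable.add(name); // a real reader would parse the commit here
            } catch (FileNotFoundException e) {
                // The listing claimed the file exists but it cannot be opened:
                // treat it as absent instead of failing writer initialization.
                // Caveat: java.io also throws this for some non-stale causes
                // (e.g. permission problems), so real code may want to log it.
            }
        }
        return readable;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("index").toFile();
        new File(dir, "segments_2").createNewFile();

        // Simulated stale listing: segments_1 is reported but is already gone.
        String[] staleListing = {"segments_1", "segments_2", "_0.cfs"};
        System.out.println(readableCommits(dir, staleListing)); // prints [segments_2]
    }
}
```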

Mike

"Patrick Kimber" <mailing.patrick.kimber@gmail.com> wrote:
> Hi Michael
> 
> I am really pleased we have a potential fix.  I will look out for the
> patch.
> 
> Thanks for your help.
> 
> Patrick
> 
> On 03/07/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> >
> > "Patrick Kimber" <mailing.patrick.kimber@gmail.com> wrote:
> >
> > > I am using the NativeFSLockFactory.  I was hoping this would have
> > > stopped these errors.
> >
> > I believe this is not a locking issue and NativeFSLockFactory should
> > be working correctly over NFS.
> >
> > > Here is the whole of the stack trace:
> > >
> > > Caused by: java.io.FileNotFoundException:
> > > /mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No such
> > > file or directory)
> > >       at java.io.RandomAccessFile.open(Native Method)
> > >       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
> > >       at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> > >       at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> > >       at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
> > >       at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
> > >       at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:193)
> > >       at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:156)
> > >       at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:626)
> > >       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:573)
> > >       at com.subshell.lucene.indexaccess.impl.IndexAccessProvider.getWriter(IndexAccessProvider.java:68)
> > >       at com.subshell.lucene.indexaccess.impl.LuceneIndexAccessor.getWriter(LuceneIndexAccessor.java:171)
> > >       at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:176)
> > >       ... 13 more
> >
> > OK, indeed the exception is inside IndexFileDeleter's initialization
> > (this is what I had guessed might be happening).
> >
> > > I have added more logging to my test application.  I have two servers
> > > writing to a shared Lucene index on an NFS partition...
> > >
> > > Here is the logging from one server...
> > >
> > > [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
> > > [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
> > > [segments_n]
> > >
> > > and the other server (at the same time):
> > >
> > > [10:49:18] [DEBUG] LuceneIndexAccessor opening new writer and caching it
> > > [10:49:18] [DEBUG] IndexAccessProvider getWriter()
> > > [10:49:18] [ERROR] DocumentCollection update(DocumentData)
> > > com.company.lucene.LuceneIcmException: I/O Error: Cannot add the
> > > document to the index.
> > > [/mnt/nfstest/repository/lucene/lucene-icm-test-1-0/segments_n (No
> > > such file or directory)]
> > >     at com.company.lucene.RepositoryWriter.addDocument(RepositoryWriter.java:182)
> > >
> > > I think the exception is being thrown when the IndexWriter is created:
> > > new IndexWriter(directory, false, analyzer, false, deletionPolicy);
> > >
> > > I am confused... segments_n should not have been touched for 3 minutes
> > > so why would a new IndexWriter want to read it?
> >
> > Whenever a writer is opened, it initializes the deleter
> > (IndexFileDeleter).  During that initialization, we list all files in
> > the index directory, and for every segments_N file we find, we open it
> > and "incref" all index files that it's using.  We then call the
> > deletion policy's "onInit" to give it a chance to remove any of these
> > commit points.
> >
> > What's happening here is the NFS directory listing is "stale" and is
> > reporting that segments_n exists when in fact it doesn't.  This is
> > almost certainly due to the NFS client's caching (directory listing
> > caches are in general not coherent for NFS clients, ie, they can "lie"
> > for a short period of time, especially in cases like this).
> >
> > I think the fix is fairly simple: we should catch the
> > FileNotFoundException and handle that as if the file did not exist.  I
> > will open a Jira issue & get a patch.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
