lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (LUCENE-673) Exceptions when using Lucene over NFS
Date Tue, 28 Nov 2006 22:30:23 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-673?page=all ]

Michael McCandless reassigned LUCENE-673:
-----------------------------------------

    Assignee: Michael McCandless

> Exceptions when using Lucene over NFS
> -------------------------------------
>
>                 Key: LUCENE-673
>                 URL: http://issues.apache.org/jira/browse/LUCENE-673
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.0.0
>         Environment: NFS server/client
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>
> I'm opening this issue to track details on the known problems with
> Lucene over NFS.
> The summary is: if one machine writes to an index stored on an NFS
> mount while other machine(s) read it (periodically re-opening the
> index), then sometimes on re-opening the index a reader will hit a
> FileNotFound exception.
> This has hit many users because this is a natural way to "scale up"
> your searching (single writer, multiple readers) across machines.  The
> best current workaround (I think?) is Solr's approach (either use
> Solr itself or copy/modify what it does): take snapshots of the index
> and have the readers open the snapshots instead of the "live" index
> being written to.
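
For illustration, the snapshot idea can be sketched in plain Java. This is not Solr's actual mechanism, only a minimal copy-then-rename sketch. The class name IndexSnapshot, the method take, and the directory layout are hypothetical, and the sketch assumes the writer is not in the middle of a commit while the copy runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Solr.
final class IndexSnapshot {

    /**
     * Copies every regular file in liveIndex into snapshotRoot/name and
     * returns the snapshot directory.
     *
     * Assumption: no commit is in progress on liveIndex during the copy.
     */
    static Path take(Path liveIndex, Path snapshotRoot, String name) throws IOException {
        // Copy into a temporary directory first so readers never see a
        // half-copied snapshot under its final name.
        Path tmp = snapshotRoot.resolve(name + ".tmp");
        Files.createDirectories(tmp);
        try (Stream<Path> files = Files.list(liveIndex)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, tmp.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // Publish the finished snapshot with a single rename.
        return Files.move(tmp, snapshotRoot.resolve(name),
                          StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Readers would then open the returned directory, which the writer never modifies after the rename.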
> I've been working on two patches for Lucene:
>   * A locking (LockFactory) implementation using native OS locks
>   * Lock-less commits
> (I'll open separate issues with the details for those).
> I have a simple stress test where one machine is constantly adding
> docs to an index over NFS, and another machine is constantly
> re-opening the index searcher over NFS.
> These tests have revealed new details (at least for me!) about the
> root cause of our NFS problems:
>   * Even when using native locks over NFS, Lucene still hits these
>     exceptions!
>     I was surprised by this because I had always thought (assumed?)
>     the NFS problem was because the "simple" file-based locking was
>     not correct over NFS, and that switching to native OS filesystem
>     locking would resolve it, but it doesn't.
>     I can reproduce the "FileNotFound" exceptions even when using NFS
>     V4 (the latest NFS protocol), so this is not just a "your NFS
>     server is too old" issue.
>   * Then, when running the same stress test with the lock-less
>     changes, I don't hit any exceptions.  I've tested on NFS
>     versions 2, 3 and 4 (using the "nfsvers=N" mount option).
> I think this means that (as Hoss suggested at one point, I believe)
> the NFS problems are likely due to cache coherence in the NFS file
> system: the reader's cached view of a file (I think the "segments"
> file in particular) can disagree with which segment data files
> actually exist.
> In other words, even if you lock correctly, the reader will sometimes
> see stale contents of the "segments" file, which leads it to try to
> open a now-deleted segment data file.
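
The failure mode described above can be imitated on a local disk with a small Java sketch. It is a deliberately simplified model, not Lucene code: here the "segments" file holds a single file name as text, and a string read before the second commit stands in for the NFS client's stale cache. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical simulation; the real "segments" file format is different.
final class StaleSegmentsDemo {

    /** Writer: create the new segment, repoint "segments" at it, delete the old one. */
    static void commit(Path index, String newSegment, String oldSegment) throws IOException {
        Files.write(index.resolve(newSegment), new byte[] {42});
        Files.write(index.resolve("segments"), newSegment.getBytes(StandardCharsets.UTF_8));
        if (oldSegment != null) {
            Files.delete(index.resolve(oldSegment));
        }
    }

    /** Reader: open the segment file named by the given "segments" contents. */
    static byte[] open(Path index, String segmentsContents) throws IOException {
        return Files.readAllBytes(index.resolve(segmentsContents));
    }

    /** Returns true if a reader holding stale "segments" contents fails to open its segment. */
    static boolean staleReaderFails(Path index) throws IOException {
        commit(index, "_1.cfs", null);
        // Stand-in for the NFS client cache: contents read before the next commit.
        String cached = new String(
            Files.readAllBytes(index.resolve("segments")), StandardCharsets.UTF_8);
        commit(index, "_2.cfs", "_1.cfs");
        try {
            open(index, cached); // still points at the deleted "_1.cfs"
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }
}
```

Note that no lock is involved in the failure: the reader goes wrong purely because its view of "segments" is older than the directory contents.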
> So I think this is good news / bad news: the bad news is that native
> locking doesn't fix our problems with NFS (as I, at least, had
> expected it to).  The good news is that the changes for lock-less
> commits do appear to let Lucene work fine over NFS, though I still
> need to test this more thoroughly.
> [One quick side note in case it helps others: to get native locks
> working over NFS on Ubuntu/Debian Linux 6.06, I had to "apt-get
> install nfs-common" on the NFS client machines.  Before I did this I
> would hit "No locks available" IOExceptions on calling the "tryLock"
> method.  The default NFS server install on the server machine just
> worked because it runs in kernel mode and it starts a lockd process.]
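
A quick way to check whether native locks work on a given mount is to call java.nio's FileChannel.tryLock directly, outside Lucene. The sketch below is only a probe under that assumption, not the LockFactory patch itself; the names NativeLockProbe and canLock are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not part of Lucene.
final class NativeLockProbe {

    /**
     * Tries to take an exclusive native OS lock on lockFile.
     * Returns true if the lock was obtained (it is released again before
     * returning), false if another process currently holds it.
     * An IOException (for example "No locks available" on a misconfigured
     * NFS client) is propagated to the caller.
     */
    static boolean canLock(Path lockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return false; // held by another process
            }
            lock.release();
            return true;
        }
    }
}
```

Pointing canLock at a file on the NFS mount from a client machine should show whether the client's lock services are set up before any Lucene code is involved.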


