lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-628) Intermittent FileNotFoundException for .fnm when using rsync
Date Tue, 18 Jul 2006 18:01:16 GMT
Michael McCandless commented on LUCENE-628:

My best guess on what's happening here is, on one of your Searcher boxes:

  * rdist has copied over the new segments file but not yet the actual
    _1zm.cfs file

  * IndexSearcher is re-instantiated at this moment and reads this new
    segments file

  * IndexSearcher then tries to load the _1zm.cfs (referenced by the
    new segments file), but because it does not yet exist (rdist
    hasn't copied it yet), it falls back to non compound file
    (_1zm.fnm) which also does not exist, and hits that exception

The one thing that's odd in your traceback above is that the code at
line 154 is only used when there is more than one segment in your
index.  Are you allowing rdist to make a copy after IndexWriter has
added docs (and closed) but before optimize is called?  Otherwise I
can't explain why the index on your Searcher box has more than one
segment.
Note that there are two lock files on the Writer machine: the write
lock, held for a long time (whenever an IndexWriter is open), and the
commit lock, held briefly while a new segments file is written.

I think you need to change your approach to more correctly use
Lucene's locking:

  * On the Writer box, before rdist can run, it must hold (acquire)
    the write lock, for the full duration of the copy.  Just checking
    that the write lock file doesn't exist isn't generally sufficient
    because an IndexWriter may wake up and start changing things while
    your rdist is running (unless that can't happen in your current
    design, for example if from a single Java process you close the
    IndexWriter, run rdist, repeat).

  * On each Searcher box, before rdist can copy to it, you need to
    acquire the commit lock and hold it for the full duration of the
    copy, then release it.  Note that no IndexSearcher (IndexReader)
    can be instantiated during this time (it will block on commit lock
    acquire until the rdist copy is done).
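Since Lucene 1.9's FSDirectory locks are just marker files created atomically on disk, an external process (like an rdist wrapper) can hold one with plain java.io calls.  This is only a sketch of that idea; the lock file path below is made up, and you'd want it to match whatever lock directory your FSDirectory actually uses:

```java
import java.io.File;
import java.io.IOException;

public class RdistLockHolder {
    // Atomically create the lock file; true only if WE created it
    // (i.e. no writer currently holds the lock).
    static boolean obtain(File lock) throws IOException {
        return lock.createNewFile();
    }

    static void release(File lock) {
        lock.delete();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path -- point this at your real lock directory.
        File writeLock = new File(
            System.getProperty("java.io.tmpdir"), "demo-write.lock");
        writeLock.delete(); // clean slate for the demo

        if (!obtain(writeLock)) {
            System.out.println("writer active, skipping copy");
            return;
        }
        try {
            // Hold the lock for the FULL duration of the copy, e.g.:
            // Runtime.getRuntime().exec(new String[]{"rdist", ...}).waitFor();
            System.out.println("lock held: " + writeLock.exists());
        } finally {
            release(writeLock);
        }
        System.out.println("lock released: " + !writeLock.exists());
    }
}
```

The same pattern works on the Searcher side with the commit lock file: create it before the copy starts, delete it when the copy finishes, and any IndexReader trying to open during that window will block on its own obtain.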

Note that the Solr project: 

has an excellent solution for correctly distributing an index from a
single Writer to multiple Searchers (they call it "snapshots").  It
also uses rdist to move snapshots around.  You might want to try Solr,
or perhaps "borrow" its approach, especially the neat "cp -l -r"
trick for quickly creating a snapshot of the index on the Writer box.
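The "cp -l -r" trick works because hard links make a snapshot near-instant and almost free in disk space.  The same thing can be sketched in Java with java.nio.file (modern Java, not something Lucene 1.9-era code would have used; this is not Solr's actual snapshot script, and the paths are hypothetical):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class Snapshot {
    // Create a snapshot directory whose entries are hard links to the
    // live index files -- the Java equivalent of "cp -l -r".
    static void snapshot(Path index, Path snap) throws IOException {
        Files.createDirectories(snap);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(index)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    // createLink(link, existing): new name first.
                    Files.createLink(snap.resolve(f.getFileName()), f);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with a throwaway directory standing in for the index.
        Path index = Files.createTempDirectory("index");
        Files.write(index.resolve("_1zm.cfs"), new byte[]{1, 2, 3});
        Path snap = index.resolveSibling(index.getFileName() + ".snapshot");
        snapshot(index, snap);
        System.out.println(Files.exists(snap.resolve("_1zm.cfs")));
    }
}
```

Because the links share the underlying inodes, the snapshot stays consistent even if the writer later deletes or replaces files in the live directory, and rdist/rsync can copy the snapshot at its leisure.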

See also this recent thread that touched on similar issues:

> Intermittent FileNotFoundException for .fnm when using rsync
> ------------------------------------------------------------
>                 Key: LUCENE-628
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Linux RedHat ES3, Jboss402
>            Reporter: Simon Lorenz
> We use Lucene 1.9.1 to create and search indexes for web applications. The application
runs in Jboss402 on Redhat ES3. A single Master (Writer) Jboss instance creates and writes
the indexes using the compound file format, which is optimised after all updates. These index
files are replicated every few hours using rsync, to a number of other application servers
(Searchers). The rsync job only runs if there are no lucene lock files present on the Writer.
The Searcher servers that receive the replicated files, perform only searches on the index.
Up to 60 searches may be performed each minute. 
> Everything works well most of the time, but we get the following issue on the Searcher
servers about 10% of the time. 
> Following an rsync replication, one or all of the Searcher servers throw
> IOException caught when creating an IndexSearcher
> /..../_1zm.fnm (No such file or directory)
>         at Method)
>         at<init>(
>         at$Descriptor.<init>(
>         at<init>(
>         at
>         at org.apache.lucene.index.FieldInfos.<init>(
>         at org.apache.lucene.index.SegmentReader.initialize(
>         at org.apache.lucene.index.SegmentReader.get(
>         at org.apache.lucene.index.SegmentReader.get(
>         at org.apache.lucene.index.IndexReader$1.doBody(
>         at$
>         at  
> As we use the compound file format I would not expect .fnm files to be present. When
replicating, we do not delete the old .cfs index files as these could still be referenced
by old Searcher threads. We do overwrite the segments and deletable files on the Searcher servers.
> My thoughts are: either we are occasionally overwriting a file at the exact moment a new
searcher is being created, or the lock files are removed from the Writer server before the
compaction process is completed, in which case we replicate a segments file that still
references a ghost .fnm file.
> I would greatly appreciate any ideas and suggestions to solve this annoying issue.

This message is automatically generated by JIRA.


