lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Lock-less commits
Date Fri, 18 Aug 2006 12:24:13 GMT

I think it's possible to modify Lucene's commit process so that it
does not require any commit locking at all.

This would be a big win because it would prevent all the various messy
errors (FileNotFound exceptions on instantiating an IndexReader,
Access Denied errors on renaming X.new -> X, Lock obtain timed out
from leftover lock files, etc.) that Lucene users keep coming across.

Also, indices against remote (NFS, Samba) filesystems, where current
locking has known issues that users seem to hit fairly often, would
then be fine.

I'd like to get feedback on this idea (am I missing something?) and if
there are no objections I can submit a full patch.

I have an initial implementation that passes all unit tests.  It also
runs fine with a writer/searcher stress test: the writer adding docs
to an index stored on NFS, and a multi-threaded reader on a separate
(Windows XP, mounted over Samba) machine continuously re-instantiating
an IndexSearcher and doing a search against the same index.

The basic idea is to change all commits (from SegmentReader or
IndexWriter) so that we never write to an existing file that a reader
could be reading from.  Instead, always write to a new file name using
sequentially numbered files.  For example, for "segments", each commit
writes the next file in the sequence: segments.1, segments.2,
segments.3, etc.  Likewise for the *.del and *.fN (norms) files that
SegmentReaders write to.
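
A minimal sketch of the numbering scheme described above, not the actual
patch.  The class GenerationFiles and its methods are hypothetical names
chosen for illustration, and it assumes a single writer (enforced by the
write lock) so that no two commits compete for the same number.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not Lucene code): every commit goes to a new,
// sequentially numbered file, so a file a reader may have open is never rewritten.
final class GenerationFiles {

    /** Returns the highest N for which "<base>.N" exists in dir, or 0 if there is none. */
    static long latestGeneration(Path dir, String base) throws IOException {
        long latest = 0;
        String prefix = base + ".";
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                if (!name.startsWith(prefix)) {
                    continue;
                }
                try {
                    latest = Math.max(latest, Long.parseLong(name.substring(prefix.length())));
                } catch (NumberFormatException e) {
                    // Suffix is not a number (e.g. "segments.new"); not a generation file.
                }
            }
        }
        return latest;
    }

    /** Writes contents to "<base>.(latest+1)" and returns the new file's path. */
    static Path commit(Path dir, String base, byte[] contents) throws IOException {
        long next = latestGeneration(dir, base) + 1;
        Path target = dir.resolve(base + "." + next);
        // CREATE_NEW fails if the file already exists, so an existing
        // generation can never be overwritten by accident.
        Files.write(target, contents, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        return target;
    }
}
```

Calling commit(dir, "segments", bytes) three times on an empty directory
would produce segments.1, segments.2 and segments.3 in that order.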

Disk usage should be the same, even temporarily when merging, because
we still remove the old segments after merging.

We can also get rid of the "deletable" file (and associated errors
renaming deletable.new -> deletable) because we can compute what's
deletable according to "what's not referenced by current segments
file."
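
The "deletable" computation can be sketched as a set difference.  This is
an illustration under assumptions, not the patch itself: DeletableFiles is
a hypothetical name, the file names in the usage note are made up, and the
caller is assumed to have already parsed the current segments file into
the set of file names it references.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not Lucene code): anything in the index directory
// that the current segments file does not reference is a deletion candidate.
final class DeletableFiles {

    static Set<String> compute(Set<String> filesInDirectory,
                               Set<String> referencedByCurrentSegments,
                               String currentSegmentsFile) {
        Set<String> deletable = new TreeSet<>(filesInDirectory);
        // Keep every file the current commit points at...
        deletable.removeAll(referencedByCurrentSegments);
        // ...and the current segments file itself.
        deletable.remove(currentSegmentsFile);
        return deletable;
    }
}
```

For a directory holding segments.1, segments.2, _1.cfs and _2.cfs, where
segments.2 references only _2.cfs, this yields segments.1 and _1.cfs.  A
real implementation would also have to leave alone files it does not own,
such as the write lock.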

This means IndexReader, on opening an index, finds the most recent
segments file and loads it.  If, when loading the segments, it hits a
FileNotFound exception, and a newer segments file has appeared, it
re-tries against the new one.
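
The retry logic above might look roughly like the following sketch.  The
names RetryingOpen, GenerationSource and Loader are hypothetical and do
not correspond to existing Lucene classes; the two interfaces stand in for
"find the most recent segments file" and "load that segments file and the
files it references".

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch (not Lucene code) of opening the newest commit
// without a commit lock, retrying if a newer commit appears mid-load.
final class RetryingOpen {

    /** Reports the number of the most recent segments file. */
    interface GenerationSource {
        long latest() throws IOException;
    }

    /** Loads the index state described by the given segments generation. */
    interface Loader<T> {
        T load(long generation) throws IOException;
    }

    static <T> T openLatest(GenerationSource source, Loader<T> loader) throws IOException {
        long generation = source.latest();
        while (true) {
            try {
                return loader.load(generation);
            } catch (FileNotFoundException e) {
                // A file vanished while loading.  If a newer segments file has
                // appeared, a commit raced with us: retry against the new one.
                long newer = source.latest();
                if (newer <= generation) {
                    throw e; // Nothing newer exists, so this is a genuine error.
                }
                generation = newer;
            }
        }
    }
}
```

The loop only repeats when the generation number has advanced, so it
cannot spin forever on a genuinely missing file.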

This does entail small changes to the index file format.
Specifically, file names are different (they have new .N suffixes),
and the contents of the segments file are expanded to contain details
about which del/norm files are current for each segment.

Note that the write lock is still needed to catch people accidentally
creating two writers on one index.  But since this lock file isn't
obtained/released as frequently as the current commit lock, I would
expect fewer issues from it.

This change should be fully backwards compatible: the new code would
read the old index format, and I believe existing APIs would not need
to change.  But if there are applications (maybe Solr?) that peek
inside the index files expecting (for example) a file named "segments"
to be there, then those would need to be fixed.

Mike


