lucene-dev mailing list archives

From robert engels
Subject Re: Lock-less commits
Date Fri, 18 Aug 2006 18:51:31 GMT
I don't think these changes are going to work. With multiple writers
and/or readers doing deletes, without serializing the writes you
will have inconsistencies - and the del files will need to be unioned.

That is:

station A opens the index
station B opens the index
station A deletes some documents creating segment.del1
station B deletes some documents creating segment.del2

When station C opens the index (or when the segment is merged), del1
and del2 need to be merged.

The locking enforces that writers are serialized - you cannot remove  
this restriction unless you merge the writes when reading.
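
To make the scenario above concrete, here is a minimal Python sketch. It is
not Lucene code: the function names are made up for illustration, and plain
sets of document numbers stand in for the contents of a .del file. It only
shows why "last writer wins" loses deletes and why a reader would have to
union del1 and del2.

```python
def write_del_file(snapshot, newly_deleted):
    """Deletion set a station would write: what it saw at open time plus its own deletes."""
    return frozenset(snapshot) | frozenset(newly_deleted)


def merge_del_files(*del_files):
    """Union the per-writer deletion sets, as a reader would have to do on open."""
    merged = set()
    for deleted in del_files:
        merged |= deleted
    return frozenset(merged)


# Stations A and B both open the index while no documents are deleted.
snapshot = frozenset()
del1 = write_del_file(snapshot, {3, 7})    # station A deletes docs 3 and 7
del2 = write_del_file(snapshot, {7, 12})   # station B deletes docs 7 and 12

# Without serialization or merging, whichever file is read last "wins".
last_writer_wins = del2
merged = merge_del_files(del1, del2)

print(sorted(last_writer_wins))  # [7, 12] -> station A's delete of doc 3 is lost
print(sorted(merged))            # [3, 7, 12]
```

If the writers are serialized instead, station B would start from del1 as its
snapshot and the union happens implicitly at write time.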

On Aug 18, 2006, at 1:41 PM, Michael McCandless wrote:

>>> It could in theory lead to starvation but this should be rare in
>>> practice unless you have an IndexWriter that's constantly committing.
>> An index with a small mergeFactor (say 2) and a small maxBufferedDocs
>> (default 10), would have segments deleted every
>> mergeFactor*maxBufferedDocs when rapidly adding documents.  It might
>> help to start opening segments with the *last* segment, where segment
>> deletions are most likely to happen.
> That is true.  I like the idea of opening last segments first -- I'll
> do that.
>> Also, when loading a .del file, how would one tell if it didn't exist
>> or if it was just deleted?
>> I guess one would always need to write a .del file even if no docs
>> were deleted.  Or, one could just order the deletes (delete optional
>> files in a segment last).
> Right, in order to handle this, I've modified the segments file to
> also contain the current "generation" (the .N suffix) of each
> segment's .del and norms files.  This way when SegmentReader reads
> the segment, it knows exactly which del/norms files it's supposed to
> find.  For "doUndeleteAll()" I write a zero-length .del.N+1 file.
> SegmentReader is already writing a new segments file when it commits
> (in today's code).
>> One would also have to worry about partially deleted segments on
>> Windows... while removing a segment, some of the files might fail to
>> delete (due to still being open) and some might succeed.
> Yes, I think this case is handled correctly.  Once all searchers using
> those old segments are closed, then the next commit that runs will
> remove those files (just like it does today).
> Not having to read/write the deletable file should make things more
> robust (there was a thread recently on the users list about hitting an
> exception because a file couldn't be deleted on Windows).
>> This idea is worth kicking around more for the future (maybe for when
>> the index format changes again), but it's probably too much change for
>> right now (Lucene 2.0.x), right?
> Yes I don't think this should go in for a 2.0.x point release.  Maybe
> for a 2.1.x?  Or I guess whenever we next have a major enough release
> to allow changing of the index format.
> I do think the benefits are sizable, though, so we should not wait
> too long :) The number of poor people who post to the users list with
> errant Access Denied, FileNotFound, lock obtain timed out, etc.,
> exceptions is quite large.  There was just one today that I'm going to
> go try to respond to next.  Plus the prospect of working just fine on
> remote filesystems is great!
> OK I will keep working through this & running stress tests on it to
> see if I can uncover any issues...
> Mike
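
For reference, the generation scheme described in the quoted message could be
sketched roughly as follows. This is an illustration in Python under stated
assumptions, not the actual patch: the `segments` dict stands in for the
segments file, generation 0 is assumed to mean "no .del file written yet",
the `<segment>.del.<N>` naming is inferred from the ".del.N+1" wording, and
all function names are hypothetical.

```python
import os
import tempfile


def del_file_name(segment, generation):
    """Assumed naming: <segment>.del.<N>, where N is the generation."""
    return f"{segment}.del.{generation}"


def commit_deletes(directory, segments, segment, payload):
    """Write the next generation's .del file, then record that generation."""
    next_gen = segments.get(segment, 0) + 1
    path = os.path.join(directory, del_file_name(segment, next_gen))
    with open(path, "wb") as f:
        f.write(payload)
    # Only after the file exists does the (stand-in) segments table point at it.
    segments[segment] = next_gen
    return next_gen


def load_deletes(directory, segments, segment):
    """Read exactly the .del generation recorded for this segment."""
    gen = segments.get(segment, 0)
    if gen == 0:
        return b""  # assumption: generation 0 means no deletes were ever written
    path = os.path.join(directory, del_file_name(segment, gen))
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as index_dir:
    segments = {}
    commit_deletes(index_dir, segments, "_1", b"\x05")  # some opaque deletion bits
    commit_deletes(index_dir, segments, "_1", b"")      # "undelete all": zero-length file
    print(segments["_1"])                                # 2
    print(load_deletes(index_dir, segments, "_1"))      # b''
```

Because the reader is told which generation to expect, a missing file is an
error rather than an ambiguous "no deletes", and a zero-length file is an
explicit "nothing deleted". Note that this sketch still assumes a single
committer per segment; it does not address two stations writing the same
generation concurrently.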
