lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: lock path thoughts
Date Wed, 01 Nov 2006 01:40:50 GMT
Marvin Humphrey wrote:
> On Oct 31, 2006, at 11:47 AM, Doug Cutting wrote:
>> I think the need for that would disappear if the lockless commit patch 
>> gets committed.  Then there'd be no reason not to put lock files 
>> directly in the index directory, since only writers would need to lock 
>> things.
> Unless the index is on an NFS volume.  Then a Reader and a Writer can 
> come into conflict because delete-on-last-close isn't supported.  Some 
> sort of read lock would be handy.

Right, Lucene's "point in time" searching feature currently relies on
filesystem semantics, and NFS doesn't give us "delete on last close".
This means searchers over NFS need to expect a "stale NFS handle"
IOException while searching and then re-open.  This is true with or
without the lock-less commits patch.
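As a rough sketch, the retry-on-stale-handle pattern could look something
like this (the Searcher and Opener interfaces below are stand-ins for
illustration, not real Lucene APIs):

```java
// Hedged sketch of the "catch and re-open" pattern described above.
// On NFS the files backing a reader can be deleted out from under it,
// so a search may fail with a stale-handle IOException; the caller
// should re-open against the current commit and retry once.
public class StaleRetry {
    interface Searcher {
        int search(String query) throws java.io.IOException;
    }
    interface Opener {
        Searcher open();
    }

    static int searchWithRetry(Opener opener, String query)
            throws java.io.IOException {
        Searcher searcher = opener.open();
        try {
            return searcher.search(query);
        } catch (java.io.IOException stale) {
            // The commit we were reading is gone; re-open and retry once.
            searcher = opener.open();
            return searcher.search(query);
        }
    }
}
```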

> One possibility is to extend our file-based locking system to read locks 
> by appending an integer increment to the lock-file name, so that we 
> could tell how many readers were live by how many read-lock files were 
> present.
> Maybe we could have such files and compare modification dates against 
> the incrementing segments.N files to identify which version of the index 
> a Reader was opened against?  Then, when it was time to delete files, 
> the writer could discern which files were no longer needed and zap 'em.
> One problem is that if a reader crashes, you don't get a fatal error -- 
> the only effect is that the Writer just stops deleting files.  Might be 
> other problems, too, but I thought I'd throw the idea out there.

I think this is one of the important things that lock-less commits
makes possible: implementing "point in time" searching explicitly
instead of relying on [rather variable] filesystem semantics.  The less
we have to assume about the filesystem, the more portable Lucene is!

If we do this we could support different policies, e.g. "keep the
past M commits", "keep any commits newer than N days", or "keep any
commits still in use by readers".  That last policy is indeed tricky:
how do the readers communicate to the writer that they are still
using "generation N"?
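For example, a "keep the past M commits" policy could be sketched like
this (hypothetical code, not an existing Lucene API; the generations are
the N in the segments.N files, in ascending order):

```java
import java.util.*;

// Hedged sketch of a "keep the past M commits" deletion policy: given
// every generation currently present, return the ones a writer could
// safely delete.
public class KeepLastM {
    // generations must be ascending; the newest keepLast survive.
    static List<Long> deletable(List<Long> generations, int keepLast) {
        int cut = Math.max(0, generations.size() - keepLast);
        return new ArrayList<>(generations.subList(0, cut));
    }
}
```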

I like your approach above.  If each reader writes its own unique
file, and that file records (either by name or by contents) which
segments.N that reader is using, then writers could look for these
files and know what not to delete.  I think these can just be normal
files (ie not lock files)?  But the problem of a crashed reader is
important to fix.  Though if a given reader X always re-used the same
file name when it restarted, that would greatly reduce the impact of
a crash.
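A sketch of how the writer side might work, with the directory modeled
as a map from per-reader file name to the generation recorded in it
(real code would list and read actual files; none of these names are
real Lucene APIs):

```java
import java.util.*;

// Hedged sketch: each reader records, in its own per-reader file,
// which segments.N generation it has open; the writer treats any
// generation named by such a file as still in use and skips it when
// deleting.
public class InUseGenerations {
    static Set<Long> inUse(Map<String, Long> readerFiles) {
        return new HashSet<>(readerFiles.values());
    }

    // A generation may be deleted only if no reader file names it and
    // it is not the current commit.
    static List<Long> deletable(List<Long> generations,
                                Map<String, Long> readerFiles,
                                long current) {
        Set<Long> live = inUse(readerFiles);
        List<Long> out = new ArrayList<>();
        for (long g : generations)
            if (g != current && !live.contains(g)) out.add(g);
        return out;
    }
}
```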

An important (I think?) improvement of such an explicit approach is
that readers could be re-opened against previous "point in time"
snapshots, whereas now, when you open a reader, you always get the
most recent commit.
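A purely hypothetical sketch of what such an explicit snapshot API
might look like, with commits kept in a registry keyed by generation
(this is not how IndexReader.open() works today):

```java
import java.util.*;

// Hedged sketch: if commits are tracked explicitly, a reader can be
// opened against an older generation, not only the latest one.  Each
// commit is just the list of segment files it references.
public class CommitRegistry {
    private final TreeMap<Long, List<String>> commits = new TreeMap<>();

    void record(long generation, List<String> segmentFiles) {
        commits.put(generation, segmentFiles);
    }

    // Today's behavior: always the most recent commit.
    List<String> openLatest() {
        return commits.lastEntry().getValue();
    }

    // The new possibility: open an earlier snapshot explicitly.
    List<String> openGeneration(long generation) {
        List<String> segs = commits.get(generation);
        if (segs == null)
            throw new NoSuchElementException("generation " + generation + " was deleted");
        return segs;
    }
}
```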

Also note that this approach would leave more segments files in your
index.  However, no additional disk space is actually consumed: even
now, the disk space is still held until the readers close and the
files are really deleted.

