incubator-lucy-dev mailing list archives

From: Marvin Humphrey
Subject: Re: Snapshot
Date: Tue, 24 Mar 2009 22:25:56 GMT

On Tue, Mar 24, 2009 at 07:57:48AM -0400, Michael McCandless wrote:

> Will it do the same write-once lockless approach (snapshot_N) that Lucene does?

More or less, I think.  Definitely I advocate embedding base-36 generation
numbers in the filenames.
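
As a rough illustration of the naming scheme, here is a minimal Python sketch
that converts between a generation number and a filename.  The
"snapshot_<gen>.json" pattern and the function names are assumptions made for
the example, not the actual Lucy or KinoSearch API.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

# Hypothetical filename pattern: "snapshot_" + base-36 generation + ".json".
SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("generation must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def snapshot_filename(generation: int) -> str:
    """Build the snapshot filename for a given generation."""
    return f"snapshot_{to_base36(generation)}.json"


def snapshot_generation(filename: str):
    """Return the generation encoded in a filename, or None if it is not a snapshot."""
    m = SNAPSHOT_RE.match(filename)
    return int(m.group(1), 36) if m else None
```

Because every commit gets a new, higher generation, a snapshot file never has
to be overwritten once it has been written.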

The crucial innovation of lockless commits was the retry logic, which depends
on the snapshot generation numbers.  That retry logic is in the KS prototype.
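
To make the idea concrete, here is a minimal Python sketch of generation-based
retry, not the KS code itself.  It assumes snapshot files are named
"snapshot_<base36>.json" and contain a JSON array of filenames; the function
names and the retry limit are likewise hypothetical.

```python
import json
import os
import re

# Hypothetical filename pattern: "snapshot_" + base-36 generation + ".json".
SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def latest_snapshot(index_dir):
    """Return the snapshot filename with the highest generation, or None."""
    best_gen, best_name = -1, None
    for name in os.listdir(index_dir):
        m = SNAPSHOT_RE.match(name)
        if m:
            gen = int(m.group(1), 36)
            if gen > best_gen:
                best_gen, best_name = gen, name
    return best_name


def read_latest_snapshot(index_dir, max_retries=10):
    """Open the newest snapshot without locking, retrying if it vanishes."""
    for _ in range(max_retries):
        name = latest_snapshot(index_dir)
        if name is None:
            return None, []  # empty index: nothing committed yet
        try:
            with open(os.path.join(index_dir, name), encoding="utf-8") as fh:
                return name, json.load(fh)
        except FileNotFoundError:
            # A writer committed a newer generation and purged this one
            # between the directory listing and the open.  List again.
            continue
    raise RuntimeError("could not open a snapshot after %d tries" % max_retries)
```

The reader never blocks a writer: if its chosen file disappears, a newer
generation must exist, so listing the directory again makes progress.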

However, I have found it difficult to stop the caught exception from leaking
memory in the event of a retry.  Hopefully we can fix that, but it's tricky.

> It still seems like storing per-segment metadata in the snapshot would
> be necessary/helpful.

As you surmised over in the "Segment" thread, that's in segmeta.json.  

> > Snapshot_Delete_Entry() does not delete the file from the index folder; all it
> > does is remove the filename from the next snapshot to be written.  Once the
> > new snapshot has been committed, it is possible to identify candidates for
> > deletion by determining which files are present in the old snapshot file but
> > gone from the new one.
>
> Are you just doing reference counting to determine deletable files?

Yes and no. The logic currently resides in a class called "FilePurger":

  * Don't delete any file listed in the most recent snapshot.
  * Don't delete any file listed in any snapshot file that's read-locked.

By default, Readers don't do any locking, so only the first part matters.  
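
The two rules can be sketched in Python roughly as follows.  This is only an
illustration, not FilePurger's actual code: the function name and arguments
are hypothetical, and treating a protected snapshot file itself as
undeletable is an assumption.

```python
def purge_candidates(all_files, snapshots, latest, locked=frozenset()):
    """Return the files that neither rule protects, sorted by name.

    all_files -- iterable of every filename in the index folder
    snapshots -- dict mapping snapshot filename -> set of files it lists
    latest    -- filename of the most recent snapshot
    locked    -- snapshot filenames that currently hold a read lock
    """
    protected = set()
    for name, entries in snapshots.items():
        if name == latest or name in locked:
            protected.add(name)        # assumption: keep the snapshot file too
            protected.update(entries)  # keep everything it lists
    return sorted(set(all_files) - protected)
```

With no read locks, only the most recent snapshot protects anything, which
matches the default behavior.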

If you turn on read-locking, the "is-this-snapshot-file-locked" test uses
reference counts in the form of numbered dot-lock files -- though you can
override the locking mechanism if you choose.
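
One way to picture numbered dot-lock files acting as a reference count is the
Python sketch below.  The "<name>-<n>.lock" naming, the function names, and
the slot limit are all assumptions made for illustration; the real mechanism
may differ.

```python
import os


def acquire_shared_lock(lock_dir, name, max_locks=1000):
    """Claim the first free numbered lock file and return its path."""
    for n in range(1, max_locks + 1):
        path = os.path.join(lock_dir, f"{name}-{n}.lock")
        try:
            # O_EXCL makes creation fail if another reader holds this slot.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        os.close(fd)
        return path
    raise RuntimeError("no free lock slot for %s" % name)


def release_shared_lock(path):
    """Drop one reference by removing the lock file."""
    os.remove(path)


def is_locked(lock_dir, name):
    """True if at least one numbered lock file exists for this name."""
    prefix, suffix = f"{name}-", ".lock"
    for entry in os.listdir(lock_dir):
        if entry.startswith(prefix) and entry.endswith(suffix):
            if entry[len(prefix):-len(suffix)].isdigit():
                return True
    return False
```

Each reader holds its own lock file, so the number of lock files present is
the reference count, and the snapshot is unlocked once the last one is gone.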

However, the Snapshot class itself is agnostic about that.  It's just a list
of files.

In a little while, I'll propose an "IndexManager" class from which all
merging and deletion policies flow.

> Will Lucy allow more than one snapshot to remain in the index?


(Perhaps that would have been clear in my original post had I remembered to
endorse the base-36 generation naming scheme.)

The Snapshot class is supposed to be very simple and flexible.  Logically
speaking, it's easy to leave more than one snapshot file around and to avoid
deleting any file that's listed in an active snapshot.

Marvin Humphrey
