Date: Tue, 24 Mar 2009 15:25:56 -0700
From: Marvin Humphrey
To: lucy-dev@lucene.apache.org
Subject: Re: Snapshot
Message-ID: <20090324222556.GA21092@rectangular.com>
In-Reply-To: <9ac0c6aa0903240457p4b6078a6if5b058a480f84c55@mail.gmail.com>

On Tue, Mar 24, 2009 at 07:57:48AM -0400, Michael
McCandless wrote:

> Will it do the same write-once lockless approach (snapshot_N) that Lucene does?

More or less, I think. Definitely I advocate embedding base-36 generation
numbers in the filenames.

The crucial innovation of lockless commits was the retry logic in
IndexReader.open(), which depends on the snapshot generation numbers. That
retry logic is in the KS prototype. However, I have found it difficult to stop
the caught exception from leaking memory in the event of a retry. Hopefully we
can fix that, but it's tricky.

> It still seems like storing per-segment metadata in the snapshot would
> be necessary/helpful.

As you surmised over in the "Segment" thread, that's in segmeta.json.

> > Snapshot_Delete_Entry() does not delete the file from the index folder; all it
> > does is remove the filename from the next snapshot to be written. Once the
> > new snapshot has been committed, it is possible to identify candidates for
> > deletion by determining which files are present in the old snapshot file but
> > gone from the new one.
>
> Are you just doing reference counting to determine deletable files?

Yes and no. The logic currently resides in a class called "FilePurger":

  * Don't delete any file listed in the most recent snapshot.
  * Don't delete any file listed in any snapshot file that's read-locked.

By default, Readers don't do any locking, so only the first part matters. If
you turn on read-locking, the "is-this-snapshot-file-locked" test uses
reference counts in the form of numbered dot-lock files -- though you can
override the locking mechanism if you choose.

However, the Snapshot class itself is agnostic about that. It's just a list of
files. In a little while, I'll propose an "IndexManager" class from which all
merging and deletion policies flow.

> Will Lucy allow more than one snapshot to remain in the index?

Sure. (Perhaps that would have been clear in my original post had I remembered
to endorse the base-36 generation naming scheme.)
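
To make the base-36 naming and the two FilePurger rules above concrete, here is
a minimal Python sketch. It is an illustration only, not the KS or Lucy code:
the "snapshot_<gen>.json" filename pattern, the function names, and the choice
to protect the snapshot file itself along with its entries are all assumptions
made for the example.

```python
import re
import string

_DIGITS = string.digits + string.ascii_lowercase  # "0-9a-z"

# Assumed (hypothetical) naming pattern: snapshot_<base-36 generation>.json
_SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def to_base36(n):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("generation must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(_DIGITS[rem])
    return "".join(reversed(out))


def snapshot_filename(generation):
    """Build the snapshot filename for a generation number."""
    return "snapshot_%s.json" % to_base36(generation)


def generation_of(filename):
    """Return the generation encoded in a snapshot filename, or None."""
    m = _SNAPSHOT_RE.match(filename)
    return int(m.group(1), 36) if m else None


def latest_snapshot(filenames):
    """Pick the snapshot filename with the highest generation, or None."""
    snaps = [f for f in filenames if generation_of(f) is not None]
    return max(snaps, key=generation_of) if snaps else None


def purge_candidates(folder_files, snapshots, locked=()):
    """List files that neither rule protects.

    folder_files: every filename present in the index folder.
    snapshots:    dict mapping snapshot filename -> set of files it lists.
    locked:       snapshot filenames that are currently read-locked.
    """
    latest = latest_snapshot(snapshots)
    protected = set()
    for name, entries in snapshots.items():
        # Rule 1: the most recent snapshot.  Rule 2: any read-locked snapshot.
        if name == latest or name in locked:
            protected.add(name)        # assumption: keep the snapshot file too
            protected.update(entries)
    return sorted(set(folder_files) - protected)


folder = ["snapshot_z.json", "snapshot_10.json",
          "seg_1.dat", "seg_2.dat", "seg_3.dat"]
snapshots = {
    "snapshot_z.json": {"seg_1.dat", "seg_2.dat"},   # generation 35
    "snapshot_10.json": {"seg_2.dat", "seg_3.dat"},  # generation 36
}
print(purge_candidates(folder, snapshots))
# -> ['seg_1.dat', 'snapshot_z.json']
print(purge_candidates(folder, snapshots, locked=["snapshot_z.json"]))
# -> []
```

Note that the sketch decodes the generation before comparing: a plain string
sort would rank "snapshot_z.json" above "snapshot_10.json" even though
generation 36 is the newer one.
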
The Snapshot class is supposed to be very simple and flexible. Logically
speaking, it's easy to leave more than one snapshot file around and to avoid
deleting any file that's listed in an active snapshot.

Marvin Humphrey