From: "Yonik Seeley (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Wed, 1 Nov 2006 14:57:17 -0800 (PST)
Subject: [jira] Commented: (LUCENE-701) Lock-less commits
Message-ID: <7297113.1162421837841.JavaMail.root@brutus>
In-Reply-To: <9436339.1161992492586.JavaMail.root@brutus>

    [ http://issues.apache.org/jira/browse/LUCENE-701?page=comments#action_12446397 ]

Yonik Seeley commented on LUCENE-701:
-------------------------------------

Thanks for all the details, Michael!
A few more random comments and questions:

In the future, it might be nice to have an option to disable segments.gen, to be friendlier to write-once filesystems like HDFS.

As far as performance goes, I was personally interested in the contention-free case, since that's what processes that coordinate everything (like Solr) will see.

I'm not sure I understand the "segments.gen" logic of writing two identical longs. Looking at the code, it doesn't seem like you are implementing this:
http://www.nabble.com/Re%3A-Lock-less-commits-p5978090.html
Are there two longs instead of one in order to leave "space" for that implementation if needed, without having to change the file format?

The file-deleting code does much more than in the past, and that's a good thing IMO. For example, it looks like leftover non-compound segment files from a failed CFS merge (say the JVM dies) will now be cleaned up!

I'm having a hard time figuring out how older delete files are removed (since they contain the current segment name, it looks like findDeletableFiles would skip them).

> Lock-less commits
> -----------------
>
>                 Key: LUCENE-701
>                 URL: http://issues.apache.org/jira/browse/LUCENE-701
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>            Priority: Minor
>         Attachments: index.prelockless.cfs.zip, index.prelockless.nocfs.zip, lockless-commits-patch.txt
>
>
> This is a patch based on discussion a while back on lucene-dev:
>   http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200608.mbox/%3c44E5B16D.4010805@mikemccandless.com%3e
> The approach is a small modification over the original discussion (see
> Retry Logic below). It works correctly in all my cross-machine test
> cases, but I want to open it up for feedback, testing by
> users/developers in more diverse environments, etc.
> This is a small change to how Lucene stores its index that enables
> elimination of the commit lock entirely. The write lock still
> remains.
> Of the two, the commit lock has been more troublesome for users since
> it typically serves an active role in production, whereas the write
> lock is usually more of a design check to make sure you only have one
> writer against the index at a time.
> The basic idea is that filenames are never reused ("write once"),
> meaning a writer never writes to a file that a reader may be reading
> (there is one exception: the segments.gen file; see "RETRY LOGIC"
> below). Instead it writes to generational files, i.e., segments_1, then
> segments_2, etc. Besides the segments file, the .del files and norm
> files (.sX suffix) are also now generational. A generation is stored
> as an "_N" suffix before the file extension (e.g., _p_4.s0 is the
> separate norms file for segment "p", generation 4).
> One important benefit of this is that it avoids file-contents caching
> entirely (the likely cause of errors when readers open an index
> mounted on NFS), since the file is always a new file.
> With this patch I can reliably instantiate readers over NFS when a
> writer is writing to the index. However, with NFS, you are still forced to
> refresh your reader once a writer has committed because "point in
> time" searching doesn't work over NFS (see LUCENE-673).
> The changes are fully backwards compatible: you can open an old index
> for searching, or to add/delete docs, etc. I've added a new unit test
> to test these cases.
> All unit tests pass, and I've added a number of additional unit tests,
> some of which fail on WIN32 in the current Lucene but pass with this
> patch. The "fileformats.xml" has been updated to describe the changes
> to the files (but XXX references need to be fixed before committing).
> There are some other important benefits:
>   * Readers are now entirely read-only.
>   * Readers no longer block one another (false contention) on
>     initialization.
>   * On hitting contention, we retry immediately instead of pausing
>     for a fixed interval (currently 1.0 second by default).
>   * No file renaming is ever done. File renaming has caused sneaky
>     access denied errors on WIN32 (see LUCENE-665). (Yonik, I used
>     your approach here to not rename the segments_N file (try
>     segments_(N-1) on hitting IOException on segments_N): the separate
>     ".done" file did not work reliably under very high stress testing
>     when a directory listing was not "point in time").
>   * On WIN32, you can now call IndexReader.setNorm() even if other
>     readers have the index open (fixes a pre-existing minor bug in
>     Lucene).
>   * On WIN32, you can now create an IndexWriter with create=true even
>     if readers have the index open (e.g., see
>     www.gossamer-threads.com/lists/lucene/java-user/39265).
> Here's an overview of the changes:
>   * Every commit writes to the next segments_(N+1).
>   * Loading the segments_N file (& opening the segments) now requires
>     retry logic. I've captured this logic into a new static class:
>     SegmentInfos.FindSegmentsFile. All places that need to do
>     something on the current segments file now use this class.
>   * No more deletable file. Instead, the writer computes what's
>     deletable on instantiation and updates this in memory whenever
>     files can be deleted (i.e., when it commits). Created a common
>     class index.IndexFileDeleter, shared by reader & writer, to manage
>     deletes.
>   * Storing more information in the segments info file: whether it has
>     separate deletes (and which generation), whether it has separate
>     norms, per field (and which generation), whether it's compound or
>     not. This is instead of relying on IO operations (file exists
>     calls). Note that this fixes the current misleading
>     FileNotFoundException users now see when an _X.cfs file is missing
>     (e.g., http://www.nabble.com/FileNotFound-Exception-t6987.html).
>   * Fixed some small things about RAMDirectory that were not
>     filesystem-like (e.g., opening a non-existent IndexInput failed to
>     raise IOException; renames were not atomic). I added a stress
>     test against a RAMDirectory (1 writer thread & 2 reader threads)
>     that uncovered these.
>   * Added option to not remove old files when create=true on creating
>     FSDirectory; this is so the writer can do its own [more
>     sophisticated because it retries on errors] removal.
>   * Removed all references to commit lock, COMMIT_LOCK_TIMEOUT, etc.
>     (This is an API change).
>   * Extended index/IndexFileNames.java and index/IndexFileNameFilter.java
>     with logic for computing generational file names.
>   * Changed index/IndexFileNameFilter.java to use a HashSet to check
>     file extensions for better performance.
>   * Fixed the test case TestIndexReader.testLastModified: it was
>     incorrectly (I think?) comparing lastModified to the version of the
>     index. I fixed that and then added a new test case for version.
> Retry Logic (in index/SegmentInfos.java)
> If a reader tries to load the segments just as a writer is committing,
> it may hit an IOException. This is just normal contention. In
> current Lucene, contention causes a [default] 1.0 second pause, then a
> retry. With lock-less commits, contention causes no added delay beyond
> the time to retry.
> When this happens, we first try segments_(N-1) if present, because it
> could be that segments_N is still being written. If that fails, we
> re-check to see if there is now a newer segments_M where M > N and
> advance if so. Else we retry segments_N once more (since it could be
> that it was in process previously but must now be complete, since
> segments_(N-1) did not load).
> In order to find the current segments_N file, I list the directory and
> take the biggest segments_N that exists.
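
A minimal sketch of the directory-listing step quoted above, for illustration only. It assumes the listing arrives as plain file names and that the generation suffix parses as a number in a caller-supplied radix (the message does not say how N is encoded). `LatestSegments` and `latestGeneration` are hypothetical names, not the patch's API:

```java
import java.util.Arrays;
import java.util.List;

final class LatestSegments {
    static final String PREFIX = "segments_";

    /**
     * Returns the largest generation N among names of the form segments_N,
     * or -1 if the listing holds no such file.
     * The radix is an assumption of this sketch, not something the message states.
     */
    static long latestGeneration(List<String> fileNames, int radix) {
        long max = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // skips segments.gen, norms files, etc.
            }
            try {
                long gen = Long.parseLong(name.substring(PREFIX.length()), radix);
                if (gen > max) {
                    max = gen;
                }
            } catch (NumberFormatException e) {
                // Suffix is not a number in this radix: not a generational segments file.
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "segments_2", "segments_10", "segments.gen", "_p_4.s0", "segments_9");
        System.out.println(latestGeneration(listing, 10)); // prints 10
    }
}
```

Comparing parsed numbers rather than the names themselves matters here: a plain string comparison would rank segments_9 above segments_10.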
> However, under extreme stress testing (5 threads just opening &
> closing readers over and over), on one platform (OS X) I found that
> the directory listing can be incorrect (stale) by up to 1.0 seconds.
> This means the listing will show a segments_N file but that file does
> not exist (fileExists() returns false).
> In order to handle this (and other such platforms), I switched to a
> hybrid approach (originally proposed by Doron Cohen in the original
> thread): on committing, the writer writes to a file "segments.gen" the
> generation it just committed. It writes 2 identical longs into this
> file. The retry logic, on detecting that the directory listing is
> stale, falls back to the contents of this file. If that file is
> consistent (the two longs are identical) and the generation is
> indeed newer than the dir listing, it will use that.
> Finally, if this approach is also stale, we fall back to stepping
> through sequential generations (up to a maximum # tries). If all 3
> methods fail, we throw the original exception we hit.
> I added a static method SegmentInfos.setInfoStream() which will print
> details of retry attempts. In the patch it's set to System.out right
> now (we should turn it off before a real commit) so if there are
> problems we can see what the retry logic did.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
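
The segments.gen fallback described in the quoted Retry Logic section can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: it assumes the file holds nothing but two big-endian longs (the real layout may differ), and `SegmentsGenSketch`, `encode`, `decode`, and `chooseGeneration` are hypothetical names:

```java
import java.nio.ByteBuffer;

final class SegmentsGenSketch {
    /** Writer side: store the committed generation twice (assumed layout: two longs only). */
    static byte[] encode(long generation) {
        return ByteBuffer.allocate(16).putLong(generation).putLong(generation).array();
    }

    /** Reader side: the generation if both longs agree, else -1 (short or torn write). */
    static long decode(byte[] contents) {
        if (contents.length < 16) {
            return -1; // file not fully written yet
        }
        ByteBuffer buf = ByteBuffer.wrap(contents);
        long first = buf.getLong();
        long second = buf.getLong();
        return first == second ? first : -1;
    }

    /** Use the file's generation only if it is consistent and newer than the listing's. */
    static long chooseGeneration(long fromListing, byte[] genFileContents) {
        long fromFile = decode(genFileContents);
        return fromFile > fromListing ? fromFile : fromListing;
    }

    public static void main(String[] args) {
        // Listing is stale at generation 5, but segments.gen says 7.
        System.out.println(chooseGeneration(5, encode(7))); // prints 7

        // Simulate a torn write: the second long no longer matches the first.
        byte[] torn = encode(7);
        torn[15] = 8;
        System.out.println(chooseGeneration(5, torn)); // prints 5
    }
}
```

In this sketch an inconsistent file is simply ignored, so the caller would continue with the listing's generation and then step through sequential generations as the quoted text describes.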