- The optional autoCommit argument to the
- constructors controls visibility of the changes to {@link
- IndexReader} instances reading the same index.  When this is
- false, changes are not visible until {@link #close()} is
- called.  Note that changes will still be flushed to the
- {@link org.apache.lucene.store.Directory} as new files,
- but are not committed (no new segments_N file
- is written referencing the new files) until {@link #close} is
- called.  If something goes terribly wrong (for example the
- JVM crashes) before {@link #close()}, then
- the index will reflect none of the changes made (it will
- remain in its starting state).
- You can also call {@link #abort()}, which closes the writer without committing any
- changes, and removes any index
+ [Deprecated: Note that in 3.0, IndexWriter will
+ no longer accept autoCommit=true (it will be hardwired to
+ false).  You can always call {@link IndexWriter#commit()} yourself
+ when needed].  The optional autoCommit argument to the constructors
+ controls visibility of the changes to {@link IndexReader}
+ instances reading the same index.  When this is
+ false, changes are not visible until {@link
+ #close()} is called.  Note that changes will still be
+ flushed to the {@link org.apache.lucene.store.Directory}
+ as new files, but are not committed (no new
+ segments_N file is written referencing the
+ new files, nor are the files sync'd to stable storage)
+ until {@link #commit} or {@link #close} is called.  If something
+ goes terribly wrong (for example the JVM crashes), then
+ the index will reflect none of the changes made since the
+ last commit, or the starting state if commit was not called.
+ You can also call {@link #abort}, which closes the writer
+ without committing any changes, and removes any index
  files that had been flushed but are now unreferenced.
  This mode is useful for preventing readers from refreshing
  at a bad time (for example after you've done all your
- deletes but before you've done your adds).
- It can also be used to implement simple single-writer
- transactional semantics ("all or none").
  When autoCommit is true then
- every flush is also a commit ({@link IndexReader}
- instances will see each flush as changes to the index).
- This is the default, to match the behavior before 2.2.
- When running in this mode, be careful not to refresh your
+ the writer will periodically commit on its own.  This is
+ the default, to match the behavior before 2.2.  However,
+ in 3.0, autoCommit will be hardwired to false.  There is
+ no guarantee when exactly an auto commit will occur (it
+ used to be after every flush, but it is now after every
+ completed merge, as of 2.4).  If you want to force a
+ commit, call {@link #commit}, or close the writer.  Once
+ a commit has finished, {@link IndexReader} instances will
+ see the changes to the index as of that commit.  When
+ running in this mode, be careful not to refresh your
  readers while optimize or segment merges are taking place,
  as this can tie up substantial disk space.
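The "all or none" visibility described above can be illustrated outside Lucene with a small stdlib-only sketch (the `StagedWriter` class and its method names are hypothetical, not Lucene API): buffered changes are flushed to a staging file on disk, but readers only ever see the last committed state, which is published atomically, much as a new segments_N is only written at commit time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of commit-style visibility: flush() puts
// data on disk without publishing it; commit() atomically publishes.
class StagedWriter {
  private final Path committed;  // what readers open (analogous to segments_N)
  private final Path staging;    // flushed-but-uncommitted state
  private final StringBuilder buf = new StringBuilder();

  StagedWriter(Path committed) {
    this.committed = committed;
    this.staging = Paths.get(committed.toString() + ".staging");
  }

  void add(String line) {
    buf.append(line).append('\n');
  }

  // Like flush(): data reaches disk as new files, but readers
  // opening "committed" still see nothing new.
  void flush() throws IOException {
    Files.write(staging, buf.toString().getBytes(StandardCharsets.UTF_8));
  }

  // Like commit(): atomically publish the staged state to readers.
  void commit() throws IOException {
    flush();
    Files.move(staging, committed,
               StandardCopyOption.REPLACE_EXISTING,
               StandardCopyOption.ATOMIC_MOVE);
  }
}
```

If the process dies between flush() and commit() here, readers still see the prior committed state, which is the crash behavior the paragraph above describes.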
 * path.  Text will be analyzed with a.  If create
 * is true, then a new, empty index will be created in
- * path, replacing the index already there, if any.
+ * path, replacing the index already there,
+ * if any.  Note that autoCommit defaults to true, but
+ * starting in 3.0 it will be hardwired to false.
*
* @param path the path to the index directory
* @param a the analyzer to use
@@ -487,6 +526,8 @@
 * Text will be analyzed with a.  If create
 * is true, then a new, empty index will be created in
 * path, replacing the index already there, if any.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
*
* @param path the path to the index directory
* @param a the analyzer to use
@@ -541,6 +582,8 @@
 * Text will be analyzed with a.  If create
 * is true, then a new, empty index will be created in
 * d, replacing the index already there, if any.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
*
* @param d the index directory
* @param a the analyzer to use
@@ -595,6 +638,8 @@
 * path, first creating it if it does not
 * already exist.  Text will be analyzed with
 * a.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
*
* @param path the path to the index directory
* @param a the analyzer to use
@@ -641,6 +686,8 @@
 * path, first creating it if it does not
 * already exist.  Text will be analyzed with
 * a.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
*
* @param path the path to the index directory
* @param a the analyzer to use
@@ -687,6 +734,8 @@
 * d, first creating it if it does not
 * already exist.  Text will be analyzed with
 * a.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
*
* @param d the index directory
* @param a the analyzer to use
@@ -746,6 +795,10 @@
* @throws IOException if the directory cannot be
* read/written to or if there is any other low-level
* IO error
+ * @deprecated This will be removed in 3.0, when
+ * autoCommit will be hardwired to false. Use {@link
+ * #IndexWriter(Directory,Analyzer,MaxFieldLength)}
+ * instead, and call {@link #commit} when needed.
*/
public IndexWriter(Directory d, boolean autoCommit, Analyzer a, MaxFieldLength mfl)
throws CorruptIndexException, LockObtainFailedException, IOException {
@@ -798,6 +851,10 @@
 * if it does not exist and create is
 * false or if there is any other low-level
 * IO error
+ * @deprecated This will be removed in 3.0, when
+ * autoCommit will be hardwired to false. Use {@link
+ * #IndexWriter(Directory,Analyzer,boolean,MaxFieldLength)}
+ * instead, and call {@link #commit} when needed.
*/
public IndexWriter(Directory d, boolean autoCommit, Analyzer a, boolean create, MaxFieldLength mfl)
throws CorruptIndexException, LockObtainFailedException, IOException {
@@ -837,6 +894,31 @@
 * IndexDeletionPolicy}, for the index in d,
 * first creating it if it does not already exist.  Text
 * will be analyzed with a.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
+ *
+ * @param d the index directory
+ * @param a the analyzer to use
+ * @param deletionPolicy see above
+ * @param mfl whether or not to limit field lengths
+ * @throws CorruptIndexException if the index is corrupt
+ * @throws LockObtainFailedException if another writer
+ * has this index open (write.lock could not
+ * be obtained)
+ * @throws IOException if the directory cannot be
+ * read/written to or if there is any other low-level
+ * IO error
+ */
+ public IndexWriter(Directory d, Analyzer a, IndexDeletionPolicy deletionPolicy, MaxFieldLength mfl)
+ throws CorruptIndexException, LockObtainFailedException, IOException {
+ init(d, a, false, deletionPolicy, true, mfl.getLimit());
+ }
+
+ /**
+ * Expert: constructs an IndexWriter with a custom {@link
+ * IndexDeletionPolicy}, for the index in d,
+ * first creating it if it does not already exist.  Text
+ * will be analyzed with a.
*
* @param d the index directory
* @param autoCommit see above
@@ -851,6 +933,10 @@
* @throws IOException if the directory cannot be
* read/written to or if there is any other low-level
* IO error
+ * @deprecated This will be removed in 3.0, when
+ * autoCommit will be hardwired to false. Use {@link
+ * #IndexWriter(Directory,Analyzer,IndexDeletionPolicy,MaxFieldLength)}
+ * instead, and call {@link #commit} when needed.
*/
public IndexWriter(Directory d, boolean autoCommit, Analyzer a, IndexDeletionPolicy deletionPolicy, MaxFieldLength mfl)
throws CorruptIndexException, LockObtainFailedException, IOException {
@@ -889,6 +975,37 @@
 * create is true, then a new, empty index
 * will be created in d, replacing the index
 * already there, if any.
+ * Note that autoCommit defaults to true, but starting in 3.0
+ * it will be hardwired to false.
+ *
+ * @param d the index directory
+ * @param a the analyzer to use
+ * @param create true to create the index or overwrite
+ * the existing one; false to append to the existing
+ * index
+ * @param deletionPolicy see above
+ * @param mfl whether or not to limit field lengths
+ * @throws CorruptIndexException if the index is corrupt
+ * @throws LockObtainFailedException if another writer
+ * has this index open (write.lock could not
+ * be obtained)
+ * @throws IOException if the directory cannot be read/written to, or
+ * if it does not exist and create is
+ * false or if there is any other low-level
+ * IO error
+ */
+ public IndexWriter(Directory d, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, MaxFieldLength mfl)
+ throws CorruptIndexException, LockObtainFailedException, IOException {
+ init(d, a, create, false, deletionPolicy, true, mfl.getLimit());
+ }
+
+ /**
+ * Expert: constructs an IndexWriter with a custom {@link
+ * IndexDeletionPolicy}, for the index in d.
+ * Text will be analyzed with a.  If
+ * create is true, then a new, empty index
+ * will be created in d, replacing the index
+ * already there, if any.
*
* @param d the index directory
* @param autoCommit see above
@@ -907,6 +1024,10 @@
 * if it does not exist and create is
 * false or if there is any other low-level
 * IO error
+ * @deprecated This will be removed in 3.0, when
+ * autoCommit will be hardwired to false. Use {@link
+ * #IndexWriter(Directory,Analyzer,boolean,IndexDeletionPolicy,MaxFieldLength)}
+ * instead, and call {@link #commit} when needed.
*/
public IndexWriter(Directory d, boolean autoCommit, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, MaxFieldLength mfl)
throws CorruptIndexException, LockObtainFailedException, IOException {
@@ -984,15 +1105,22 @@
} catch (IOException e) {
// Likely this means it's a fresh directory
}
- segmentInfos.write(directory);
+ segmentInfos.commit(directory);
} else {
segmentInfos.read(directory);
+
+ // We assume that this segments_N was previously
+ // properly sync'd:
+        for(int i=0;i<segmentInfos.size();i++) {
+          SegmentInfo info = segmentInfos.info(i);
+          List files = info.files();
+          for(int j=0;j<files.size();j++)
+            synced.add(files.get(j));
+        }

-   * Note: if autoCommit=false, flushed data would still
-   * not be visible to readers, until {@link #close} is called.
+   * Note: while this will force buffered docs to be
+   * pushed into the index, it will not make these docs
+   * visible to a reader.  Use {@link #commit} instead
    * @throws CorruptIndexException if the index is corrupt
    * @throws IOException if there is a low-level IO error
+   * @deprecated please call {@link #commit} instead
    */
   public final void flush() throws CorruptIndexException, IOException {
     flush(true, false);
   }

   /**
+   * Commits all pending updates (added & deleted documents)
+   * to the index, and syncs all referenced index files,
+   * such that a reader will see the changes.  Note that
+   * this does not wait for any running background merges to
+   * finish.  This may be a costly operation, so you should
+   * test the cost in your application and do it only when
+   * really necessary.
+   *
+   * Note that this operation calls Directory.sync on
+   * the index files.  That call should not return until the
+   * file contents & metadata are on stable storage.  For
+   * FSDirectory, this calls the OS's fsync.  But, beware:
+   * some hardware devices may in fact cache writes even
+   * during fsync, and return before the bits are actually
+   * on stable storage, to give the appearance of faster
+   * performance.  If you have such a device, and it does
+   * not have a battery backup (for example) then on power
+   * loss it may still lose data.  Lucene cannot guarantee
+   * consistency on such devices.
+   */
+  public final void commit() throws CorruptIndexException, IOException {
+    commit(true);
+  }
+
+  private final void commit(boolean triggerMerges) throws CorruptIndexException, IOException {
+    flush(triggerMerges, true);
+    sync(true, 0);
+  }
+
+  /**
    * Flush all in-memory buffered updates (adds and deletes)
    * to the Directory.
    * @param triggerMerge if true, we may merge segments (if
@@ -2681,10 +2852,15 @@
     maybeMerge();
   }

+  // TODO: this method should not have to be entirely
+  // synchronized, ie, merges should be allowed to commit
+  // even while a flush is happening
   private synchronized final boolean doFlush(boolean flushDocStores) throws CorruptIndexException, IOException {

     // Make sure no threads are actively adding a document

+    flushCount++;
+
     // Returns true if docWriter is currently aborting, in
     // which case we skip flushing this segment
     if (docWriter.pauseAllThreads()) {
@@ -2717,10 +2893,18 @@
     // apply to more than just the last flushed segment
     boolean flushDeletes = docWriter.hasDeletes();

+    int docStoreOffset = docWriter.getDocStoreOffset();
+
+    // docStoreOffset should only be non-zero when
+    // autoCommit == false
+    assert !autoCommit || 0 == docStoreOffset;
+
+    boolean docStoreIsCompoundFile = false;
+
     if (infoStream != null) {
       message("  flush: segment=" + docWriter.getSegment() +
               " docStoreSegment=" + docWriter.getDocStoreSegment() +
-              " docStoreOffset=" + docWriter.getDocStoreOffset() +
+              " docStoreOffset=" + docStoreOffset +
               " flushDocs=" + flushDocs +
               " flushDeletes=" + flushDeletes +
               " flushDocStores=" + flushDocStores +
@@ -2729,14 +2913,6 @@
       message("  index before flush " + segString());
     }

-    int docStoreOffset = docWriter.getDocStoreOffset();
-
-    // docStoreOffset should only be non-zero when
-    // autoCommit == false
-    assert !autoCommit || 0 == docStoreOffset;
-
-    boolean docStoreIsCompoundFile = false;
-
     // Check if the doc stores must be separately flushed
     // because other segments, besides the one we are about
     // to flush, reference it
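The sync step that the new commit() javadoc describes (flush the bytes, then fsync before returning) can be sketched with the JDK alone; this is an illustration of the pattern, not Lucene's Directory.sync implementation, and the `DurableWrite` class is hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableWrite {
  // Write bytes, then force them to stable storage before returning,
  // analogous to what the commit() javadoc above says Directory.sync does.
  static void writeDurably(Path path, byte[] data) throws IOException {
    try (FileChannel ch = FileChannel.open(path,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer buf = ByteBuffer.wrap(data);
      while (buf.hasRemaining())
        ch.write(buf);
      // force(true) also flushes file metadata, like fsync (vs fdatasync);
      // as the javadoc warns, lying hardware can still defeat this.
      ch.force(true);
    }
  }
}
```

The caveat in the patch applies equally here: force() only asks the OS to fsync; a device that acknowledges writes from a volatile cache can still lose data on power failure.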
@@ -2754,60 +2930,63 @@
     // If we are flushing docs, segment must not be null:
     assert segment != null || !flushDocs;

-    if (flushDocs || flushDeletes) {
-
-      SegmentInfos rollback = null;
-
-      if (flushDeletes)
-        rollback = (SegmentInfos) segmentInfos.clone();
+    if (flushDocs) {

       boolean success = false;
+      final int flushedDocCount;

       try {
-        if (flushDocs) {
-
-          if (0 == docStoreOffset && flushDocStores) {
-            // This means we are flushing private doc stores
-            // with this segment, so it will not be shared
-            // with other segments
-            assert docStoreSegment != null;
-            assert docStoreSegment.equals(segment);
-            docStoreOffset = -1;
-            docStoreIsCompoundFile = false;
-            docStoreSegment = null;
-          }
-
-          int flushedDocCount = docWriter.flush(flushDocStores);
-
-          newSegment = new SegmentInfo(segment,
-                                       flushedDocCount,
-                                       directory, false, true,
-                                       docStoreOffset, docStoreSegment,
-                                       docStoreIsCompoundFile);
-          segmentInfos.addElement(newSegment);
-        }
-
-        if (flushDeletes) {
-          // we should be able to change this so we can
-          // buffer deletes longer and then flush them to
-          // multiple flushed segments, when
-          // autoCommit=false
-          applyDeletes(flushDocs);
-          doAfterFlush();
-        }
-
-        checkpoint();
+        flushedDocCount = docWriter.flush(flushDocStores);
         success = true;
       } finally {
         if (!success) {
-          if (infoStream != null) message("hit exception flushing segment " + segment);
-
-          if (flushDeletes) {
+          docWriter.abort(null);
+          deleter.refresh(segment);
+        }
+      }
+
+      if (0 == docStoreOffset && flushDocStores) {
+        // This means we are flushing private doc stores
+        // with this segment, so it will not be shared
+        // with other segments
+        assert docStoreSegment != null;
+        assert docStoreSegment.equals(segment);
+        docStoreOffset = -1;
+        docStoreIsCompoundFile = false;
+        docStoreSegment = null;
+      }
+
+      // Create new SegmentInfo, but do not add to our
+      // segmentInfos until deletes are flushed
+      // successfully.
+      newSegment = new SegmentInfo(segment,
+                                   flushedDocCount,
+                                   directory, false, true,
+                                   docStoreOffset, docStoreSegment,
+                                   docStoreIsCompoundFile);
+    }
+
+    if (flushDeletes) {
+      try {
+        SegmentInfos rollback = (SegmentInfos) segmentInfos.clone();
-
-            // Carefully check if any partial .del files
-            // should be removed:
+        boolean success = false;
+        try {
+          // we should be able to change this so we can
+          // buffer deletes longer and then flush them to
+          // multiple flushed segments only when a commit()
+          // finally happens
+          applyDeletes(newSegment);
+          success = true;
+        } finally {
+          if (!success) {
+            if (infoStream != null)
+              message("hit exception flushing deletes");
+
+            // Carefully remove any partially written .del
+            // files
            final int size = rollback.size();
            for(int i=0;i<size;i++)

-    2.3 and above:
+    2.3:
    Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField, NormGen^NumField, IsCompoundFile>^SegCount
+
+    2.4 and above:
+    Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField, NormGen^NumField, IsCompoundFile>^SegCount, Checksum
+
Format, NameCounter, SegCount, SegSize, NumField, DocStoreOffset --> Int32
-    Version, DelGen, NormGen --> Int64
+    Version, DelGen, NormGen, Checksum --> Int64
@@ -842,7 +848,7 @@
-    Format is -1 as of Lucene 1.4, -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1 and 2.2, and -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3
+    Format is -1 as of Lucene 1.4, -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1 and 2.2, -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3, and -5 (SegmentInfos.FORMAT_CHECKSUM) as of Lucene 2.4.
@@ -925,6 +931,13 @@
        shares a single set of these files with other segments.
+
+    Checksum contains the CRC32 checksum of all bytes
+    in the segments_N file up until the checksum.
+    This is used to verify integrity of the file on
+    opening the index.
+
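The trailing-checksum layout described above (CRC32 of every byte up to the checksum, stored as a final Int64) can be sketched with java.util.zip.CRC32. This is a simplified illustration of the scheme, not SegmentInfos' actual read/write code, and `ChecksumFile` is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class ChecksumFile {
  // Append the CRC32 of the payload as a trailing 8-byte long,
  // mirroring the "Checksum --> Int64" layout described above.
  static byte[] withChecksum(byte[] payload) throws IOException {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.write(payload);
    out.writeLong(crc.getValue());  // CRC32 fits in the low 32 bits of the long
    out.flush();
    return bos.toByteArray();
  }

  // Verify on "open": recompute the CRC over all bytes before the
  // checksum and compare against the stored value.
  static boolean verify(byte[] fileBytes) throws IOException {
    int payloadLen = fileBytes.length - 8;
    CRC32 crc = new CRC32();
    crc.update(fileBytes, 0, payloadLen);
    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(fileBytes, payloadLen, 8));
    return in.readLong() == crc.getValue();
  }
}
```

A single flipped bit anywhere before the checksum changes the recomputed CRC, so a partially written or corrupted segments_N-style file fails verification on open.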
Modified: lucene/java/trunk/src/test/org/apache/lucene/index/TestAtomicUpdate.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/index/TestAtomicUpdate.java?rev=620576&r1=620575&r2=620576&view=diff
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/index/TestAtomicUpdate.java (original)
+++ lucene/java/trunk/src/test/org/apache/lucene/index/TestAtomicUpdate.java Mon Feb 11 10:56:09 2008
@@ -20,12 +20,8 @@
 import org.apache.lucene.store.*;
 import org.apache.lucene.document.*;
 import org.apache.lucene.analysis.*;
-import org.apache.lucene.index.*;
 import org.apache.lucene.search.*;
 import org.apache.lucene.queryParser.*;
-import org.apache.lucene.util._TestUtil;
-
-import org.apache.lucene.util.LuceneTestCase;

 import java.util.Random;
 import java.io.File;
@@ -83,7 +79,6 @@
       // Update all 100 docs...
       for(int i=0; i<100; i++) {
         Document d = new Document();
-        int n = RANDOM.nextInt();
         d.add(new Field("id", Integer.toString(i), Field.Store.YES, Field.Index.UN_TOKENIZED));
         d.add(new Field("contents", English.intToEnglish(i+10*count), Field.Store.NO, Field.Index.TOKENIZED));
         writer.updateDocument(new Term("id", Integer.toString(i)), d);
@@ -127,7 +122,7 @@
       d.add(new Field("contents", English.intToEnglish(i), Field.Store.NO, Field.Index.TOKENIZED));
       writer.addDocument(d);
     }
-    writer.flush();
+    writer.commit();

     IndexerThread indexerThread = new IndexerThread(writer, threads);
     threads[0] = indexerThread;

Modified: lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java?rev=620576&r1=620575&r2=620576&view=diff
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java (original)
+++
lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java Mon Feb 11 10:56:09 2008
@@ -349,7 +349,6 @@
     IndexWriter writer = new IndexWriter(dir, autoCommit, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
     writer.setRAMBufferSizeMB(16.0);
-    //IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
     for(int i=0;i<35;i++) {
       addDoc(writer, i);
     }
@@ -390,11 +389,8 @@
     expected = new String[] {"_0.cfs",
                              "_0_1.del",
                              "_0_1.s" + contentFieldIndex,
-                             "segments_4",
+                             "segments_3",
                              "segments.gen"};
-
-    if (!autoCommit)
-      expected[3] = "segments_3";

     String[] actual = dir.list();
     Arrays.sort(expected);

Added: lucene/java/trunk/src/test/org/apache/lucene/index/TestCrash.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/index/TestCrash.java?rev=620576&view=auto
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/index/TestCrash.java (added)
+++ lucene/java/trunk/src/test/org/apache/lucene/index/TestCrash.java Mon Feb 11 10:56:09 2008
@@ -0,0 +1,181 @@
+package org.apache.lucene.index;
+
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.IOException;
+
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.analysis.WhitespaceAnalyzer;
+import org.apache.lucene.store.MockRAMDirectory;
+import org.apache.lucene.store.NoLockFactory;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+
+public class TestCrash extends LuceneTestCase {
+
+  private IndexWriter initIndex() throws IOException {
+    return initIndex(new MockRAMDirectory());
+  }
+
+  private IndexWriter initIndex(MockRAMDirectory dir) throws IOException {
+    dir.setLockFactory(NoLockFactory.getNoLockFactory());
+
+    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer());
+    //writer.setMaxBufferedDocs(2);
+    writer.setMaxBufferedDocs(10);
+    ((ConcurrentMergeScheduler) writer.getMergeScheduler()).setSuppressExceptions();
+
+    Document doc = new Document();
+    doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.TOKENIZED));
+    doc.add(new Field("id", "0", Field.Store.YES, Field.Index.TOKENIZED));
+    for(int i=0;i<157;i++)
+      writer.addDocument(doc);
+
+    return writer;
+  }
+
+  private void crash(final IndexWriter writer) throws IOException {
+    final MockRAMDirectory dir = (MockRAMDirectory) writer.getDirectory();
+    ConcurrentMergeScheduler cms = (ConcurrentMergeScheduler) writer.getMergeScheduler();
+    dir.crash();
+    cms.sync();
+    dir.clearCrash();
+  }
+
+  public void testCrashWhileIndexing() throws IOException {
+    IndexWriter writer = initIndex();
+    MockRAMDirectory dir = (MockRAMDirectory) writer.getDirectory();
+    crash(writer);
+    IndexReader reader = IndexReader.open(dir);
+    assertTrue(reader.numDocs() < 157);
+  }
+
+  public void testWriterAfterCrash() throws IOException {
+    IndexWriter writer = initIndex();
+    MockRAMDirectory dir = (MockRAMDirectory) writer.getDirectory();
+    dir.setPreventDoubleWrite(false);
+    crash(writer);
+    writer = initIndex(dir);
+    writer.close();
+
+    IndexReader reader = IndexReader.open(dir);
+    assertTrue(reader.numDocs() < 314);
+  }
+
+  public void testCrashAfterReopen() throws IOException {
+    IndexWriter writer = initIndex();
+    MockRAMDirectory dir = (MockRAMDirectory) writer.getDirectory();
+    writer.close();
+    writer = initIndex(dir);
+    assertEquals(314, writer.docCount());
+    crash(writer);
+
+    /*
+    System.out.println("\n\nTEST: open reader");
+    String[] l = dir.list();
+    Arrays.sort(l);
+    for(int i=0;i