From: mikemccand@apache.org
To: java-commits@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: svn commit: r1038783 - in /lucene/java/branches/lucene_3_0: ./ contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/ src/java/org/apache/lucene/index/ src/test/org/apache/lucene/index/
Date: Wed, 24 Nov 2010 19:47:43 -0000

Author: mikemccand
Date: Wed Nov 24 19:47:43 2010
New Revision: 1038783

URL: http://svn.apache.org/viewvc?rev=1038783&view=rev
Log:
LUCENE-2773: don't build compound files for large merged segments (by default)

Modified:
    lucene/java/branches/lucene_3_0/CHANGES.txt
    lucene/java/branches/lucene_3_0/common-build.xml
    lucene/java/branches/lucene_3_0/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java
    lucene/java/branches/lucene_3_0/src/java/org/apache/lucene/index/LogMergePolicy.java
    lucene/java/branches/lucene_3_0/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java

Modified: lucene/java/branches/lucene_3_0/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_3_0/CHANGES.txt?rev=1038783&r1=1038782&r2=1038783&view=diff
==============================================================================
--- lucene/java/branches/lucene_3_0/CHANGES.txt (original)
+++ lucene/java/branches/lucene_3_0/CHANGES.txt Wed Nov 24 19:47:43 2010
@@ -15,6 +15,13 @@ Changes in runtime behavior
   worst-case free disk space required during optimize is now 3X the
   index size, when compound file is enabled (else 2X).  (Mike
   McCandless)
+
+* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
+  0.1), which means any time a merged segment is greater than 10% of
+  the index size, it will be left in non-compound format even if
+  compound format is on.  This change was made to reduce peak
+  transient disk usage during optimize which increased due to
+  LUCENE-2762.  (Mike McCandless)
 
 Bug fixes
 
@@ -108,6 +115,15 @@ Bug fixes
 * LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
   sets that only differed by trailing zeros.  (Dawid Weiss, yonik)
 
+API Changes
+
+* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
+  0.1), which means any time a merged segment is greater than 10% of
+  the index size, it will be left in non-compound format even if
+  compound format is on.  This change was made to reduce peak
+  transient disk usage during optimize which increased due to
+  LUCENE-2762.  (Mike McCandless)
+
 Optimizations
 
 * LUCENE-2556: Improve memory usage after cloning TermAttribute.

Modified: lucene/java/branches/lucene_3_0/common-build.xml
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_3_0/common-build.xml?rev=1038783&r1=1038782&r2=1038783&view=diff
==============================================================================
--- lucene/java/branches/lucene_3_0/common-build.xml (original)
+++ lucene/java/branches/lucene_3_0/common-build.xml Wed Nov 24 19:47:43 2010
@@ -42,7 +42,7 @@
-
+

Modified: lucene/java/branches/lucene_3_0/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_3_0/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java?rev=1038783&r1=1038782&r2=1038783&view=diff
==============================================================================
--- lucene/java/branches/lucene_3_0/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java (original)
+++ lucene/java/branches/lucene_3_0/contrib/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java Wed Nov 24 19:47:43 2010
@@ -17,33 +17,34 @@ package org.apache.lucene.benchmark.byTask;
 
-import java.io.IOException;
-import java.io.StringReader;
+import java.io.BufferedReader;
 import java.io.File;
 import java.io.FileReader;
-import java.io.BufferedReader;
-import java.util.List;
+import java.io.IOException;
+import java.io.StringReader;
 import java.util.Iterator;
+import java.util.List;
+
+import junit.framework.TestCase;
 
 import org.apache.lucene.benchmark.byTask.feeds.DocData;
 import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException;
 import org.apache.lucene.benchmark.byTask.feeds.ReutersContentSource;
 import org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker;
-import org.apache.lucene.benchmark.byTask.tasks.CountingSearchTestTask;
-import org.apache.lucene.benchmark.byTask.tasks.CountingHighlighterTestTask;
 import org.apache.lucene.benchmark.byTask.stats.TaskStats;
+import org.apache.lucene.benchmark.byTask.tasks.CountingHighlighterTestTask;
+import org.apache.lucene.benchmark.byTask.tasks.CountingSearchTestTask;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexWriter;
-import org.apache.lucene.index.TermEnum;
-import org.apache.lucene.index.TermDocs;
-import org.apache.lucene.index.SerialMergeScheduler;
 import org.apache.lucene.index.LogDocMergePolicy;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.index.SerialMergeScheduler;
+import org.apache.lucene.index.TermDocs;
+import org.apache.lucene.index.TermEnum;
 import org.apache.lucene.index.TermFreqVector;
-import org.apache.lucene.store.Directory;
 import org.apache.lucene.search.FieldCache.StringIndex;
 import org.apache.lucene.search.FieldCache;
-
-import junit.framework.TestCase;
+import org.apache.lucene.store.Directory;
 
 /**
  * Test very simply that perf tasks - simple algorithms - are doing what they should.
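The next hunk drops this test's ".cfs" file count: with LUCENE-2773 a large merged segment may legitimately stay in non-compound format even when compound files are enabled, so counting ".cfs" files no longer matches the segment count. The newly imported SegmentInfos offers a CFS-agnostic way to check the number of segments. The following is only an illustrative sketch against the Lucene 3.0 API, not code from this commit, and the helper class name is made up:

    // Illustrative sketch, not from r1038783: count segments by reading the
    // current segments_N file, independent of compound vs. non-compound format.
    import java.io.IOException;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;

    class SegmentCounter {
      static int segmentCount(Directory dir) throws IOException {
        SegmentInfos infos = new SegmentInfos();
        infos.read(dir);      // loads the latest segments_N from the directory
        return infos.size();  // one entry per live segment
      }
    }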
@@ -776,12 +777,9 @@ public class TestPerfTasksLogic extends
     ir.close();
 
     // Make sure we have 3 segments:
-    final String[] files = benchmark.getRunData().getDirectory().listAll();
-    int cfsCount = 0;
-    for(int i=0;i<files.length;i++)

Modified: lucene/java/branches/lucene_3_0/src/java/org/apache/lucene/index/LogMergePolicy.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_3_0/src/java/org/apache/lucene/index/LogMergePolicy.java?rev=1038783&r1=1038782&r2=1038783&view=diff
==============================================================================
--- lucene/java/branches/lucene_3_0/src/java/org/apache/lucene/index/LogMergePolicy.java (original)
+++ lucene/java/branches/lucene_3_0/src/java/org/apache/lucene/index/LogMergePolicy.java Wed Nov 24 19:47:43 2010
+  /** Default noCFSRatio.  If a merge's size is >= 10% of
+   *  the index, then we disable compound file for it.
+   *  @see setNoCFSRatio */
+  public static final double DEFAULT_NO_CFS_RATIO = 0.1;
+
   private int mergeFactor = DEFAULT_MERGE_FACTOR;
 
   long minMergeSize;
   long maxMergeSize;
   int maxMergeDocs = DEFAULT_MAX_MERGE_DOCS;
 
+  protected double noCFSRatio = DEFAULT_NO_CFS_RATIO;
+
   /* TODO 3.0: change this default to true */
   protected boolean calibrateSizeByDeletes = false;
 
@@ -73,6 +80,23 @@ public abstract class LogMergePolicy ext
   protected boolean verbose() {
     return writer != null && writer.verbose();
   }
+
+  /** @see setNoCFSRatio */
+  public double getNoCFSRatio() {
+    return noCFSRatio;
+  }
+
+  /** If a merged segment will be more than this percentage
+   *  of the total size of the index, leave the segment as
+   *  non-compound file even if compound file is enabled.
+   *  Set to 1.0 to always use CFS regardless of merge
+   *  size. */
+  public void setNoCFSRatio(double noCFSRatio) {
+    if (noCFSRatio < 0.0 || noCFSRatio > 1.0) {
+      throw new IllegalArgumentException("noCFSRatio must be 0.0 to 1.0 inclusive; got " + noCFSRatio);
+    }
+    this.noCFSRatio = noCFSRatio;
+  }
 
   private void message(String message) {
     if (verbose())
@@ -203,7 +227,7 @@ public abstract class LogMergePolicy ext
     return !hasDeletions &&
       !info.hasSeparateNorms() &&
       info.dir == writer.getDirectory() &&
-      info.getUseCompoundFile() == useCompoundFile;
+      (info.getUseCompoundFile() == useCompoundFile || noCFSRatio < 1.0);
   }
 
   /** Returns the merges necessary to optimize the index.
@@ -242,7 +266,7 @@ public abstract class LogMergePolicy ext
       // First, enroll all "full" merges (size
       // mergeFactor) to potentially be run concurrently:
       while (last - maxNumSegments + 1 >= mergeFactor) {
-        spec.add(new OneMerge(infos.range(last-mergeFactor, last), useCompoundFile));
+        spec.add(makeOneMerge(infos, infos.range(last-mergeFactor, last)));
         last -= mergeFactor;
       }
 
@@ -254,7 +278,7 @@ public abstract class LogMergePolicy ext
         // Since we must optimize down to 1 segment, the
         // choice is simple:
         if (last > 1 || !isOptimized(infos.info(0)))
-          spec.add(new OneMerge(infos.range(0, last), useCompoundFile));
+          spec.add(makeOneMerge(infos, infos.range(0, last)));
       } else if (last > maxNumSegments) {
 
         // Take care to pick a partial merge that is
@@ -282,7 +306,7 @@ public abstract class LogMergePolicy ext
         }
       }
 
-      spec.add(new OneMerge(infos.range(bestStart, bestStart+finalMergeSize), useCompoundFile));
+      spec.add(makeOneMerge(infos, infos.range(bestStart, bestStart+finalMergeSize)));
     }
   }
 
@@ -322,7 +346,7 @@ public abstract class LogMergePolicy ext
           // deletions, so force a merge now:
           if (verbose())
            message(" add merge " + firstSegmentWithDeletions + " to " + (i-1) + " inclusive");
-          spec.add(new OneMerge(segmentInfos.range(firstSegmentWithDeletions, i), useCompoundFile));
+          spec.add(makeOneMerge(segmentInfos, segmentInfos.range(firstSegmentWithDeletions, i)));
          firstSegmentWithDeletions = i;
         }
       } else if (firstSegmentWithDeletions != -1) {
@@ -331,7 +355,7 @@ public abstract class LogMergePolicy ext
        // mergeFactor segments
        if (verbose())
          message(" add merge " + firstSegmentWithDeletions + " to " + (i-1) + " inclusive");
-        spec.add(new OneMerge(segmentInfos.range(firstSegmentWithDeletions, i), useCompoundFile));
+        spec.add(makeOneMerge(segmentInfos, segmentInfos.range(firstSegmentWithDeletions, i)));
        firstSegmentWithDeletions = -1;
       }
     }
 
@@ -339,7 +363,7 @@ public abstract class LogMergePolicy ext
     if (firstSegmentWithDeletions != -1) {
       if (verbose())
         message(" add merge " + firstSegmentWithDeletions + " to " + (numSegments-1) + " inclusive");
-      spec.add(new OneMerge(segmentInfos.range(firstSegmentWithDeletions, numSegments), useCompoundFile));
+      spec.add(makeOneMerge(segmentInfos, segmentInfos.range(firstSegmentWithDeletions, numSegments)));
     }
 
     return spec;
@@ -439,7 +463,7 @@ public abstract class LogMergePolicy ext
          spec = new MergeSpecification();
        if (verbose())
          message(" " + start + " to " + end + ": add this merge");
-        spec.add(new OneMerge(infos.range(start, end), useCompoundFile));
+        spec.add(makeOneMerge(infos, infos.range(start, end)));
      } else if (verbose())
        message(" " + start + " to " + end + ": contains segment over maxMergeSize or maxMergeDocs; skipping");
 
@@ -453,6 +477,29 @@ public abstract class LogMergePolicy ext
     return spec;
   }
 
+  protected OneMerge makeOneMerge(SegmentInfos infos, SegmentInfos infosToMerge) throws IOException {
+    final boolean doCFS;
+    if (!useCompoundFile) {
+      doCFS = false;
+    } else if (noCFSRatio == 1.0) {
+      doCFS = true;
+    } else {
+
+      long totSize = 0;
+      for(SegmentInfo info : infos) {
+        totSize += size(info);
+      }
+      long mergeSize = 0;
+      for(SegmentInfo info : infosToMerge) {
+        mergeSize += size(info);
+      }
+
+      doCFS = mergeSize <= noCFSRatio * totSize;
+    }
+
+    return new OneMerge(infosToMerge, doCFS);
+  }
+
   /** Determines the largest segment (measured by
    *  document count) that may be merged with other segments.
    *  Small values (e.g., less than 10,000) are best for

Modified: lucene/java/branches/lucene_3_0/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_3_0/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java?rev=1038783&r1=1038782&r2=1038783&view=diff
==============================================================================
--- lucene/java/branches/lucene_3_0/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java (original)
+++ lucene/java/branches/lucene_3_0/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java Wed Nov 24 19:47:43 2010
@@ -244,25 +244,5 @@ public class TestIndexWriterMergePolicy
     if (upperBound * mergeFactor <= maxMergeDocs) {
       assertTrue(numSegments < mergeFactor);
     }
-
-    String[] files = writer.getDirectory().listAll();
-    int segmentCfsCount = 0;
-    for (int i = 0; i < files.length; i++) {
-      if (files[i].endsWith(".cfs")) {
-        segmentCfsCount++;
-      }
-    }
-    assertEquals("index=" + writer.segString(), segmentCount, segmentCfsCount);
-  }
-
-  /*
-  private void printSegmentDocCounts(IndexWriter writer) {
-    int segmentCount = writer.getSegmentCount();
-    System.out.println("" + segmentCount + " segments total");
-    for (int i = 0; i < segmentCount; i++) {
-      System.out.println("  segment " + i + " has " + writer.getDocCount(i)
-          + " docs");
-    }
   }
-  */
 }
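As a usage note, the new ratio is set on the LogMergePolicy in effect for an IndexWriter. The sketch below is a hedged illustration against the Lucene 3.0.x API, not code from this commit; the directory, analyzer, and class name are chosen only for the example:

    // Illustrative sketch, not from r1038783.  With the new default
    // noCFSRatio = 0.1, a merged segment larger than 10% of the total index
    // size is written in non-compound format even though useCompoundFile is
    // true; setNoCFSRatio(1.0) restores the old always-CFS behavior.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.LogMergePolicy;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class NoCFSRatioExample {
      public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
            new StandardAnalyzer(Version.LUCENE_30),
            IndexWriter.MaxFieldLength.UNLIMITED);

        // The default merge policy in 3.0 is a LogByteSizeMergePolicy, which
        // extends LogMergePolicy and therefore exposes the new getter/setter:
        LogMergePolicy mp = (LogMergePolicy) writer.getMergePolicy();
        mp.setUseCompoundFile(true);
        mp.setNoCFSRatio(0.1);    // merges larger than 10% of the index skip CFS
        // mp.setNoCFSRatio(1.0); // always honor useCompoundFile, as before

        // ... add documents, optimize(), etc. ...
        writer.close();
      }
    }

For example, in a 1 GB index a merge producing a 200 MB segment exceeds 0.1 * 1 GB, so that segment is left as separate files, while a 50 MB merged segment would still be packed into a .cfs file.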