lucene-dev mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject flushRamSegments() is "over merging"?
Date Tue, 15 Aug 2006 08:31:18 GMT

Hi, I ran into this while reviewing the patch for 565.

It appears that closing an IndexWriter that holds non-empty RAM segments
(at least one document was added) causes a merge with the last (most
recent) on-disk segment.

This seems problematic to me when an application interleaves many
operations (adding and removing documents, or even switching between
indexes), so that the IndexWriter is closed often.

The test case below demonstrates this behavior - all maxBufferedDocs,
maxMergeDocs, mergeFactor are assigned very large values, and in a loop a
few documents are added and the indexWriter is closed and re-opened.

Surprisingly (at least for me) the number of segments on disk remains 1.
In other words, each time the IndexWriter is closed, the single disk
segment is merged with the current RAM segments and rewritten as a new
disk segment.

The "blame" is in the second line here:
    if (minSegment < 0 ||                   // add one FS segment?
        (docCount + segmentInfos.info(minSegment).docCount) > mergeFactor ||
        !(segmentInfos.info(segmentInfos.size()-1).dir == ramDirectory))

This code in flushRamSegments() merges the (temporary) ram segments with
the most recent non-temporary segment.
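
To make the effect of that condition concrete, here is a minimal standalone sketch that models it outside of Lucene. The class FlushModel, the Seg type and the flush() helper are hypothetical stand-ins for illustration only; the condition in firstSegmentToMerge() is copied from the fragment quoted above, and the scan over trailing RAM segments that precedes it is an assumption about how minSegment and docCount are computed. It does not model any merging that happens while documents are being added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flush decision; not Lucene code.
final class FlushModel {

    // Stand-in for a segment: a document count and whether it lives in RAM.
    static final class Seg {
        final int docCount;
        final boolean inRam;
        Seg(int docCount, boolean inRam) {
            this.docCount = docCount;
            this.inRam = inRam;
        }
    }

    // Index of the first segment that would take part in the merge.
    static int firstSegmentToMerge(List<Seg> segs, int mergeFactor) {
        int minSegment = segs.size() - 1;
        long docCount = 0; // long, so very large limits cannot overflow
        // Assumed: walk back over the trailing RAM segments.
        while (minSegment >= 0 && segs.get(minSegment).inRam) {
            docCount += segs.get(minSegment).docCount;
            minSegment--;
        }
        // The quoted condition: skip the disk segment only in these cases.
        if (minSegment < 0
                || docCount + segs.get(minSegment).docCount > mergeFactor
                || !segs.get(segs.size() - 1).inRam) {
            minSegment++;
        }
        return minSegment;
    }

    // Model of close(): merge segs[first..end] into one on-disk segment.
    static void flush(List<Seg> segs, int mergeFactor) {
        int first = firstSegmentToMerge(segs, mergeFactor);
        if (first >= segs.size()) {
            return; // nothing to merge
        }
        int total = 0;
        while (segs.size() > first) {
            total += segs.remove(segs.size() - 1).docCount;
        }
        segs.add(new Seg(total, false));
    }

    // Add docsPerRound single-document RAM segments, then flush; repeat.
    static int segmentsAfterRounds(int rounds, int docsPerRound, int mergeFactor) {
        List<Seg> segs = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            for (int d = 0; d < docsPerRound; d++) {
                segs.add(new Seg(1, true));
            }
            flush(segs, mergeFactor);
        }
        return segs.size();
    }

    public static void main(String[] args) {
        System.out.println(segmentsAfterRounds(30, 2, Integer.MAX_VALUE - 1));
    }
}
```

In this model, with mergeFactor set to a very large value the combined document count never exceeds it, so every flush pulls in the existing disk segment and the segment count stays at 1.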

I can see how this can make sense in some cases. Perhaps an additional
constraint should be added on the ratio of the size of this non-temp
segment to that of all temporary segments, or the difference, or both.
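
As a rough illustration of the kind of constraint meant here, the sketch below adds a size-ratio check on top of the existing mergeFactor test. Everything in it is hypothetical: RatioGuard, includeDiskSegment() and the maxRatio parameter do not exist in Lucene, and the right threshold (or whether a ratio, a difference, or both should be used) is an open question.

```java
// Hypothetical sketch of an extra constraint; not part of Lucene.
final class RatioGuard {

    // Decide whether the most recent on-disk segment should be merged
    // together with the buffered RAM segments at flush time.
    //   ramDocs     - total documents in the temporary RAM segments
    //   diskDocs    - documents in the most recent on-disk segment
    //   mergeFactor - the existing limit on the combined document count
    //   maxRatio    - assumed new knob: largest allowed diskDocs / ramDocs
    static boolean includeDiskSegment(int ramDocs, int diskDocs,
                                      int mergeFactor, double maxRatio) {
        if (ramDocs <= 0) {
            return false; // nothing buffered, so nothing to merge with
        }
        // Existing test: the combined count must not exceed mergeFactor.
        boolean withinMergeFactor = (long) ramDocs + diskDocs <= mergeFactor;
        // Proposed test: do not rewrite a disk segment that is much
        // larger than the documents being flushed.
        boolean smallEnough = diskDocs <= maxRatio * ramDocs;
        return withinMergeFactor && smallEnough;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE - 1;
        System.out.println(includeDiskSegment(2, 2, huge, 4.0));
        System.out.println(includeDiskSegment(2, 1000, huge, 4.0));
    }
}
```

With a guard like this, a small disk segment would still be folded into the flush, while a large one would be left alone and the RAM segments written out as a separate new segment.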

Here is the test case,
Thanks,
Doron
------------------------------------
package org.apache.lucene.index;

import java.io.IOException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

/**
 * Test that the number of segments is as expected,
 * i.e. that there were not too many or too few merges.
 *
 * @author Doron Cohen
 */
public class TestNumSegments extends TestCase {

      protected int nextDocNum = 0;
      protected Directory dir = null;
      protected IndexWriter iw = null;
      protected IndexReader ir = null;

      /* (non-Javadoc)
       * @see junit.framework.TestCase#setUp()
       */
      protected void setUp() throws Exception {
            super.setUp();
            //dir = new RAMDirectory();
            dir = FSDirectory.getDirectory("test.num.segments",true);
            iw = new IndexWriter(dir, new StandardAnalyzer(), true);
            setLimits(iw);
            addSomeDocs(); // some docs in index
      }

      // for now, take these limits out of the "game"
      protected void setLimits(IndexWriter iw) {
            iw.setMaxBufferedDocs(Integer.MAX_VALUE-1);
            iw.setMaxMergeDocs(Integer.MAX_VALUE-1);
            iw.setMergeFactor(Integer.MAX_VALUE-1);
      }

      /* (non-Javadoc)
       * @see junit.framework.TestCase#tearDown()
       */
      protected void tearDown() throws Exception {
            closeW();
            if (dir!=null) {
                  dir.close();
            }
            super.tearDown();
      }

      // count how many segments are in the directory - index writer must be closed
      protected int countDirSegments() throws IOException {
            assertNull(iw);
            SegmentInfos segmentInfos = new SegmentInfos();
            segmentInfos.read(dir);
            int nSegs = segmentInfos.size();
            segmentInfos.clear();
            return nSegs;
      }

      // open writer
      private void openW() throws IOException {
            iw = new IndexWriter(dir, new StandardAnalyzer(), false);
            setLimits(iw);
      }

      private void closeW() throws IOException {
            if (iw!=null) {
                  iw.close();
                  iw=null;
            }
      }

      public void testNumSegments() throws IOException {
            int numExceptions = 0;
            for (int i=1; i<30; i++) {
                  closeW();
                  try {
                        assertEquals("Oops - wrong number of segments!", i, countDirSegments());
                  } catch (Throwable t) {
                        numExceptions++;
                        System.err.println(i+":  "+t.getMessage());
                  }
                  openW();
                  addSomeDocs();
            }
            assertEquals("Oops! The number of segments was \"wrong\" too many times", 0, numExceptions);
      }

      private void addSomeDocs() throws IOException {
            for (int i=0; i<2; i++) {
                  iw.addDocument(getDoc());
            }
      }

      protected Document getDoc() {
            Document doc = new Document();
            doc.add(new Field("body", Integer.toString(nextDocNum),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("all", "x",
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
            nextDocNum ++;
            return doc;
      }

}

