lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject FlushPolicy and maxBufDelTerm
Date Thu, 01 Aug 2013 13:03:07 GMT
Hi

I'm a little confused by the FlushPolicy and
IndexWriterConfig.setMaxBufferedDeleteTerms documentation. The FlushPolicy
javadocs say:

 * Segments are traditionally flushed by:
 * <ul>
 * <li>RAM consumption - configured via
...
 * <li>*Number of buffered delete terms/queries* - configured via
 * {@link IndexWriterConfig#setMaxBufferedDeleteTerms(int)}</li>
 * </ul>

Yet IWC.setMaxBufDelTerm says:

NOTE: This setting won't trigger a segment flush.

And FlushByRamOrCountPolicy says:

 * <li>{@link #onDelete(DocumentsWriterFlushControl, DocumentsWriterPerThreadPool.ThreadState)} - flushes
 * based on the global number of buffered delete terms iff
 * {@link IndexWriterConfig#getMaxBufferedDeleteTerms()} is enabled</li>
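
One reading that would reconcile the FlushPolicy and IWC javadocs is that reaching the delete-term limit only marks the buffered deletes to be applied, and never writes a new segment on its own. The toy model below sketches that reading under this assumption; `ToyDeletePolicy`, `onDelete` and `Decision` are hypothetical names, not Lucene's API, and this is not FlushByRamOrCountPolicy's actual code.

```java
// Hypothetical toy model, not Lucene code: reaching the buffered
// delete-term limit requests that deletes be applied, but (by assumption)
// does not by itself ask for a new segment to be flushed.
final class ToyDeletePolicy {
  // Same -1 sentinel value as IndexWriterConfig.DISABLE_AUTO_FLUSH.
  static final int DISABLED = -1;

  // What the policy asks the writer to do after a delete.
  static final class Decision {
    final boolean applyAllDeletes;
    final boolean flushSegment;

    Decision(boolean applyAllDeletes, boolean flushSegment) {
      this.applyAllDeletes = applyAllDeletes;
      this.flushSegment = flushSegment;
    }
  }

  static Decision onDelete(int numGlobalTermDeletes, int maxBufferedDeleteTerms) {
    boolean enabled = maxBufferedDeleteTerms != DISABLED;
    boolean apply = enabled && numGlobalTermDeletes >= maxBufferedDeleteTerms;
    // The assumption being illustrated: the term count never requests a flush.
    return new Decision(apply, false);
  }

  public static void main(String[] args) {
    // Mirrors the unit test: maxBufferedDeleteTerms=1, then two term deletes.
    for (int deletes = 1; deletes <= 2; deletes++) {
      Decision d = onDelete(deletes, 1);
      System.out.println(deletes + " delete(s): applyAllDeletes=" + d.applyAllDeletes
          + ", flushSegment=" + d.flushSegment);
    }
  }
}
```

Under this model the InfoStream would show deletes being acted on while no new segment files appear, which is the behaviour the question is about.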

Confused, I wrote a short unit test:

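  import org.apache.lucene.analysis.MockAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field.Store;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.util.LuceneTestCase;
  import org.apache.lucene.util.PrintStreamInfoStream;

  public class TestMaxBufDelTerm extends LuceneTestCase {

    public void testMaxBufDelTerm() throws Exception {
      Directory dir = new RAMDirectory();
      IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
          new MockAnalyzer(random()));
      conf.setMaxBufferedDeleteTerms(1); // act once a single delete term is buffered
      conf.setMaxBufferedDocs(10); // only 4 docs are added, so doc count won't flush
      conf.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH); // no RAM-based flush
      conf.setInfoStream(new PrintStreamInfoStream(System.out)); // print IndexWriter diagnostics
      IndexWriter writer = new IndexWriter(dir, conf);

      // Buffer a few docs; none of the configured triggers should flush them.
      int numDocs = 4;
      for (int i = 0; i < numDocs; i++) {
        Document doc = new Document();
        doc.add(new StringField("id", "doc-" + i, Store.NO));
        writer.addDocument(doc);
      }

      System.out.println("before delete");
      for (String f : dir.listAll()) System.out.println(f);

      // Two term deletes, enough to reach the maxBufferedDeleteTerms=1 threshold.
      writer.deleteDocuments(new Term("id", "doc-0"));
      writer.deleteDocuments(new Term("id", "doc-1"));

      System.out.println("\nafter delete");
      for (String f : dir.listAll()) System.out.println(f);

      writer.close();
      dir.close();
    }
  }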

With the InfoStream turned on, I can see messages about delete terms being
flushed (they don't appear if I comment out the setMaxBufferedDeleteTerms
line), so I know the setting takes effect.
Yet both before and after the delete operations, dir.listAll() returns
only the .fdx and .fdt files.

So is it a bug that a segment isn't flushed? If not (and I'm OK with
that), is it a documentation inconsistency?
Strangely, if I understand correctly, when the RAM accounted to buffered
delete terms exhausts the max RAM buffer size, a new segment will be flushed?
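
The RAM case can be sketched the same way. Assuming (as this toy does, without claiming it matches Lucene's code) that the RAM check counts bytes used by buffered deletes against the same budget as bytes used by buffered documents, delete terms alone could push the total over the RAM buffer and trigger a segment flush, even though the delete-term count cannot. `ToyRamCheck` and its parameters are hypothetical.

```java
// Hypothetical toy model, not Lucene code: a RAM-based flush check that
// (by assumption) counts buffered-delete bytes against the same budget
// as buffered-document bytes.
final class ToyRamCheck {
  static boolean shouldFlushSegment(long activeDocBytes, long deleteBytes,
      double ramBufferSizeMB) {
    if (ramBufferSizeMB < 0) {
      return false; // RAM-based flushing disabled (e.g. DISABLE_AUTO_FLUSH = -1)
    }
    long limit = (long) (ramBufferSizeMB * 1024 * 1024);
    return activeDocBytes + deleteBytes >= limit;
  }

  public static void main(String[] args) {
    // Documents alone stay under a 1 MB budget...
    System.out.println(shouldFlushSegment(900_000L, 0L, 1.0));
    // ...but enough buffered delete terms push the total over it.
    System.out.println(shouldFlushSegment(900_000L, 200_000L, 1.0));
  }
}
```

In the unit test above RAM-based flushing is disabled, so under this model the RAM path could not be what is (not) flushing there.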

Slightly unrelated to FlushPolicy, but do I understand correctly that
maxBufDelTerm does not apply to delete-by-query operations?
BufferedDeletes doesn't increment any counter on addQuery(), so is it
correct to assume that if I only delete-by-query, this setting has no
effect?
And are the delete queries buffered until the next segment flush is
triggered by other operations (flush constraints such as RAM or doc count,
commit, NRT reopen)?
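
The delete-by-query observation can be sketched as a toy buffer in which only term deletes bump the counter that a maxBufferedDeleteTerms-style limit is compared against. `ToyBufferedDeletes` is hypothetical; it only mirrors the behaviour described above (no counter increment in addQuery()) and is not Lucene's BufferedDeletes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy buffer, not Lucene's BufferedDeletes: term deletes are
// counted, query deletes are only queued.
final class ToyBufferedDeletes {
  final List<String> terms = new ArrayList<>();
  final List<String> queries = new ArrayList<>();
  int numTermDeletes = 0;

  void addTerm(String term) {
    terms.add(term);
    numTermDeletes++; // the only counter a term-count limit would look at
  }

  void addQuery(String query) {
    queries.add(query); // no counter update, so a term-count limit never fires
  }

  boolean termLimitReached(int maxBufferedDeleteTerms) {
    return maxBufferedDeleteTerms > 0 && numTermDeletes >= maxBufferedDeleteTerms;
  }

  public static void main(String[] args) {
    ToyBufferedDeletes queryOnly = new ToyBufferedDeletes();
    for (int i = 0; i < 1000; i++) {
      queryOnly.addQuery("id:doc-" + i);
    }
    // Query deletes stay queued until something else flushes or commits.
    System.out.println(queryOnly.termLimitReached(1) + ", queued=" + queryOnly.queries.size());
  }
}
```

In this model, an index that only ever deletes by query would never hit the limit, no matter how low it is set.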

Shai
