lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: FlushPolicy and maxBufDelTerm
Date Thu, 01 Aug 2013 15:24:42 GMT
> I think the doc is correct

Wait, one of the docs is wrong. I guess according to what you write, it's
FlushPolicy, as a new segment is not flushed per this setting?
Or perhaps the docs should clarify that "flushed" deletes means the deletes
are applied to existing segments?

I disabled reader pooling and I still don't see .del files. But I think
that's explained by the fact that there are no segments in the index yet.
All documents are still in the RAM buffer, and according to what you write,
I shouldn't see any segment because of delTerms?

Shai
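
The behaviour discussed in this thread can be sketched as a toy model. This is
only an illustration under stated assumptions, not Lucene's BufferedDeletes or
FlushPolicy code: the class BufferedDeletesModel and all of its methods are
hypothetical, and it simplifies heavily (one .del file per apply, no RAM
accounting, no check that a delete matches a document). It assumes what the
thread describes: reaching the delete-term limit applies deletes to existing
segments without flushing a new segment, and delete-by-query is not counted
against that limit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the behaviour described in this thread. This is NOT Lucene
 * code; every name here is hypothetical and exists only for illustration.
 */
final class BufferedDeletesModel {
    private final int maxBufferedDeleteTerms; // stand-in for setMaxBufferedDeleteTerms
    private final List<String> segments = new ArrayList<>(); // segments already flushed
    private int docsInRam = 0;
    private int bufferedTerms = 0;
    private int bufferedQueries = 0;
    private int delFilesWritten = 0;

    BufferedDeletesModel(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    void addDocument() {
        docsInRam++;
    }

    /** Buffers a delete-by-term; returns true if the term count triggered applying deletes. */
    boolean deleteByTerm(String term) {
        bufferedTerms++;
        if (bufferedTerms >= maxBufferedDeleteTerms) {
            applyDeletes(); // applies deletes only; documents in RAM are NOT flushed
            return true;
        }
        return false;
    }

    /** Buffers a delete-by-query; assumed not to count towards the term limit. */
    void deleteByQuery(String query) {
        bufferedQueries++;
    }

    /** Models a commit: flushes RAM documents to a new segment, then applies deletes. */
    void commit() {
        if (docsInRam > 0) {
            segments.add("_" + segments.size());
            docsInRam = 0;
        }
        applyDeletes();
    }

    // Simplification: one .del file per apply, and only if a segment exists.
    private void applyDeletes() {
        if (bufferedTerms + bufferedQueries > 0 && !segments.isEmpty()) {
            delFilesWritten++;
        }
        bufferedTerms = 0;
        bufferedQueries = 0;
    }

    int segmentCount() { return segments.size(); }
    int delFilesWritten() { return delFilesWritten; }
    int bufferedQueries() { return bufferedQueries; }

    public static void main(String[] args) {
        BufferedDeletesModel m = new BufferedDeletesModel(1);
        for (int i = 0; i < 4; i++) {
            m.addDocument();
        }
        m.deleteByTerm("id:doc-0");
        m.deleteByTerm("id:doc-1");
        // All documents are still in RAM, so there is nothing to write .del files against.
        System.out.println("segments=" + m.segmentCount() + " delFiles=" + m.delFilesWritten());

        m.commit();                 // now a segment exists
        m.deleteByTerm("id:doc-2"); // the term limit applies deletes to that segment
        System.out.println("segments=" + m.segmentCount() + " delFiles=" + m.delFilesWritten());
    }
}
```

In this model the first scenario mirrors the unit test quoted below: with no
flushed segment, applying deletes leaves the directory unchanged. Only after a
commit creates a segment does a later delete produce a .del file.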


On Thu, Aug 1, 2013 at 5:40 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> First off, it's bad that you don't see .del files when
> conf.setMaxBufferedDeleteTerms is 1.
>
> But, it could be that newIndexWriterConfig turned on readerPooling
> which would mean the deletes are held in the SegmentReader and not
> flushed to disk.  Can you make sure that's off?
>
> Second off, I think the doc is correct: a segment will not be flushed;
> rather, new .del files should appear against older segments.
>
> And yes, if RAM usage of the buffered delete Terms/Queries is too high,
> then a segment is flushed along with the deletes being applied
> (creating the .del files).
>
> I think buffered delete Queries are not counted towards
> setMaxBufferedDeleteTerms; so they are only flushed by RAM usage
> (a very rough estimate) or by other ops (merging, NRT reopen, commit,
> etc.).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Aug 1, 2013 at 9:03 AM, Shai Erera <serera@gmail.com> wrote:
> > Hi
> >
> > I'm a little confused about the FlushPolicy and
> > IndexWriterConfig.setMaxBufferedDeleteTerms documentation. The FlushPolicy
> > jdocs say:
> >
> >  * Segments are traditionally flushed by:
> >  * <ul>
> >  * <li>RAM consumption - configured via
> > ...
> >  * <li>Number of buffered delete terms/queries - configured via
> >  * {@link IndexWriterConfig#setMaxBufferedDeleteTerms(int)}</li>
> >  * </ul>
> >
> > Yet IWC.setMaxBufDelTerm says:
> >
> > NOTE: This setting won't trigger a segment flush.
> >
> > And FlushByRamOrCountPolicy says:
> >
> >  * <li>{@link #onDelete(DocumentsWriterFlushControl,
> > DocumentsWriterPerThreadPool.ThreadState)} - flushes
> >  * based on the global number of buffered delete terms iff
> >  * {@link IndexWriterConfig#getMaxBufferedDeleteTerms()} is enabled</li>
> >
> > Confused, I wrote a short unit test:
> >
> >   public void testMaxBufDelTerm() throws Exception {
> >     Directory dir = new RAMDirectory();
> >     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
> >         new MockAnalyzer(random()));
> >     conf.setMaxBufferedDeleteTerms(1);
> >     conf.setMaxBufferedDocs(10);
> >     conf.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
> >     conf.setInfoStream(new PrintStreamInfoStream(System.out));
> >     IndexWriter writer = new IndexWriter(dir, conf );
> >     int numDocs = 4;
> >     for (int i = 0; i < numDocs; i++) {
> >       Document doc = new Document();
> >       doc.add(new StringField("id", "doc-" + i, Store.NO));
> >       writer.addDocument(doc);
> >     }
> >
> >     System.out.println("before delete");
> >     for (String f : dir.listAll()) System.out.println(f);
> >
> >     writer.deleteDocuments(new Term("id", "doc-0"));
> >     writer.deleteDocuments(new Term("id", "doc-1"));
> >
> >     System.out.println("\nafter delete");
> >     for (String f : dir.listAll()) System.out.println(f);
> >
> >     writer.close();
> >     dir.close();
> >   }
> >
> > When InfoStream is turned on, I can see messages about terms being flushed
> > (versus none if I comment out the conf.setMaxBufferedDeleteTerms(1) line),
> > so I know this setting takes effect.
> > Yet both before and after the delete operations, dir.listAll() returns only
> > the fdx and fdt files.
> >
> > So is it a bug that a segment isn't flushed? If not (and I'm ok with
> > that), is it a documentation inconsistency?
> > Strangely, I think, if the delTerms RAM accounting exhausts the
> > max-RAM-buffer size, a new segment will be flushed?
> >
> > Slightly unrelated to FlushPolicy, but do I understand correctly that
> > maxBufDelTerm does not apply to delete-by-query operations?
> > BufferedDeletes doesn't increment any counter on addQuery(), so is it
> > correct to assume that if I only delete-by-query, this setting has no
> > effect?
> > And the delete queries are buffered until the next segment is flushed due
> > to other operations (constraints, commit, NRT-reopen)?
> >
> > Shai
>
