lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: FlushPolicy and maxBufDelTerm
Date Thu, 01 Aug 2013 15:48:54 GMT
I set maxBufferedDocs=2 so that a segment gets flushed, and indeed after the
delete I see _0.del.

So I guess this is just a docs inconsistency. I'll clarify the FlushPolicy docs.
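
To make the observed behaviour concrete, here is a minimal toy model in plain Java. It is not Lucene code and does not use any Lucene API: the class ToyWriter, its methods, and the ".seg" file extension are hypothetical names invented for illustration. It only assumes what this thread describes: reaching maxBufferedDocs flushes a segment, while reaching maxBufferedDeleteTerms applies the buffered deletes to already-flushed segments (producing .del files) without flushing a new segment.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Lucene: mimics the flush/delete behaviour discussed in this thread.
class ToyWriter {
    private final int maxBufferedDocs;
    private final int maxBufferedDeleteTerms;
    private final List<String> ramBuffer = new ArrayList<>();         // ids not yet flushed
    private final List<Set<String>> segments = new ArrayList<>();     // live ids per flushed segment
    private final List<String> bufferedDeleteTerms = new ArrayList<>();
    private final Set<String> files = new TreeSet<>();                // simulated directory listing

    ToyWriter(int maxBufferedDocs, int maxBufferedDeleteTerms) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    void addDocument(String id) {
        ramBuffer.add(id);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flushSegment();                        // doc count triggers a segment flush
        }
    }

    void deleteByTerm(String id) {
        bufferedDeleteTerms.add(id);
        if (bufferedDeleteTerms.size() >= maxBufferedDeleteTerms) {
            applyDeletes();                        // term count applies deletes, no segment flush
        }
    }

    private void flushSegment() {
        String name = "_" + segments.size();
        segments.add(new LinkedHashSet<>(ramBuffer));
        ramBuffer.clear();
        files.add(name + ".seg");                  // ".seg" is a placeholder extension
    }

    private void applyDeletes() {
        for (String id : bufferedDeleteTerms) {
            for (int i = 0; i < segments.size(); i++) {
                if (segments.get(i).remove(id)) {
                    files.add("_" + i + ".del");   // deletes land against an existing segment
                }
            }
            ramBuffer.remove(id);                  // simplification: drop the doc from RAM directly
        }
        bufferedDeleteTerms.clear();
    }

    int segmentCount() { return segments.size(); }

    Set<String> listFiles() { return new TreeSet<>(files); }
}
```

In this model, new ToyWriter(10, 1) with four added documents and one delete ends with no segments and an empty listing, which corresponds to the original test where no .del file showed up. With new ToyWriter(2, 1), the same steps leave two segments and a _0.del entry in the listing.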

Shai


On Thu, Aug 1, 2013 at 6:24 PM, Shai Erera <serera@gmail.com> wrote:

> > I think the doc is correct
>
> Wait, one of the docs is wrong. I guess according to what you write, it's
> FlushPolicy, as a new segment is not flushed per this setting?
> Or perhaps the docs should clarify that "deletes are flushed" means they are
> applied to existing segments?
>
> I disabled reader pooling and I still don't see .del files. But I think
> that's explained by there being no segments in the index yet.
> All documents are still in the RAM buffer, and according to what you
> write, I shouldn't see any segment because of delTerms?
>
> Shai
>
>
> On Thu, Aug 1, 2013 at 5:40 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>> First off, it's bad that you don't see .del files when
>> conf.setMaxBufferedDeleteTerms is 1.
>>
>> But, it could be that newIndexWriterConfig turned on readerPooling
>> which would mean the deletes are held in the SegmentReader and not
>> flushed to disk.  Can you make sure that's off?
>>
>> Second off, I think the doc is correct: a segment will not be flushed;
>> rather, new .del files should appear against older segments.
>>
>> And yes, if RAM usage of the buffered delete Terms/Queries is too high,
>> then a segment is flushed along with the deletes being applied
>> (creating the .del files).
>>
>> I think buffered delete Queries are not counted towards
>> setMaxBufferedDeleteTerms; so they are only flushed by RAM usage
>> (a very rough estimate) or by other ops (merging, NRT reopen, commit,
>> etc.).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 1, 2013 at 9:03 AM, Shai Erera <serera@gmail.com> wrote:
>> > Hi
>> >
>> > I'm a little confused about the FlushPolicy and
>> > IndexWriterConfig.setMaxBufferedDeleteTerms documentation. The FlushPolicy
>> > jdocs say:
>> >
>> >  * Segments are traditionally flushed by:
>> >  * <ul>
>> >  * <li>RAM consumption - configured via
>> > ...
>> >  * <li>Number of buffered delete terms/queries - configured via
>> >  * {@link IndexWriterConfig#setMaxBufferedDeleteTerms(int)}</li>
>> >  * </ul>
>> >
>> > Yet IWC.setMaxBufDelTerm says:
>> >
>> > NOTE: This setting won't trigger a segment flush.
>> >
>> > And FlushByRamOrCountPolicy says:
>> >
>> >  * <li>{@link #onDelete(DocumentsWriterFlushControl,
>> > DocumentsWriterPerThreadPool.ThreadState)} - flushes
>> >  * based on the global number of buffered delete terms iff
>> >  * {@link IndexWriterConfig#getMaxBufferedDeleteTerms()} is enabled</li>
>> >
>> > Confused, I wrote a short unit test:
>> >
>> >   public void testMaxBufDelTerm() throws Exception {
>> >     Directory dir = new RAMDirectory();
>> >     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>> >         new MockAnalyzer(random()));
>> >     conf.setMaxBufferedDeleteTerms(1);
>> >     conf.setMaxBufferedDocs(10);
>> >     conf.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
>> >     conf.setInfoStream(new PrintStreamInfoStream(System.out));
>> >     IndexWriter writer = new IndexWriter(dir, conf );
>> >     int numDocs = 4;
>> >     for (int i = 0; i < numDocs; i++) {
>> >       Document doc = new Document();
>> >       doc.add(new StringField("id", "doc-" + i, Store.NO));
>> >       writer.addDocument(doc);
>> >     }
>> >
>> >     System.out.println("before delete");
>> >     for (String f : dir.listAll()) System.out.println(f);
>> >
>> >     writer.deleteDocuments(new Term("id", "doc-0"));
>> >     writer.deleteDocuments(new Term("id", "doc-1"));
>> >
>> >     System.out.println("\nafter delete");
>> >     for (String f : dir.listAll()) System.out.println(f);
>> >
>> >     writer.close();
>> >     dir.close();
>> >   }
>> >
>> > When InfoStream is turned on, I can see messages regarding terms
>> > flushing (vs. if I comment out the .setMaxBufDelTerm line), so I know this
>> > setting takes effect.
>> > Yet both before and after the delete operations, dir.listAll() returns
>> > only the fdx and fdt files.
>> >
>> > So is this a bug that a segment isn't flushed? If not (and I'm ok with
>> > that), is it a documentation inconsistency?
>> > Strangely, I think, if the delTerms RAM accounting exhausts the
>> > max-RAM-buffer size, a new segment will be flushed?
>> >
>> > Slightly unrelated to FlushPolicy, but do I understand correctly that
>> > maxBufDelTerm does not apply to delete-by-query operations?
>> > BufferedDeletes doesn't increment any counter on addQuery(), so is it
>> > correct to assume that if I only delete-by-query, this setting has no
>> > effect?
>> > And the delete queries are buffered until the next segment is flushed
>> > due to other operations (constraints, commit, NRT-reopen)?
>> >
>> > Shai
>>
>
