From: Shai Erera <serera@gmail.com>
Date: Thu, 1 Aug 2013 18:48:54 +0300
Subject: Re: FlushPolicy and maxBufDelTerm
To: dev@lucene.apache.org

I set maxBufDocs=2 so that I get a segment flushed, and indeed after the
delete I see _0.del. So I guess this is just a docs inconsistency. I'll
clarify the FlushPolicy docs.

Shai

On Thu, Aug 1, 2013 at 6:24 PM, Shai Erera <serera@gmail.com> wrote:
>
> > I think the doc is correct
>
> Wait, one of the docs is wrong. I guess, according to what you write, it's
> FlushPolicy, since a new segment is not flushed per this setting?
> Or perhaps both should be clarified to say that the deletes are flushed ==
> applied on existing segments?
>
> I disabled reader pooling and I still don't see .del files. But I think
> that's explained by there being no segments in the index yet.
> All documents are still in the RAM buffer, and according to what you
> write, I shouldn't see any segment because of delTerms?
>
> Shai
>
>
> On Thu, Aug 1, 2013 at 5:40 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>
>> First off, it's bad that you don't see .del files when
>> conf.setMaxBufferedDeleteTerms is 1.
>>
>> But it could be that newIndexWriterConfig turned on readerPooling,
>> which would mean the deletes are held in the SegmentReader and not
>> flushed to disk. Can you make sure that's off?
>>
>> Second off, I think the doc is correct: a segment will not be flushed;
>> rather, new .del files should appear against older segments.
>>
>> And yes, if the RAM usage of the buffered delete Term/Query objects is
>> too high, then a segment is flushed along with the deletes being applied
>> (creating the .del files).
>>
>> I think buffered delete Queries are not counted towards
>> setMaxBufferedDeleteTerms; so they are only flushed by RAM usage
>> (rough rough estimate) or by other ops (merging, NRT reopen, commit,
>> etc.).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 1, 2013 at 9:03 AM, Shai Erera <serera@gmail.com> wrote:
>> > Hi
>> >
>> > I'm a little confused about the FlushPolicy and
>> > IndexWriterConfig.setMaxBufferedDeleteTerms documentation. The
>> > FlushPolicy jdocs say:
>> >
>> >  * Segments are traditionally flushed by:
>> >  * <ul>
>> >  * <li>RAM consumption - configured via
>> > ...
>> >  * <li>Number of buffered delete terms/queries - configured via
>> >  * {@link IndexWriterConfig#setMaxBufferedDeleteTerms(int)}</li>
>> >  * </ul>
>> >
>> > Yet IWC.setMaxBufDelTerm says:
>> >
>> > NOTE: This setting won't trigger a segment flush.
>> >
>> > And FlushByRamOrCountsPolicy says:
>> >
>> >  * <li>{@link #onDelete(DocumentsWriterFlushControl,
>> > DocumentsWriterPerThreadPool.ThreadState)} - flushes
>> >  * based on the global number of buffered delete terms iff
>> >  * {@link IndexWriterConfig#getMaxBufferedDeleteTerms()} is enabled</li>
>> >
>> > Confused, I wrote a short unit test:
>> >
>> >   public void testMaxBufDelTerm() throws Exception {
>> >     Directory dir = new RAMDirectory();
>> >     IndexWriterConfig conf = newIndexWriterConfig(TEST_VERSION_CURRENT,
>> >         new MockAnalyzer(random()));
>> >     conf.setMaxBufferedDeleteTerms(1);
>> >     conf.setMaxBufferedDocs(10);
>> >     conf.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
>> >     conf.setInfoStream(new PrintStreamInfoStream(System.out));
>> >     IndexWriter writer = new IndexWriter(dir, conf);
>> >     int numDocs = 4;
>> >     for (int i = 0; i < numDocs; i++) {
>> >       Document doc = new Document();
>> >       doc.add(new StringField("id", "doc-" + i, Store.NO));
>> >       writer.addDocument(doc);
>> >     }
>> >
>> >     System.out.println("before delete");
>> >     for (String f : dir.listAll()) System.out.println(f);
>> >
>> >     writer.deleteDocuments(new Term("id", "doc-0"));
>> >     writer.deleteDocuments(new Term("id", "doc-1"));
>> >
>> >     System.out.println("\nafter delete");
>> >     for (String f : dir.listAll()) System.out.println(f);
>> >
>> >     writer.close();
>> >     dir.close();
>> >   }
>> >
>> > When InfoStream is turned on, I can see messages about terms flushing
>> > (vs. when I comment out the .setMaxBufDelTerm line), so I know this
>> > setting takes effect.
>> > Yet both before and after the delete operations, dir.listAll() returns
>> > only the fdx and fdt files.
>> >
>> > So is it a bug that a segment isn't flushed? If not (and I'm ok with
>> > that), is it a documentation inconsistency?
>> > Strangely, I think, if the delTerms RAM accounting exhausts the
>> > max-RAM-buffer size, a new segment will be flushed?
>> >
>> > Slightly unrelated to FlushPolicy, but do I understand correctly that
>> > maxBufDelTerm does not apply to delete-by-query operations?
>> > BufferedDeletes doesn't increment any counter on addQuery(), so is it
>> > correct to assume that if I only delete-by-query, this setting has no
>> > effect?
>> > And the delete queries are buffered until the next segment is flushed
>> > due to other operations (constraints, commit, NRT reopen)?
>> >
>> > Shai
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
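The behavior discussed in this thread can be summarized with a small, self-contained model. This is an editorial sketch only: `DeleteFlushSketch` and its `BufferedDeletes` class are hypothetical stand-ins, not Lucene's real classes. It illustrates the two claims above: hitting `maxBufferedDeleteTerms` *applies* buffered delete terms to already-flushed segments (producing per-segment .del files) without flushing the RAM buffer as a new segment, and delete-by-query does not count toward the term threshold.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteFlushSketch {

    // Hypothetical stand-in for Lucene's buffered-deletes accounting;
    // names mirror the discussion but the implementation is illustrative.
    static class BufferedDeletes {
        final int maxBufferedDeleteTerms;        // cf. IWC.setMaxBufferedDeleteTerms
        final List<String> terms = new ArrayList<>();
        final List<String> queries = new ArrayList<>();
        int segmentsOnDisk = 0;                  // segments already flushed
        int delFilesWritten = 0;                 // per-segment .del files produced

        BufferedDeletes(int maxTerms) {
            this.maxBufferedDeleteTerms = maxTerms;
        }

        void deleteByTerm(String term) {
            terms.add(term);
            // Per the thread: reaching the term threshold APPLIES deletes to
            // existing segments; it does NOT flush the RAM doc buffer as a
            // new segment.
            if (terms.size() >= maxBufferedDeleteTerms) {
                applyDeletes();
            }
        }

        void deleteByQuery(String query) {
            // Per the thread: delete-by-query is not counted toward
            // maxBufferedDeleteTerms; it waits for RAM limits or other ops
            // (merge, commit, NRT reopen).
            queries.add(query);
        }

        private void applyDeletes() {
            // A .del file can only appear against a segment that already
            // exists on disk; with zero segments, nothing is written.
            delFilesWritten += segmentsOnDisk;
            terms.clear();
        }
    }

    public static void main(String[] args) {
        BufferedDeletes bd = new BufferedDeletes(1);

        // No segments flushed yet: all docs still in the RAM buffer,
        // so applying deletes writes no .del files (Shai's observation).
        bd.deleteByTerm("id:doc-0");
        System.out.println(".del files with 0 segments: " + bd.delFilesWritten); // 0

        // Once a segment exists, applying deletes writes a .del file for it.
        bd.segmentsOnDisk = 1;
        bd.deleteByTerm("id:doc-1");
        System.out.println(".del files with 1 segment: " + bd.delFilesWritten);  // 1

        // Queries just accumulate; the term threshold never fires for them.
        bd.deleteByQuery("body:foo");
        System.out.println("buffered queries (not applied): " + bd.queries.size()); // 1
    }
}
```

Under this model, Shai's test result is the expected outcome rather than a bug: with `maxBufferedDocs=10` never reached, no segment exists, so a delete-term flush has nothing to write a .del file against.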