From: Michael McCandless
To: java-dev@lucene.apache.org
Date: Fri, 5 Mar 2010 10:25:27 -0500
Subject: Re: IndexWriter.applyDeletes performance

Currently you can't tell IW to use the pool (ie, pool is only enabled if you use NRT readers).
We should probably make this an option at ctor time, for situations like this. (In fact, in follow-on discussions about further improvements to NRT, we've already discussed adding such an option to IW's ctors.) I'll open an issue for this.

Indeed, from that profiler output it looks like most of the time is being spent opening the SegmentReaders (to do deletes), specifically loading the terms dict index (64% overall) and loading the deleted docs (10%).

But... how long does step 2 take? Is it an option to not commit on every update? How many docs do you typically update?

If you are committing only so that an outside reader can reopen, you should consider just using an NRT reader instead (assuming the reader is in the same JVM as IndexWriter).

Roughly how much more RAM consumption do you see when you force pooling?

Mike

On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac wrote:
> Hi,
>
> I have an index with 100 million docs that is around 20GB on disk and
> an update rate of a few hundred docs per minute. The new docs are
> grouped in batches and indexed once every few minutes. My problem is
> that the update performance degraded too much over time as the index
> increased in size (distinct docs).
>
> My indexing flow looks like this:
>
> 0. create indexWriter (only once)
> 1. get the open indexWriter
> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
> 3. indexWriter.commit
> 4. indexWriter.waitForMerges
> 5. wait for new docs and goto 1.
>
> I ran a profiler for several minutes and I noticed that most of the
> time the indexer is busy applying the deletes. This takes so much time
> because all terms are loaded for every commit (see the attached
> profiler screenshot).
>
> The index writer has a pool of readers but they are not used unless
> near real time is enabled. I changed my code to force the pool to be
> used, but the only way I can do this is to request a reader that is
> never used: writer.getReader().
> Of course, the memory consumption is
> higher now because I have terms in memory, but steps 3+4 complete in
> 1-2 secs compared to 8-10 secs.
>
> Is it possible to enable the readers pool at the IndexWriter
> constructor level? My current method looks like a hack...
> I am using Lucene 2.9.2 on Linux.
>
> Regards,
> Bogdan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
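A toy model may help show why forcing the pool changes the cost of steps 3+4 so much. The Java sketch here is a hypothetical stand-in, not Lucene's internal pool or any IndexWriter API: ToyReaderPool, SegmentHandle, ToyCommitDemo and the `pooling` constructor flag are invented names, and "opening" a segment only increments a counter where a real writer would load the terms dict index and deleted docs. With pooling off, every commit reopens every segment to apply deletes; with it on, each segment is opened once and then reused, at the cost of keeping every opened handle resident in RAM.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an opened per-segment reader. In the thread's
// profile, opening one meant loading the terms dict index and deleted docs.
final class SegmentHandle {
    final String segmentName;

    SegmentHandle(String segmentName) {
        this.segmentName = segmentName;
    }
}

// Toy model of a writer-side reader pool; NOT Lucene's ReaderPool API.
final class ToyReaderPool {
    private final boolean pooling; // analogous to the proposed ctor-time option
    private final Map<String, SegmentHandle> cache = new HashMap<>();
    private int opens = 0;         // count of simulated expensive opens

    ToyReaderPool(boolean pooling) {
        this.pooling = pooling;
    }

    SegmentHandle get(String segmentName) {
        if (pooling) {
            SegmentHandle cached = cache.get(segmentName);
            if (cached != null) {
                return cached; // reuse: nothing is reloaded
            }
        }
        opens++; // simulate loading the terms index + deleted docs
        SegmentHandle handle = new SegmentHandle(segmentName);
        if (pooling) {
            cache.put(segmentName, handle); // stays resident: the RAM cost
        }
        return handle;
    }

    int opens() {
        return opens;
    }

    int resident() {
        return cache.size();
    }
}

final class ToyCommitDemo {
    // Applying buffered deletes at commit visits every existing segment once.
    static void applyDeletesAndCommit(ToyReaderPool pool, List<String> segments) {
        for (String segment : segments) {
            pool.get(segment); // a real writer would resolve delete terms here
        }
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("_0", "_1", "_2");
        ToyReaderPool unpooled = new ToyReaderPool(false);
        ToyReaderPool pooled = new ToyReaderPool(true);
        for (int commit = 0; commit < 10; commit++) {
            applyDeletesAndCommit(unpooled, segments);
            applyDeletesAndCommit(pooled, segments);
        }
        System.out.println("unpooled opens=" + unpooled.opens()
                + ", pooled opens=" + pooled.opens()
                + ", pooled resident=" + pooled.resident());
    }
}
```

Running ToyCommitDemo prints `unpooled opens=30, pooled opens=3, pooled resident=3`. The model deliberately leaves out merges: a real pool would also have to drop handles for segments that merges have removed, and the resident handles are exactly the extra RAM that the question about forced pooling is asking to measure.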