Subject: Re: IndexWriter.applyDeletes performance
From: Bogdan Ghidireac
To: java-dev@lucene.apache.org
Date: Mon, 8 Mar 2010 13:29:48 +0200
Mike,

>
> But... how long does step 2 take?  Is it an option to not commit on
> every update?  How many docs do you typically update?

I do not commit on every update; I call commit once every 10k documents.
Indexing 10k docs takes around 10 secs.

>
> If you are committing only so that an outside reader can reopen, you
> should consider just using an NRT reader instead (assuming the reader
> is in same JVM as IndexWriter).

My service is just an indexer; I don't need a reader. The new segments
are pushed to a searcher box after each commit.

>
> Roughly how much more RAM consumption do you see when you force pooling?

pooling not forced -> memory after explicit GC: 50 MB
pooling forced     -> memory after explicit GC: 250 MB

Thank you for opening the JIRA issue.

Bogdan

>
> Mike
>
> On Fri, Mar 5, 2010 at 9:18 AM, Bogdan Ghidireac wrote:
>> Hi,
>>
>> I have an index with 100 million docs that takes around 20GB on disk and
>> has an update rate of a few hundred docs per minute. The new docs are
>> grouped in batches and indexed once every few minutes. My problem is
>> that the update performance degraded too much over time as the index
>> increased in size (distinct docs).
>>
>> My indexing flow looks like this:
>>
>> 0. create indexWriter (only once)
>> 1. get the open indexWriter
>> 2. for each doc call indexWriter.updateDocument(pkTerm, doc)
>> 3. indexWriter.commit
>> 4. indexWriter.waitForMerges
>> 5. wait for new docs and goto 1.
>>
>> I ran a profiler for several minutes and noticed that most of the
>> time the indexer is busy applying the deletes. This takes so much time
>> because all terms are loaded for every commit (see the attached
>> profiler screenshot).
>>
>> The index writer has a pool of readers, but they are not used unless
>> near real time is enabled. I changed my code to force the pool to be
>> used, but the only way I can do this is to request a reader that is
>> never used: writer.getReader().
>> Of course, the memory consumption is higher now because I have terms
>> in memory, but steps 3 and 4 complete in 1-2 secs compared to
>> 8-10 secs.
>>
>> Is it possible to enable the readers pool at the IndexWriter
>> constructor level? My current method looks like a hack...
>> I am using Lucene 2.9.2 on Linux.
>>
>> Regards,
>> Bogdan
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
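The update loop described in the quoted message (steps 1-5), with one commit per 10k documents, can be sketched roughly as below. This is a minimal sketch of the loop structure only, not of Lucene's internals. To keep it compilable without Lucene on the classpath, `DocSink` is a hypothetical interface standing in for the three `IndexWriter` calls the loop uses (`updateDocument(pkTerm, doc)`, `commit()`, `waitForMerges()`), and keys and documents are plain strings rather than Lucene `Term`/`Document` objects. The point it illustrates is that steps 3 and 4, where the quoted profile shows the delete-application cost landing, run once per batch rather than once per document.

```java
/**
 * Sketch of a batched update loop (steps 1-5 above).
 * DocSink is a HYPOTHETICAL stand-in for Lucene's IndexWriter; in the
 * real indexer its methods would map to updateDocument(pkTerm, doc),
 * commit() and waitForMerges() on a writer created once (step 0).
 */
class BatchedIndexer {

    /** Hypothetical minimal view of the writer operations the loop needs. */
    public interface DocSink {
        void update(String primaryKey, String doc); // ~ updateDocument(pkTerm, doc)
        void commit();                              // ~ commit()
        void waitForMerges();                       // ~ waitForMerges()
    }

    private final DocSink sink;
    private final int commitEvery; // e.g. 10_000 docs per commit
    private int pending = 0;       // updates since the last commit

    BatchedIndexer(DocSink sink, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be > 0");
        }
        this.sink = sink;
        this.commitEvery = commitEvery;
    }

    /** Step 2: apply one update; steps 3-4 run once a full batch is pending. */
    void add(String primaryKey, String doc) {
        sink.update(primaryKey, doc);
        pending++;
        if (pending >= commitEvery) {
            flush();
        }
    }

    /** Steps 3-4 for whatever is pending, e.g. when the input stream pauses (step 5). */
    void flush() {
        if (pending == 0) {
            return; // nothing to commit
        }
        sink.commit();
        sink.waitForMerges();
        pending = 0;
    }
}
```

With `commitEvery = 10_000`, feeding 25,000 updates triggers two commits automatically and a third when `flush()` is called for the remaining 5,000.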