From java-user-return-47697-apmail-lucene-java-user-archive=lucene.apache.org@lucene.apache.org Wed Nov 03 19:03:25 2010 Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 47536 invoked from network); 3 Nov 2010 19:03:25 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 3 Nov 2010 19:03:25 -0000 Received: (qmail 59303 invoked by uid 500); 3 Nov 2010 19:03:54 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 59238 invoked by uid 500); 3 Nov 2010 19:03:54 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 59225 invoked by uid 99); 3 Nov 2010 19:03:54 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Nov 2010 19:03:54 +0000 X-ASF-Spam-Status: No, hits=0.7 required=10.0 tests=RCVD_IN_DNSWL_NONE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [74.125.82.48] (HELO mail-ww0-f48.google.com) (74.125.82.48) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Nov 2010 19:03:47 +0000 Received: by wwb39 with SMTP id 39so966903wwb.5 for ; Wed, 03 Nov 2010 12:03:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.19.134 with SMTP id n6mr467206wen.9.1288811005682; Wed, 03 Nov 2010 12:03:25 -0700 (PDT) Received: by 10.216.93.14 with HTTP; Wed, 3 Nov 2010 12:03:25 -0700 (PDT) In-Reply-To: <8ED783FE-BBE2-415A-A3B0-847D7500509E@smartsheet.com> References: <79A8B638-16FD-486C-BE62-76CCAE58D620@smartsheet.com> <8ED783FE-BBE2-415A-A3B0-847D7500509E@smartsheet.com> Date: Wed, 3 Nov 2010 15:03:25 -0400 Message-ID: Subject: Re: IndexWriter.close() performance issue From: Michael McCandless To: java-user@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Can you run CheckIndex (command line tool) and post the output? How long does it take to open a reader on this same index, and perform a simple query (eg TermQuery)? Mike On Wed, Nov 3, 2010 at 2:53 PM, Mark Kristensson wrote: > I've successfully reproduced the issue in our lab with a copy from produc= tion and have broken the close() call into parts, as suggested, with one ad= dition. > > Previously, the call was simply > > =A0 =A0 =A0 =A0... > =A0 =A0 =A0 =A0} finally { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0// Close > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (indexWriter !=3D null) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0try { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.close(); > =A0 =A0 =A0 =A0... > > > Now, that is broken into the various parts, including a prepareCommit(); > > =A0 =A0 =A0 =A0... > =A0 =A0 =A0 =A0} finally { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0// Close > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (indexWriter !=3D null) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0try { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.prepareCommit(); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Logger.deb= ug("prepareCommit() complete"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.commit(); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Logger.deb= ug("commit() complete"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.maybeMerge(); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Logger.deb= ug("maybeMerge() complete"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.waitForMerges(); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Logger.deb= ug("waitForMerges() complete"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0indexWrite= r.close(); > =A0 =A0 =A0 =A0... > > > It turns out that the prepareCommit() is the slow call here, taking sever= al seconds to complete. > > I've done some reading about it, but have not found anything that might b= e helpful here. The fact that it is slow every single time, even when I'm a= dding exactly one document to the index, is perplexing and leads to me to t= hink something must be corrupt with the index. > > Furthermore, I tried optimizing the index to see if that would have any i= mpact (I wasn't expecting much) and it did not. > > I'm stumped at this point and am thinking I may have to rebuild the index= , though I would definitely prefer to avoid doing that and would like to kn= ow why this is happening. > > Thanks for your help, > Mark > > > On Nov 2, 2010, at 9:26 AM, Mark Kristensson wrote: > >> >> Wonderful information on what happens during indexWriter.close(), thank = you very much! I've got some testing to do as a result. >> >> We are on Lucene 3.0.0 right now. >> >> One other detail that I neglected to mention is that the batch size does= not seem to have any relation to the time it takes to close on the index w= here we are having issues. We've had batches add as few as 3 documents and = batches add as many as 2500 documents in the last hour and every single clo= se() call for that index takes 6 to 8 seconds. While I won't know until I a= m able to individually test the different pieces of the close() operation, = I'd be very surprised if a batch that adds just 3 new documents results in = very much merge work being done. It seems as if there is some task happenin= g during merge that the indexWriter is never able to successfully complete = and so it tries to complete that task every single time close() is called. >> >> So, my working theory until I can dig deeper is that something is mildly= corrupt with the index (though not serious enough to affect most operation= s on the index). Are there any good utilities for examining the health of a= n index? >> >> I've dabbled with the experimental checkIndex object in the past (before= we upgraded to 3.0), but have found it to be incredibly slow and of margin= al value. Does anyone have any experience using CheckIndex to track down an= issue with a production index? >> >> Thanks again! >> Mark >> >> On Nov 2, 2010, at 2:20 AM, Shai Erera wrote: >> >>> When you close IndexWriter, it performs several operations that might h= ave a >>> connection to the problem you describe: >>> >>> * Commit all the pending updates -- if your update batch size is more o= r >>> less the same (i.e., comparable # of docs and total # bytes indexed), t= hen >>> you should not see a performance difference between closes. >>> >>> * Consults the MergePolicy and runs any merges it returns as candidates= . >>> >>> * Waits for the merges to finish. >>> >>> Roughly, IndexWriter.close() can be substituted w/: >>> writer.commit(false); // commits the changes, but does not run merges. >>> writer.maybeMerge(); // runs merges returned by MergePolicy. >>> writer.waitForMerges(); // if you use ConcurrentMergeScheduler, the abo= ve >>> call returns immediately, not waiting for the merges to finish. >>> writer.close(); // at this point, commit + merging has finished, so it = does >>> very little. >>> >>> As your index grows in size, so does its # of segments, and the segment= s >>> size as well. So tweaking some parameters on the MergePolicy (such as >>> mergeFactor, maxMergeMB etc.) can result in not attempting to merge too >>> large segments. >>> >>> Alternatively, you can try the following: >>> >>> 1) Replace the call to writer.close() w/ the above sequence. Then, meas= ure >>> each call and report back which of them takes the suspicious amount of = time. >>> >>> 2) Not running optimize() on a regular basis doesn't mean merges don't >>> happen in the background. So if you want close() to return as fast as >>> possible, you should call close(false). Note though that from time to t= ime >>> you should allow merges to finish, in order to reduce the # of segments= . >>> >>> 3) If you want to control when the merges are run, you can open IndexWr= iter >>> with NoMergePolicy, which always returns 0 merges to perform, or >>> NoMergeScheduler which never executes merges. But be aware that this is >>> dangerous as the # of segments in the index will continue to grow and s= earch >>> performance will degrade. >>> >>> The answers above is relevant for 3x, but most of them are also relevan= t for >>> 2.9. If you have an older version of Lucene, then some of the solutions >>> might still apply (such as close(false)). >>> >>> Hope this helps, >>> Shai >>> >>> On Tue, Nov 2, 2010 at 12:55 AM, Mark Kristensson < >>> mark.kristensson@smartsheet.com> wrote: >>> >>>> Hello, >>>> >>>> One of our =A0Lucene indexes has started misbehaving on indexWriter.cl= ose and >>>> I'm searching for ideas about what may have happened and how to fix it= . >>>> >>>> Here's our scenario: >>>> - We have seven Lucene indexes that contain different sets of data fro= m a >>>> web application are indexed for searching by end users >>>> - A java service runs twice a minute to pull changes from SQL DB queue >>>> tables and update the relevant Lucene index(es) >>>> - The two largest indexes (3.4GB and 3.8GB in size with 8 million and = 6 >>>> million records, respectively) contain similar sets of data, but are >>>> structured differently for different consumption (e.g. one has an All = field >>>> for general purpose searching, the other does not; one has numeric fie= lds >>>> for ranges, the other does not, etc.) >>>> - We expunge deletes from our indexes twice per day >>>> - A =A0couple of weeks ago, one of the two large indexes became very s= low to >>>> close after each round of changes is applied by our indexing service. >>>> Specifically, all of our indexes usually close in no more than 200 >>>> milliseconds, but this one index is now taking 6 to 8 seconds to close= with >>>> every single pass (and is leading to delays which are affecting end us= ers). >>>> >>>> Questions from my attempts to troubleshoot the problem: >>>> - Has anyone else seen behavior similar to this? What did you do to re= solve >>>> it? >>>> - Does the size of an index or its documents (record count, disk size,= avg >>>> document size, max document size, etc.) have any correlation to the le= ngth >>>> of time it takes to close an index? >>>> - We are not currently optimizing any of our indexes on a regular basi= s, >>>> could that have any impact upon indexing operations (e.g. >>>> indexWriter.close())? My understanding is that optimization only affec= ts >>>> search performance, not indexing performance and to date we have not s= een >>>> any need to optimize based upon the performance of the search queries. >>>> >>>> Thanks, >>>> Mark >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>> >>>> >> > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org