Subject: Re: IndexWriter.close() performance issue
From: Mark Kristensson <mark.kristensson@smartsheet.com>
Date: Wed, 3 Nov 2010 13:27:53 -0700
To: java-user@lucene.apache.org

I've run CheckIndex against the index and the results are below. The net is that it's telling me nothing is wrong with the index.

I did not have any instrumentation around the opening of the IndexSearcher (we don't use an IndexReader directly), just around the actual query execution, so I had to add some additional logging. What I found surprised me: opening a searcher against this index takes the same 6 to 8 seconds that closing the IndexWriter takes.
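For reference, the extra logging is essentially just a stopwatch around opening the searcher and around a single query, roughly like the sketch below (the index path, field name, and term are placeholders, not our real ones). CheckIndex itself was run with the stock command line tool (java -cp lucene-core-3.0.0.jar org.apache.lucene.index.CheckIndex <indexDir>).

    import java.io.File;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SearcherTiming {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path

            // Time opening the searcher (this is the part taking 6 to 8 seconds for us)
            long start = System.currentTimeMillis();
            IndexSearcher searcher = new IndexSearcher(dir, true); // true = read-only
            System.out.println("open: " + (System.currentTimeMillis() - start) + " ms");

            // Time a simple TermQuery (field and value are made up for this sketch)
            start = System.currentTimeMillis();
            TopDocs hits = searcher.search(new TermQuery(new Term("all", "foo")), 10);
            System.out.println("query: " + (System.currentTimeMillis() - start)
                    + " ms, totalHits=" + hits.totalHits);

            searcher.close();
            dir.close();
        }
    }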
Thanks for your help,
Mark

-----------------------------------

Segments file=segments_9hir numSegments=4 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 4: name=_aiaz docCount=5886773
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=3,628.377
    diagnostics = {optimize=true, mergeFactor=4, os.version=2.6.18-194.17.1.el5, os=Linux, mergeDocStores=true, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_aiaz_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [296713 fields]
    test: field norms.........OK [296713 fields]
    test: terms, freq, prox...OK [43933037 terms; 669926342 terms/docs pairs; 669957608 tokens]
    test: stored fields.......OK [173363203 total field count; avg 29.45 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 4: name=_aib0 docCount=1
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=0.002
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_aib0_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [79 terms; 79 terms/docs pairs; 0 tokens]
    test: stored fields.......OK [0 total field count; avg � fields per doc]
    test: term vectors........OK [0 total vector count; avg � term/freq vector fields per doc]

  3 of 4: name=_aib1 docCount=3
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=0.003
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [109 terms; 237 terms/docs pairs; 237 tokens]
    test: stored fields.......OK [60 total field count; avg 20 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  4 of 4: name=_aib2 docCount=1
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=0.002
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [79 terms; 79 terms/docs pairs; 79 tokens]
    test: stored fields.......OK [20 total field count; avg 20 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


On Nov 3, 2010, at 12:03 PM, Michael McCandless wrote:

> Can you run CheckIndex (command line tool) and post the output?
>
> How long does it take to open a reader on this same index, and perform
> a simple query (eg TermQuery)?
>
> Mike
>
> On Wed, Nov 3, 2010 at 2:53 PM, Mark Kristensson wrote:
>> I've successfully reproduced the issue in our lab with a copy from production and have broken the close() call into parts, as suggested, with one addition.
>>
>> Previously, the call was simply
>>
>> ...
>> } finally {
>>     // Close
>>     if (indexWriter != null) {
>>         try {
>>             indexWriter.close();
>> ...
>>
>> Now, that is broken into the various parts, including a prepareCommit():
>>
>> ...
>> } finally {
>>     // Close
>>     if (indexWriter != null) {
>>         try {
>>             indexWriter.prepareCommit();
>>             Logger.debug("prepareCommit() complete");
>>             indexWriter.commit();
>>             Logger.debug("commit() complete");
>>             indexWriter.maybeMerge();
>>             Logger.debug("maybeMerge() complete");
>>             indexWriter.waitForMerges();
>>             Logger.debug("waitForMerges() complete");
>>             indexWriter.close();
>> ...
>>
>> It turns out that prepareCommit() is the slow call here, taking several seconds to complete.
>>
>> I've done some reading about it, but have not found anything that might be helpful here. The fact that it is slow every single time, even when I'm adding exactly one document to the index, is perplexing and leads me to think something must be corrupt with the index.
>>
>> Furthermore, I tried optimizing the index to see if that would have any impact (I wasn't expecting much) and it did not.
>>
>> I'm stumped at this point and am thinking I may have to rebuild the index, though I would definitely prefer to avoid doing that and would like to know why this is happening.
>>
>> Thanks for your help,
>> Mark
>>
>>
>> On Nov 2, 2010, at 9:26 AM, Mark Kristensson wrote:
>>
>>> Wonderful information on what happens during indexWriter.close(), thank you very much! I've got some testing to do as a result.
>>>
>>> We are on Lucene 3.0.0 right now.
>>>
>>> One other detail that I neglected to mention is that the batch size does not seem to have any relation to the time it takes to close the index where we are having issues. We've had batches add as few as 3 documents and batches add as many as 2500 documents in the last hour, and every single close() call for that index takes 6 to 8 seconds. While I won't know until I am able to individually test the different pieces of the close() operation, I'd be very surprised if a batch that adds just 3 new documents results in very much merge work being done. It seems as if there is some task happening during merge that the indexWriter is never able to successfully complete, and so it tries to complete that task every single time close() is called.
>>>
>>> So, my working theory until I can dig deeper is that something is mildly corrupt with the index (though not serious enough to affect most operations on the index). Are there any good utilities for examining the health of an index?
>>>
>>> I've dabbled with the experimental CheckIndex object in the past (before we upgraded to 3.0), but have found it to be incredibly slow and of marginal value. Does anyone have any experience using CheckIndex to track down an issue with a production index?
>>>
>>> Thanks again!
>>> Mark
>>>
>>> On Nov 2, 2010, at 2:20 AM, Shai Erera wrote:
>>>
>>>> When you close IndexWriter, it performs several operations that might have a
>>>> connection to the problem you describe:
>>>>
>>>> * Commits all the pending updates -- if your update batch size is more or
>>>> less the same (i.e., comparable # of docs and total # bytes indexed), then
>>>> you should not see a performance difference between closes.
>>>>
>>>> * Consults the MergePolicy and runs any merges it returns as candidates.
>>>>
>>>> * Waits for the merges to finish.
>>>>
>>>> Roughly, IndexWriter.close() can be substituted w/:
>>>> writer.commit(false); // commits the changes, but does not run merges.
>>>> writer.maybeMerge(); // runs merges returned by the MergePolicy.
>>>> writer.waitForMerges(); // if you use ConcurrentMergeScheduler, the above
>>>> call returns immediately, not waiting for the merges to finish.
>>>> writer.close(); // at this point, commit + merging has finished, so it does
>>>> very little.
>>>>
>>>> As your index grows in size, so does its # of segments, and the segment
>>>> sizes as well. So tweaking some parameters on the MergePolicy (such as
>>>> mergeFactor, maxMergeMB etc.) can result in not attempting to merge too
>>>> large segments.
>>>>
>>>> Alternatively, you can try the following:
>>>>
>>>> 1) Replace the call to writer.close() w/ the above sequence. Then, measure
>>>> each call and report back which of them takes the suspicious amount of time.
>>>>
>>>> 2) Not running optimize() on a regular basis doesn't mean merges don't
>>>> happen in the background. So if you want close() to return as fast as
>>>> possible, you should call close(false). Note though that from time to time
>>>> you should allow merges to finish, in order to reduce the # of segments.
>>>>
>>>> 3) If you want to control when the merges are run, you can open IndexWriter
>>>> with NoMergePolicy, which always returns 0 merges to perform, or
>>>> NoMergeScheduler, which never executes merges. But be aware that this is
>>>> dangerous, as the # of segments in the index will continue to grow and search
>>>> performance will degrade.
>>>>
>>>> The answers above are relevant for 3.x, but most of them are also relevant for
>>>> 2.9. If you have an older version of Lucene, then some of the solutions
>>>> might still apply (such as close(false)).
>>>>
>>>> Hope this helps,
>>>> Shai
>>>>
>>>> On Tue, Nov 2, 2010 at 12:55 AM, Mark Kristensson <
>>>> mark.kristensson@smartsheet.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> One of our Lucene indexes has started misbehaving on indexWriter.close() and
>>>>> I'm searching for ideas about what may have happened and how to fix it.
>>>>>
>>>>> Here's our scenario:
>>>>> - We have seven Lucene indexes that contain different sets of data from a
>>>>> web application, indexed for searching by end users
>>>>> - A Java service runs twice a minute to pull changes from SQL DB queue
>>>>> tables and update the relevant Lucene index(es)
>>>>> - The two largest indexes (3.4GB and 3.8GB in size, with 8 million and 6
>>>>> million records, respectively) contain similar sets of data, but are
>>>>> structured differently for different consumption (e.g. one has an All field
>>>>> for general purpose searching, the other does not; one has numeric fields
>>>>> for ranges, the other does not, etc.)
>>>>> - We expunge deletes from our indexes twice per day
>>>>> - A couple of weeks ago, one of the two large indexes became very slow to
>>>>> close after each round of changes is applied by our indexing service.
>>>>> Specifically, all of our indexes usually close in no more than 200
>>>>> milliseconds, but this one index is now taking 6 to 8 seconds to close with
>>>>> every single pass (and is leading to delays which are affecting end users).
>>>>>
>>>>> Questions from my attempts to troubleshoot the problem:
>>>>> - Has anyone else seen behavior similar to this? What did you do to resolve it?
>>>>> - Does the size of an index or its documents (record count, disk size, avg
>>>>> document size, max document size, etc.) have any correlation to the length
>>>>> of time it takes to close an index?
>>>>> - We are not currently optimizing any of our indexes on a regular basis;
>>>>> could that have any impact upon indexing operations (e.g.
>>>>> indexWriter.close())? My understanding is that optimization only affects
>>>>> search performance, not indexing performance, and to date we have not seen
>>>>> any need to optimize based upon the performance of the search queries.
>>>>>
>>>>> Thanks,
>>>>> Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org