Subject: Re: IndexWriter.close() performance issue
From: Shai Erera <serera@gmail.com>
To: java-user@lucene.apache.org
Date: Tue, 2 Nov 2010 11:20:47 +0200

When you close IndexWriter, it performs several operations that might have a
connection to the problem you describe:

* Commits all the pending updates -- if your update batch size is more or less
  the same (i.e., a comparable # of docs and total # of bytes indexed), then
  you should not see a performance difference between closes.
* Consults the MergePolicy and runs any merges it returns as candidates.
* Waits for the merges to finish.

Roughly, IndexWriter.close() can be substituted with:

  writer.commit();        // commits the changes, but does not run merges
  writer.maybeMerge();    // runs merges returned by the MergePolicy
  writer.waitForMerges(); // needed because, if you use ConcurrentMergeScheduler,
                          // the call above returns immediately, without waiting
                          // for the merges to finish
  writer.close();         // at this point, commit + merging have finished,
                          // so this does very little
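A quick way to see which of those steps is the slow one is to time each call.
This is only a sketch: it assumes an already-open IndexWriter named "writer"
and the Lucene 3.x method names used above (older releases may not have all of
these methods):

  long t0 = System.currentTimeMillis();
  writer.commit();          // flush and fsync the pending changes
  long t1 = System.currentTimeMillis();
  writer.maybeMerge();      // ask the MergePolicy for merges and start them
  long t2 = System.currentTimeMillis();
  writer.waitForMerges();   // block until background merges complete
  long t3 = System.currentTimeMillis();
  writer.close();           // little work left at this point
  long t4 = System.currentTimeMillis();
  System.out.println("commit=" + (t1 - t0) + "ms"
      + " maybeMerge=" + (t2 - t1) + "ms"
      + " waitForMerges=" + (t3 - t2) + "ms"
      + " close=" + (t4 - t3) + "ms");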
As your index grows in size, so do the number of its segments and their sizes.
So tweaking some parameters on the MergePolicy (such as mergeFactor, maxMergeMB
etc.) can keep it from attempting to merge very large segments (a rough sketch
of such tuning follows the quoted message at the end). Alternatively, you can
try the following:

1) Replace the call to writer.close() with the above sequence. Then measure
   each call (as in the timed sketch above) and report back which of them takes
   the suspicious amount of time.

2) Not running optimize() on a regular basis doesn't mean merges don't happen
   in the background. So if you want close() to return as fast as possible, you
   should call close(false). Note though that from time to time you should
   allow merges to finish, in order to reduce the # of segments.

3) If you want to control when the merges are run, you can open IndexWriter
   with NoMergePolicy, which always returns 0 merges to perform, or
   NoMergeScheduler, which never executes merges. But be aware that this is
   dangerous, as the # of segments in the index will continue to grow and
   search performance will degrade.

The answers above are relevant for 3.x, but most of them are also relevant for
2.9. If you have an older version of Lucene, then some of the solutions might
still apply (such as close(false)).

Hope this helps,
Shai

On Tue, Nov 2, 2010 at 12:55 AM, Mark Kristensson <
mark.kristensson@smartsheet.com> wrote:

> Hello,
>
> One of our Lucene indexes has started misbehaving on indexWriter.close and
> I'm searching for ideas about what may have happened and how to fix it.
>
> Here's our scenario:
> - We have seven Lucene indexes that contain different sets of data from a
> web application and are indexed for searching by end users
> - A Java service runs twice a minute to pull changes from SQL DB queue
> tables and update the relevant Lucene index(es)
> - The two largest indexes (3.4GB and 3.8GB in size, with 8 million and 6
> million records, respectively) contain similar sets of data, but are
> structured differently for different consumption (e.g. one has an All field
> for general purpose searching, the other does not; one has numeric fields
> for ranges, the other does not, etc.)
> - We expunge deletes from our indexes twice per day
> - A couple of weeks ago, one of the two large indexes became very slow to
> close after each round of changes is applied by our indexing service.
> Specifically, all of our indexes usually close in no more than 200
> milliseconds, but this one index is now taking 6 to 8 seconds to close with
> every single pass (and is leading to delays which are affecting end users).
>
> Questions from my attempts to troubleshoot the problem:
> - Has anyone else seen behavior similar to this? What did you do to resolve
> it?
> - Does the size of an index or its documents (record count, disk size, avg
> document size, max document size, etc.) have any correlation to the length
> of time it takes to close an index?
> - We are not currently optimizing any of our indexes on a regular basis;
> could that have any impact upon indexing operations (e.g.
> indexWriter.close())? My understanding is that optimization only affects
> search performance, not indexing performance, and to date we have not seen
> any need to optimize based upon the performance of the search queries.
>
> Thanks,
> Mark
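Here is a minimal sketch of the merge-policy tuning and close(false) mentioned
above. It assumes Lucene 3.0-style APIs, that the writer's default merge policy
is a LogByteSizeMergePolicy, and a hypothetical index path; the values are only
examples and need tuning for your index:

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.LogByteSizeMergePolicy;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class FastCloseExample {
    public static void main(String[] args) throws Exception {
      // Hypothetical index location -- replace with your own.
      Directory dir = FSDirectory.open(new File("/path/to/index"));
      IndexWriter writer = new IndexWriter(dir,
          new StandardAnalyzer(Version.LUCENE_30),
          IndexWriter.MaxFieldLength.UNLIMITED);

      // The default merge policy is a LogByteSizeMergePolicy; cap the size of
      // the segments it will consider for merging so that close() never kicks
      // off a merge involving the very large segments.
      LogByteSizeMergePolicy mp = (LogByteSizeMergePolicy) writer.getMergePolicy();
      mp.setMergeFactor(10);    // example value
      mp.setMaxMergeMB(512.0);  // example value; larger segments are left alone

      // ... add / update / delete documents here ...

      // Return quickly: commit the pending changes but do not wait for any
      // background merges to finish (they should still be allowed to complete
      // from time to time, e.g. in a nightly run).
      writer.close(false);
    }
  }

If you later decide to take full control of merging (point 3 above), the same
setMergePolicy/setMergeScheduler hooks on IndexWriter are where a NoMergePolicy
or NoMergeScheduler would be plugged in, if your Lucene version provides them.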