Subject: Re: IndexWriter.close() performance issue
From: Mark Kristensson <mark.kristensson@smartsheet.com>
Date: Wed, 3 Nov 2010 13:27:53 -0700
To: java-user@lucene.apache.org

I've run CheckIndex against the index and the results are below. The net is that it's telling me nothing is wrong with the index.

I did not have any instrumentation around the opening of the IndexSearcher (we don't use an IndexReader directly), just around the actual query execution, so I had to add some additional logging. What I found surprised me: opening a searcher against this index takes the same 6 to 8 seconds that closing the IndexWriter takes.
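For reference, the extra logging is essentially just a stopwatch around opening the searcher and around a single query, roughly like the sketch below (the index path, field name, and term are placeholders, not our real ones). CheckIndex itself was run with the stock command line tool (java -cp lucene-core-3.0.0.jar org.apache.lucene.index.CheckIndex <indexDir>).

    import java.io.File;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SearcherTiming {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path

            // Time opening the searcher (this is the part taking 6 to 8 seconds for us)
            long start = System.currentTimeMillis();
            IndexSearcher searcher = new IndexSearcher(dir, true); // true = read-only
            System.out.println("open: " + (System.currentTimeMillis() - start) + " ms");

            // Time a simple TermQuery (field and value are made up for this sketch)
            start = System.currentTimeMillis();
            TopDocs hits = searcher.search(new TermQuery(new Term("all", "foo")), 10);
            System.out.println("query: " + (System.currentTimeMillis() - start)
                    + " ms, totalHits=" + hits.totalHits);

            searcher.close();
            dir.close();
        }
    }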
Thanks for your help,
Mark

-----------------------------------

Segments file=segments_9hir numSegments=4 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 4: name=_aiaz docCount=5886773
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=3,628.377
    diagnostics = {optimize=true, mergeFactor=4, os.version=2.6.18-194.17.1.el5, os=Linux, mergeDocStores=true, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=merge, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_aiaz_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [296713 fields]
    test: field norms.........OK [296713 fields]
    test: terms, freq, prox...OK [43933037 terms; 669926342 terms/docs pairs; 669957608 tokens]
    test: stored fields.......OK [173363203 total field count; avg 29.45 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  2 of 4: name=_aib0 docCount=1
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=0.002
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_aib0_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [79 terms; 79 terms/docs pairs; 0 tokens]
    test: stored fields.......OK [0 total field count; avg � fields per doc]
    test: term vectors........OK [0 total vector count; avg � term/freq vector fields per doc]

  3 of 4: name=_aib1 docCount=3
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=0.003
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [109 terms; 237 terms/docs pairs; 237 tokens]
    test: stored fields.......OK [60 total field count; avg 20 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  4 of 4: name=_aib2 docCount=1
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=0.002
    diagnostics = {os.version=2.6.18-194.17.1.el5, os=Linux, lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields..............OK [28 fields]
    test: field norms.........OK [28 fields]
    test: terms, freq, prox...OK [79 terms; 79 terms/docs pairs; 79 tokens]
    test: stored fields.......OK [20 total field count; avg 20 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


On Nov 3, 2010, at 12:03 PM, Michael McCandless wrote:

> Can you run CheckIndex (command line tool) and post the output?
>
> How long does it take to open a reader on this same index, and perform
> a simple query (eg TermQuery)?
>
> Mike
>
> On Wed, Nov 3, 2010 at 2:53 PM, Mark Kristensson wrote:
>> I've successfully reproduced the issue in our lab with a copy from production and have broken the close() call into parts, as suggested, with one addition.
>>
>> Previously, the call was simply
>>
>> ...
>> } finally {
>>     // Close
>>     if (indexWriter != null) {
>>         try {
>>             indexWriter.close();
>> ...
>>
>> Now, that is broken into the various parts, including a prepareCommit():
>>
>> ...
>> } finally {
>>     // Close
>>     if (indexWriter != null) {
>>         try {
>>             indexWriter.prepareCommit();
>>             Logger.debug("prepareCommit() complete");
>>             indexWriter.commit();
>>             Logger.debug("commit() complete");
>>             indexWriter.maybeMerge();
>>             Logger.debug("maybeMerge() complete");
>>             indexWriter.waitForMerges();
>>             Logger.debug("waitForMerges() complete");
>>             indexWriter.close();
>> ...
>>
>> It turns out that prepareCommit() is the slow call here, taking several seconds to complete.
>>
>> I've done some reading about it, but have not found anything that might be helpful here. The fact that it is slow every single time, even when I'm adding exactly one document to the index, is perplexing and leads me to think something must be corrupt with the index.
>>
>> Furthermore, I tried optimizing the index to see if that would have any impact (I wasn't expecting much) and it did not.
>>
>> I'm stumped at this point and am thinking I may have to rebuild the index, though I would definitely prefer to avoid doing that and would like to know why this is happening.
>>
>> Thanks for your help,
>> Mark
>>
>>
>> On Nov 2, 2010, at 9:26 AM, Mark Kristensson wrote:
>>
>>> Wonderful information on what happens during indexWriter.close(), thank you very much! I've got some testing to do as a result.
>>>
>>> We are on Lucene 3.0.0 right now.
>>>
>>> One other detail that I neglected to mention is that the batch size does not seem to have any relation to the time it takes to close the index where we are having issues. We've had batches add as few as 3 documents and batches add as many as 2500 documents in the last hour, and every single close() call for that index takes 6 to 8 seconds. While I won't know until I am able to individually test the different pieces of the close() operation, I'd be very surprised if a batch that adds just 3 new documents results in very much merge work being done. It seems as if there is some task happening during merge that the indexWriter is never able to successfully complete, and so it tries to complete that task every single time close() is called.
>>>
>>> So, my working theory until I can dig deeper is that something is mildly corrupt with the index (though not serious enough to affect most operations on the index). Are there any good utilities for examining the health of an index?
>>>
>>> I've dabbled with the experimental CheckIndex object in the past (before we upgraded to 3.0), but have found it to be incredibly slow and of marginal value. Does anyone have any experience using CheckIndex to track down an issue with a production index?
>>>
>>> Thanks again!
>>> Mark
>>>
>>> On Nov 2, 2010, at 2:20 AM, Shai Erera wrote:
>>>
>>>> When you close IndexWriter, it performs several operations that might have a
>>>> connection to the problem you describe:
>>>>
>>>> * Commits all the pending updates -- if your update batch size is more or
>>>> less the same (i.e., comparable # of docs and total # bytes indexed), then
>>>> you should not see a performance difference between closes.
>>>>
>>>> * Consults the MergePolicy and runs any merges it returns as candidates.
>>>>
>>>> * Waits for the merges to finish.
>>>>
>>>> Roughly, IndexWriter.close() can be substituted w/:
>>>> writer.commit(false); // commits the changes, but does not run merges.
>>>> writer.maybeMerge(); // runs merges returned by the MergePolicy.
>>>> writer.waitForMerges(); // if you use ConcurrentMergeScheduler, the above
>>>> call returns immediately, not waiting for the merges to finish.
>>>> writer.close(); // at this point, commit + merging has finished, so it does
>>>> very little.
>>>>
>>>> As your index grows in size, so does its # of segments, and the segment
>>>> sizes as well. So tweaking some parameters on the MergePolicy (such as
>>>> mergeFactor, maxMergeMB etc.) can result in not attempting to merge too
>>>> large segments.
>>>>
>>>> Alternatively, you can try the following:
>>>>
>>>> 1) Replace the call to writer.close() w/ the above sequence. Then, measure
>>>> each call and report back which of them takes the suspicious amount of time.
>>>>
>>>> 2) Not running optimize() on a regular basis doesn't mean merges don't
>>>> happen in the background. So if you want close() to return as fast as
>>>> possible, you should call close(false). Note though that from time to time
>>>> you should allow merges to finish, in order to reduce the # of segments.
>>>>
>>>> 3) If you want to control when the merges are run, you can open IndexWriter
>>>> with NoMergePolicy, which always returns 0 merges to perform, or
>>>> NoMergeScheduler, which never executes merges. But be aware that this is
>>>> dangerous, as the # of segments in the index will continue to grow and search
>>>> performance will degrade.
>>>>
>>>> The answers above are relevant for 3.x, but most of them are also relevant for
>>>> 2.9. If you have an older version of Lucene, then some of the solutions
>>>> might still apply (such as close(false)).
>>>>
>>>> Hope this helps,
>>>> Shai
>>>>
>>>> On Tue, Nov 2, 2010 at 12:55 AM, Mark Kristensson <
>>>> mark.kristensson@smartsheet.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> One of our Lucene indexes has started misbehaving on indexWriter.close() and
>>>>> I'm searching for ideas about what may have happened and how to fix it.
>>>>>
>>>>> Here's our scenario:
>>>>> - We have seven Lucene indexes that contain different sets of data from a
>>>>> web application, indexed for searching by end users
>>>>> - A Java service runs twice a minute to pull changes from SQL DB queue
>>>>> tables and update the relevant Lucene index(es)
>>>>> - The two largest indexes (3.4GB and 3.8GB in size, with 8 million and 6
>>>>> million records, respectively) contain similar sets of data, but are
>>>>> structured differently for different consumption (e.g. one has an All field
>>>>> for general purpose searching, the other does not; one has numeric fields
>>>>> for ranges, the other does not, etc.)
>>>>> - We expunge deletes from our indexes twice per day
>>>>> - A couple of weeks ago, one of the two large indexes became very slow to
>>>>> close after each round of changes is applied by our indexing service.
>>>>> Specifically, all of our indexes usually close in no more than 200
>>>>> milliseconds, but this one index is now taking 6 to 8 seconds to close with
>>>>> every single pass (and is leading to delays which are affecting end users).
>>>>>
>>>>> Questions from my attempts to troubleshoot the problem:
>>>>> - Has anyone else seen behavior similar to this? What did you do to resolve it?
>>>>> - Does the size of an index or its documents (record count, disk size, avg
>>>>> document size, max document size, etc.) have any correlation to the length
>>>>> of time it takes to close an index?
>>>>> - We are not currently optimizing any of our indexes on a regular basis;
>>>>> could that have any impact upon indexing operations (e.g.
>>>>> indexWriter.close())? My understanding is that optimization only affects
>>>>> search performance, not indexing performance, and to date we have not seen
>>>>> any need to optimize based upon the performance of the search queries.
>>>>>
>>>>> Thanks,
>>>>> Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org