lucene-java-user mailing list archives

From Scott Smith <ssm...@mainstreamdata.com>
Subject RE: Lucene slow performance -- still broke
Date Wed, 20 Mar 2013 21:48:37 GMT
First, I decided I wasn't comfortable doing closes on the IndexReader.  So, I did what I hope
is better.  I create a singleton SearcherManager (out-of-the-box from the 4.1 release) and
do acquire/releases.  I assume that's more or less equivalent anyway.

Second, it doesn't really matter as I am still seeing the same slow searches.  I'm becoming
convinced that the problem is in the indexer (see below for why). 

So, briefly, there are two parts to my use of Lucene (all running on Windows).  The first
part is a Windows service that does the indexing.  It reads a directory which has new items
to be indexed.  The indexing is fully serialized (there are no multiple threads): it
completely indexes one document before it moves on to the next.  Even so, I'm averaging
about 14 ms per document on a fairly old machine.  Each document is an XML file and
averages about 4 KB.

The searching happens in a Tomcat web server.  Obviously, there may be multiple simultaneous
searches.

Here's what I did today.  I did a full reindex (all the documents are in directories which
I can walk on the local hard drive).  There were roughly 600k documents.  The reindex is a
separate program which simply does the reindex and quits.  It opens the index, indexes all
of the files (no commits), does a forceMerge(2), and then closes the writer (which I assume
forces a commit).  Neither the web server nor the indexing service was running while the
reindex was going on (i.e., I don't think anything was touching the index other than the
reindex program itself).  Here's what the index directory looked like after the reindex
completed (the value in parentheses is the total bytes for those files).

61 CFE (17.7KB)
61 CFS (2.09GB)
61 si (16.9KB)
42 DEL (23.1KB)
10 FDT (32.2MB)
10 FDX (12.8KB)
10 FRM (11.1KB)
10 pos (157MB)
10 tim (28.7MB)
10 tip (582KB)
10 tvd (254KB)
10 tvf (232MB)
10 tvx (2MB)
10 doc (62.5MB)
1 segment_1 (2KB)
1 segments.gen (1KB)

So, 377 files for a total of 2.6GB, most of it in the CFS files.
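
A per-extension listing like the one above can be produced with a short program instead of by hand. The sketch below is plain JDK code with no Lucene dependency; the class name IndexDirTally and its tally method are made up for illustration, and it groups files purely by the text after the last dot in the file name (files without a dot are keyed by their full name).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class IndexDirTally {
    /** Number of files and total bytes for one extension. */
    public static final class Tally {
        public int count;
        public long bytes;
    }

    /** Groups the regular files directly inside dir by extension. */
    public static Map<String, Tally> tally(Path dir) throws IOException {
        Map<String, Tally> byExt = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) {
                    continue; // skip subdirectories and special files
                }
                String name = f.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // No dot in the name: use the whole name as the key.
                String key = dot < 0
                        ? name
                        : name.substring(dot + 1).toLowerCase(Locale.ROOT);
                Tally t = byExt.computeIfAbsent(key, k -> new Tally());
                t.count++;
                t.bytes += Files.size(f);
            }
        }
        return byExt;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is taken from the first argument (assumed path).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Tally> e : tally(dir).entrySet()) {
            System.out.printf("%d %s (%d bytes)%n",
                    e.getValue().count, e.getKey(), e.getValue().bytes);
        }
    }
}
```

Running it before and after restarting the indexing service would make it easy to see which file types are growing in number.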

I then restarted the Windows service.  In the roughly 2 hours since then, the number of CFS
files has grown to 82.  51 of them range from 29.8 to 51.2 MB each (2.09GB total).

So, I'm pretty convinced the issue is in the indexing since I still haven't done any searching
yet.

The index writer is initialized as follows:
    FSDirectory dir = FSDirectory.open(new File(indexDirectory));
    IndexWriterConfig iwc = new IndexWriterConfig(Constants.LUCENE_VERSION, oAnalyzer);

    LogByteSizeMergePolicy lbsm = new LogByteSizeMergePolicy();
    lbsm.setMaxMergeDocs(10);
    lbsm.setUseCompoundFile(true);
    iwc.setMergePolicy(lbsm);

    _oWriter = new IndexWriter(dir, iwc);

But I also notice that I added the following.  The intent was to have the writer flush the
buffer when it had indexed enough documents to reach 50MB (an arbitrary number I picked out
of the air because it felt right :-) ).  It seems odd to me that the maximum size of the CFS
files is also about 50 MB.  So, I'm wondering if this affects the writer's ability to merge
files.  

    // don't flush based on number of documents
    // flush based on buffer size
    _oWriter.getConfig().setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
                        .setRAMBufferSizeMB(50.0);
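
One way to reason about the interaction between the flush size and the merge settings is a toy model. The sketch below is plain Java and is not Lucene's merge code; the names ToyMergeModel and segmentsAfter, the single merge rule, and the figure of 12,000 documents per flush (a guess at what a 50MB flush of roughly 4KB documents might hold) are all assumptions for illustration. The one rule it borrows is the documented meaning of LogMergePolicy.setMaxMergeDocs: a segment holding more documents than that limit is not merged with other segments.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyMergeModel {
    /**
     * Returns how many segments remain after `flushes` flushes of
     * `docsPerFlush` documents each. Toy rule: whenever at least
     * `mergeFactor` segments each hold no more than `maxMergeDocs`
     * documents, the oldest `mergeFactor` of them become one segment.
     */
    public static int segmentsAfter(int flushes, int docsPerFlush,
                                    int mergeFactor, int maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add((long) docsPerFlush); // each flush writes one new segment
            while (mergeOnce(segments, mergeFactor, maxMergeDocs)) {
                // keep merging until no eligible group is left
            }
        }
        return segments.size();
    }

    /** Performs one merge if enough eligible segments exist. */
    private static boolean mergeOnce(List<Long> segments,
                                     int mergeFactor, int maxMergeDocs) {
        List<Integer> eligible = new ArrayList<>();
        for (int i = 0; i < segments.size() && eligible.size() < mergeFactor; i++) {
            if (segments.get(i) <= maxMergeDocs) {
                eligible.add(i); // small enough to take part in a merge
            }
        }
        if (eligible.size() < mergeFactor) {
            return false;
        }
        long merged = 0;
        // Remove from the highest index down so earlier indexes stay valid.
        for (int j = eligible.size() - 1; j >= 0; j--) {
            int idx = eligible.get(j);
            merged += segments.remove(idx);
        }
        segments.add(merged);
        return true;
    }

    public static void main(String[] args) {
        // Every flushed segment exceeds the limit of 10, so nothing merges.
        System.out.println(segmentsAfter(82, 12_000, 10, 10));
        // With no document limit, the segment count stays below mergeFactor.
        System.out.println(segmentsAfter(82, 12_000, 10, Integer.MAX_VALUE));
    }
}
```

In this model, a limit of 10 documents leaves every flushed segment on disk at its flush size, which would at least be consistent with segment files that top out near the RAM buffer size. Whether the real index behaves this way is something to check against the actual merge policy, not a conclusion from the model.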

Any help in figuring out what is causing this problem would be appreciated.  I do now have
an offline system that I can play with so I can do some intrusive things if need be.

Scott




-----Original Message-----
From: Scott Smith [mailto:ssmith@mainstreamdata.com] 
Sent: Saturday, March 16, 2013 1:28 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene slow performance

Thanks for the help.

The reindex was done this morning and searches now take less than a second.

I will make the change to the code.

Cheers

Scott

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de]
Sent: Friday, March 15, 2013 11:17 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene slow performance

Please forceMerge only once, not every time (and only to clean up your index)! If you are
doing a reindex already, just fix your close logic as discussed before.



Scott Smith <ssmith@mainstreamdata.com> wrote:

>Unfortunately, this is a production system which I can't touch (though 
>I was able to get a full reindex scheduled for tomorrow morning).
>
>Are you suggesting that I do:
>
>writer.forceMerge(1);
>writer.close();
>
>instead of just doing the close()?
>
>-----Original Message-----
>From: Simon Willnauer [mailto:simon.willnauer@gmail.com]
>Sent: Friday, March 15, 2013 5:08 PM
>To: java-user@lucene.apache.org
>Subject: Re: Lucene slow performance
>
>On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith 
><ssmith@mainstreamdata.com> wrote:
>> " Do you always close IndexWriter after adding few documents and when
>> closing, disable "wait for merge"? In that case, all merges are
>> interrupted and the merge policy never has a chance to merge at all
>> (because you are opening and closing IndexWriter all the time with
>> cancelling all merges)?"
>>
>> Frankly I don't quite understand what this means.  When I "close" the
>> indexwriter, I simply call close().  Is that the wrong thing?
>
>that should be fine...
>
>this sounds very odd though, do you see files that actually get removed
>/ merged if you call IndexWriter#forceMerge(1)?
>
>simon
>>
>> Thanks
>>
>> Scott
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Friday, March 15, 2013 4:49 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Lucene slow performance
>>
>> Hi,
>>
>> with standard configuration, this cannot happen. What merge policy do
>> you use? This looks to me like a misconfigured merge policy or using
>> the NoMergePolicy. With 3,000 segments, it will be slow, the question
>> is, why do you get those?
>>
>> Another thing could be: Do you always close IndexWriter after adding
>> few documents and when closing, disable "wait for merge"? In that case,
>> all merges are interrupted and the merge policy never has a chance to
>> merge at all (because you are opening and closing IndexWriter all the
>> time with cancelling all merges)?
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>> -----Original Message-----
>>> From: Scott Smith [mailto:ssmith@mainstreamdata.com]
>>> Sent: Friday, March 15, 2013 11:15 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Lucene slow performance
>>>
>>> We have a system that is using lucene and the searches are very slow.
>>> The number of documents is fairly small (less than 30,000) and each
>>> document is typically only 2 to 10 kilo-characters.  Yet, searches
>>> are taking 15-16 seconds.
>>>
>>> One of the things I noticed was that the index directory has several
>>> thousand (3000+) .cfs files.  We do optimize the index once per day.
>>> This is a system that probably gets several thousand document deletes
>>> and additions per day (spread out across the day).
>>>
>>> Any thoughts.  We didn't really notice this until we went to 4.x.
>>>
>>> Scott
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de