lucene-java-user mailing list archives

From kiwi clive <kiwi_cl...@yahoo.com>
Subject Re: A large number of files in an index (3.6)
Date Mon, 29 Oct 2012 20:23:10 GMT
Hi Lance,

File handles can be a problem, but the instantaneous opening of a great many files at exactly
the same time gives a big I/O hit during a query. This is compounded by the many indexes on the
server that can get hit at the same time. Limiting the number of files per index directory
makes a difference.
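
To put rough numbers on this, here is a minimal sketch in plain Java. It is not Lucene code: `filesPerIndex` is a hypothetical helper, and the figure of 8 files per non-compound segment is an assumption for illustration, not a measurement. It also ignores the per-index segments files.

```java
public class OpenFileEstimate {

    // Rough count of files a searcher may need to open for one index:
    // one file per compound (CFS) segment, plus filesPerRawSegment files
    // for each non-compound ("raw") segment.
    public static long filesPerIndex(int compoundSegments, int rawSegments, int filesPerRawSegment) {
        return (long) compoundSegments + (long) rawSegments * filesPerRawSegment;
    }

    public static void main(String[] args) {
        int filesPerRawSegment = 8; // assumption for illustration only

        // Same 12 segments, first all compound, then a CFS/raw hybrid.
        long allCompound = filesPerIndex(12, 0, filesPerRawSegment);
        long hybrid = filesPerIndex(8, 4, filesPerRawSegment);

        System.out.println("all compound: " + allCompound + " files");
        System.out.println("hybrid:       " + hybrid + " files");
    }
}
```

Multiply either figure by the number of indexes a query burst can touch to get a feel for how many files are opened at once.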

Clive





________________________________
 From: Lance Norskog <goksron@gmail.com>
To: java-user@lucene.apache.org; kiwi clive <kiwi_clive@yahoo.com> 
Sent: Sunday, October 28, 2012 11:09 PM
Subject: Re: A large number of files in an index (3.6)
 
An option: instead of merging continuously as you run, you can optimize with 'maxSegments=10'.
This means 'optimize, but only until there are 10 segments'. If there are already 10 or fewer segments,
nothing happens. This lets you schedule merging I/O.
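
As a rough illustration of the "optimize only down to N segments" behaviour, here is a toy model in plain Java. It is not Lucene code: `mergeDownTo` is a hypothetical helper, and merging the two smallest segments first is a simplifying assumption, not Lucene's actual merge selection. In Lucene 3.6 the corresponding call should be `IndexWriter.forceMerge(10)`; the older `optimize(10)` is deprecated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MaxSegmentsSketch {

    // Toy model: given segment sizes, merge until at most maxSegments remain.
    // If there are already maxSegments or fewer, the input is returned unchanged.
    public static List<Long> mergeDownTo(List<Long> segmentSizes, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        if (segmentSizes.size() <= maxSegments) {
            return new ArrayList<>(segmentSizes); // nothing to do
        }
        List<Long> segments = new ArrayList<>(segmentSizes);
        Collections.sort(segments);
        while (segments.size() > maxSegments) {
            // Assumption: always merge the two smallest segments.
            long merged = segments.remove(0) + segments.remove(0);
            // Re-insert the merged segment so the list stays sorted.
            int pos = Collections.binarySearch(segments, merged);
            if (pos < 0) {
                pos = -pos - 1;
            }
            segments.add(pos, merged);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        for (long i = 1; i <= 14; i++) {
            sizes.add(i * 10); // 14 segments of hypothetical sizes
        }
        System.out.println("14 segments -> " + mergeDownTo(sizes, 10).size());
        System.out.println(" 6 segments -> " + mergeDownTo(sizes.subList(0, 6), 10).size());
    }
}
```

Running the merge-down step from a scheduled job, rather than relying only on background merges, is what lets the merging I/O be moved to a quiet period.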

Is the number of files a problem due to file space breakage?

----- Original Message -----
| From: "kiwi clive" <kiwi_clive@yahoo.com>
| To: java-user@lucene.apache.org
| Sent: Saturday, October 27, 2012 12:44:34 PM
| Subject: A large number of files in an index (3.6)
| 
| Hi guys,
| 
| I've recently moved from Lucene 2.3 to 3.6. The application uses the
| compound file (CFS) format. With Lucene 2.3, I understood how the merge
| factor etc. interacted with the number of files created in the index
| directory. With a merge factor of 10, the number of files in the
| index directory could sometimes get up to 30, but you could see the
| merging happen, and the number of files would roll up after a while
| and settle around 10-15.
| 
| 
| With Lucene 3.6, this is not the case. Firstly, even with the MergePolicy
| set to useCFS, the index appears to be a hybrid of CFS and raw index
| format. I can understand that this may have been done for performance
| reasons, but it does increase the file count considerably. Also, the
| rollup of the merged segments is not occurring as it did in the
| previous version. Originally I set the CFSRatio to 1.0 and found
| the behaviour similar to Lucene 2.3 (file-number-wise), but this came
| at an I/O cost and the machines ran with a higher load average. The
| higher I/O starts to affect query performance. Reducing cfsRatio to
| 0.1 (the default) helped reduce the I/O load, but I am running several
| thousand concurrent indexes across many disks on the servers, and
| the larger number of files per index means a large number of files
| are being opened when a query hits the index, in addition to the
| indexing load.
| 
| I'm sure this is probably down to merge policies and schedules, but
| there are quite a few knobs to tweak here, so some guidance as to
| the most beneficial parameters to tweak would be very helpful.
| 
| I'm using the LogByteSizeMergePolicy with 3 background merge threads.
| I'm considering using TieredMergePolicy and even reducing the number
| of merge threads, but there is not much point if it does not roll up
| the segments as expected. I can tweak the cfsRatio, but this
| strikes me as a large hammer, and there may be more subtle ways to do
| this!
| 
| So tell me I'm being stupid, just say 'derr- why don't you do
| this....' and I'll be a happy man!!
| 
| Thanks,
| Clive
