lucene-java-user mailing list archives

From Lance Norskog <>
Subject Re: A large number of files in an index (3.6)
Date Sun, 28 Oct 2012 23:09:07 GMT
An option: instead of merging continuously as you run, you can optimize with 'maxSegments=10'.
This means 'optimize, but only until there are at most 10 segments'. If there are already 10 or
fewer segments, nothing happens. This lets you schedule merging I/O.
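
To make the 'maxSegments' semantics concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions: it does not call Lucene, the class and method names (MaxSegmentsSketch, mergeDownTo) are made up for illustration, and the rule of always merging the two smallest segments is an assumption; in a real index the merge policy decides which segments are merged. In Lucene 3.6 itself the corresponding call should be IndexWriter.forceMerge(int), which replaced the deprecated optimize(int).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of "optimize, but stop once at most maxSegments remain".
// Hypothetical names; this is not Lucene code.
final class MaxSegmentsSketch {

    // Returns the segment sizes left after merging down to maxSegments.
    // Assumption: the two smallest segments are merged first.
    static List<Long> mergeDownTo(List<Long> segmentSizes, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        List<Long> segments = new ArrayList<Long>(segmentSizes);
        Collections.sort(segments);
        // If the index already has maxSegments or fewer, the loop never runs.
        while (segments.size() > maxSegments) {
            long merged = segments.remove(0) + segments.remove(0);
            // Re-insert the merged segment so the list stays sorted.
            int pos = Collections.binarySearch(segments, merged);
            if (pos < 0) {
                pos = -pos - 1;
            }
            segments.add(pos, merged);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Long> twelve = new ArrayList<Long>(Collections.nCopies(12, 1L));
        System.out.println(mergeDownTo(twelve, 10));
        System.out.println(mergeDownTo(Arrays.asList(5L, 1L, 3L), 10));
    }
}
```

With 12 equal segments and a limit of 10, two merges are enough; with 3 segments the input comes back unchanged apart from being sorted, which is the 'nothing happens' case.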

Is the number of files a problem due to file space breakage?

----- Original Message -----
| From: "kiwi clive" <>
| To:
| Sent: Saturday, October 27, 2012 12:44:34 PM
| Subject: A large number of files in an index (3.6)
| Hi guys,
| I've recently moved from lucene 2.3 to 3.6. The application uses CF
| format. With lucene 2.3, I understood the interaction of merge
| factor etc. with respect to how many files were created in the index
| directory. With a merge factor of 10, the number of files in the
| index directory could sometimes get up to 30, but you could see the
| merging happen and the number of files would roll up after a while
| and settle around 10-15.
| With lucene 3.6, this is not the case. Firstly, even with MergePolicy
| set to useCFS, the index appears to be a hybrid of cfs and raw index
| format. I can understand that may have been done for performance
| reasons, but it does increase the file count considerably. Also the
| rollup of the merged segments is not occurring as it did on the
| previous version. Originally I set the CFSRatio to 1.0 and found
| the behaviour similar to lucene 2.3 (file number wise) but this came
| at an i/o cost and the machines ran with a higher load average. The
| higher i/o starts to affect query performance. Reducing cfsRatio to
| 0.1 (default) helped reduce i/o load, but I am running several
| thousand concurrent indexes across many disks on the servers and
| the larger number of files per index means a large number of files
| are being opened when a query hits the index, in addition to the
| indexing load.
| I'm sure this is probably down to Merge policies and schedules, but
| there are quite a few knobs to tweak here so some guidance as to
| the most beneficial parameters to tweak would be very helpful.
| I'm using the LogByteSizeMergePolicy with 3 background merge threads.
| I'm considering using TieredMergePolicy and even reducing the number
| of merge threads, but there is not much point if it does not roll up
| the segments as expected. I can tweak the cfsRatio but this
| strikes me as a large hammer and there may be more subtle ways to do
| this!
| So tell me I'm being stupid, just say 'derr- why don't you do
| this....' and I'll be a happy man!!
| Thanks,
| Clive
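
The 'hybrid of cfs and raw index format' in the quoted message is consistent with how the noCFSRatio setting is documented: a segment larger than that fraction of the total index size is left in non-compound form even when compound files are enabled. The sketch below is a rough plain-Java estimate of the resulting file count. It is an illustration, not Lucene code. The names (FileCountSketch, estimateFiles) are hypothetical, the number of files in a non-compound segment is passed in as an assumption, and the model ignores the segments_N, segments.gen and deletion files. It also checks each segment against the final index size, whereas Lucene makes the decision when the segment is written.

```java
import java.util.Arrays;
import java.util.List;

// Rough estimate of index file count under a noCFSRatio-style rule.
// Hypothetical names; this is not Lucene code.
final class FileCountSketch {

    // segmentSizes: size of each segment (any consistent unit).
    // noCFSRatio: segments up to this fraction of the total stay compound.
    // filesPerRawSegment: assumed file count of one non-compound segment.
    static int estimateFiles(List<Long> segmentSizes, double noCFSRatio,
                             int filesPerRawSegment) {
        long total = 0;
        for (long size : segmentSizes) {
            total += size;
        }
        int files = 0;
        for (long size : segmentSizes) {
            boolean compound = size <= noCFSRatio * total;
            // A compound segment is modelled as a single .cfs file.
            files += compound ? 1 : filesPerRawSegment;
        }
        return files;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(700L, 200L, 50L, 30L, 20L);
        System.out.println("ratio 0.1: " + estimateFiles(sizes, 0.1, 9));
        System.out.println("ratio 1.0: " + estimateFiles(sizes, 1.0, 9));
    }
}
```

For these made-up sizes and an assumed 9 files per raw segment, a ratio of 0.1 gives 21 files (the two large segments stay raw) and a ratio of 1.0 gives 5. That matches the trade-off described above: a higher ratio means fewer files, but the largest segments must also be written as compound files, which costs extra I/O.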
