lucene-java-user mailing list archives

From kiwi clive <>
Subject A large number of files in an index (3.6)
Date Sat, 27 Oct 2012 19:44:34 GMT
Hi guys,

I've recently moved from Lucene 2.3 to 3.6. The application uses the compound file (CFS) format. With Lucene 2.3,
I understood how the merge factor etc. affected the number of files created
in the index directory. With a merge factor of 10, the number of files in the index directory
could sometimes get up to 30, but you could see the merging happen and the number of files
would roll up after a while and settle at around 10-15.

With Lucene 3.6, this is not the case. Firstly, even with the MergePolicy set to use CFS, the index
appears to be a hybrid of CFS and raw (non-compound) index files. I can understand that this may have been done
for performance reasons, but it does increase the file count considerably. Also, the rollup
of the merged segments is not occurring as it did in the previous version. Originally I
set noCFSRatio to 1.0 and found the behaviour similar to Lucene 2.3 (in terms of file count), but
this came at an I/O cost and the machines ran with a higher load average. The higher I/O starts
to affect query performance. Reducing noCFSRatio to 0.1 (the default) helped reduce the I/O load,
but I am running several thousand concurrent indexes across many disks on the servers,
and the larger number of files per index means a large number of files are being opened when
a query hits the index, in addition to the indexing load.
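
To put numbers on the "hybrid" layout, a rough sketch like the one below could count how many segments in an index directory are compound and how many are not. It uses only the JDK, not Lucene. The class name IndexFileCensus is made up for illustration, and it assumes the usual naming convention in which per-segment files start with an underscore followed by the segment name (for example _0.cfs, _1.fdt, _0_1.del), so treat it as an approximation rather than an authoritative reading of the index.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;

public class IndexFileCensus {

    /**
     * Returns {compoundSegments, nonCompoundSegments, totalFiles} for a
     * directory listing. Assumption: a segment counts as compound when a
     * file named <segment>.cfs is present.
     */
    static int[] census(String[] fileNames) {
        Set<String> allSegments = new TreeSet<String>();
        Set<String> compoundSegments = new TreeSet<String>();
        for (String name : fileNames) {
            if (!name.startsWith("_")) {
                continue; // segments_N, segments.gen, write.lock and the like
            }
            // The segment name runs up to the next '_' or '.', e.g. "_0" in "_0_1.del".
            int end = 1;
            while (end < name.length()
                    && name.charAt(end) != '.'
                    && name.charAt(end) != '_') {
                end++;
            }
            String segment = name.substring(0, end);
            allSegments.add(segment);
            if (name.endsWith(".cfs")) {
                compoundSegments.add(segment);
            }
        }
        int compound = compoundSegments.size();
        return new int[] { compound, allSegments.size() - compound, fileNames.length };
    }

    public static void main(String[] args) {
        // Pass the index directory as the first argument; defaults to the current directory.
        String[] names = new File(args.length > 0 ? args[0] : ".").list();
        if (names == null) {
            System.err.println("Not a readable directory");
            return;
        }
        int[] c = census(names);
        System.out.println("compound segments:     " + c[0]);
        System.out.println("non-compound segments: " + c[1]);
        System.out.println("total files:           " + c[2]);
    }
}
```

Run against a single index directory before and after a burst of indexing, this would show whether the extra files come from a few large non-compound segments or from many small segments that have not been merged yet.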

I'm sure this is probably down to merge policies and schedulers, but there are quite a few
knobs to tweak here, so some guidance as to the most beneficial parameters to tweak would
be very helpful.

I'm using the LogByteSizeMergePolicy with 3 background merge threads. I'm considering using
TieredMergePolicy and even reducing the number of merge threads, but there is not much point
if it does not roll up the segments as expected. I can tweak the noCFSRatio, but this strikes
me as a large hammer and there may be more subtle ways to do this!
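
For reference, the combination being considered might be wired up roughly as follows against the Lucene 3.6 API. This is only a configuration sketch: the class name MergeConfigSketch and every numeric value in it are illustrative assumptions, not tested recommendations, and it needs the Lucene 3.6 core jar on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergeConfigSketch {

    static IndexWriterConfig buildConfig() {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        // Illustrative values only; tune against real indexing and query load.
        mergePolicy.setMaxMergeAtOnce(10);   // segments merged together in one merge
        mergePolicy.setSegmentsPerTier(10.0); // segments allowed per tier before merging
        mergePolicy.setUseCompoundFile(true);
        // A merged segment larger than this fraction of the index is left
        // non-compound; 1.0 means always use CFS. The value 0.5 is a
        // hypothetical middle ground between the 0.1 and 1.0 tried above.
        mergePolicy.setNoCFSRatio(0.5);

        ConcurrentMergeScheduler scheduler = new ConcurrentMergeScheduler();
        // Fewer background merge threads than the 3 currently in use (assumed value).
        scheduler.setMaxThreadCount(1);

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        config.setMergePolicy(mergePolicy);
        config.setMergeScheduler(scheduler);
        return config;
    }
}
```

The resulting IndexWriterConfig would be passed to the IndexWriter constructor. Whether this reduces the file count without bringing back the I/O cost seen at a ratio of 1.0 is exactly the open question here.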

So tell me I'm being stupid, just say 'derr- why don't you do this....' and I'll be a happy
