lucene-dev mailing list archives

From: Bernhard Messer
Subject: Re: possible SegmentMerger optimization
Date: Sun, 08 Aug 2004 09:31:49 GMT

Yep, you're right, Dmitry. Switching the compound file option off and on 
would be the trick to get the same behavior I described. I ran some tests 
on that and found that it works perfectly. I think we can leave everything 
as it is; maybe we should document it somewhere.

Is there something like a "tips and tricks" section on the 
Lucene website?
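
To make the effect of the on/off trick concrete, here is a minimal sketch in Java. It is a toy model, not Lucene code: ToyIndexWriter, its simplified merge policy (merge everything once mergeFactor segments pile up), and the helper compoundWritesFor are all hypothetical names invented for illustration. Only the idea comes from this thread: merges during indexing skip the compound file while the option is off, and the final optimize writes it once after the option is switched back on.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene's IndexWriter, and the merge policy is simplified.
final class ToyIndexWriter {
    private final int mergeFactor;
    private boolean useCompoundFile = true; // enabled by default, as the thread says for Lucene 1.4
    private final List<Integer> segmentDocCounts = new ArrayList<>();
    private int compoundWrites = 0;

    ToyIndexWriter(int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
    }

    void setUseCompoundFile(boolean value) { useCompoundFile = value; }

    // Each document starts as its own one-document segment.
    void addDocument() {
        segmentDocCounts.add(1);
        if (segmentDocCounts.size() >= mergeFactor) mergeSegments();
    }

    // Merge whatever segments exist into a single one.
    void optimize() {
        if (!segmentDocCounts.isEmpty()) mergeSegments();
    }

    int compoundWrites() { return compoundWrites; }

    // Every merge writes a compound file only while the option is enabled.
    private void mergeSegments() {
        int total = 0;
        for (int n : segmentDocCounts) total += n;
        segmentDocCounts.clear();
        segmentDocCounts.add(total);
        if (useCompoundFile) compoundWrites++;
    }

    // Hypothetical helper: index `docs` documents, then optimize with the option on.
    // With toggle == true the option is off during indexing.
    static int compoundWritesFor(int docs, int mergeFactor, boolean toggle) {
        ToyIndexWriter w = new ToyIndexWriter(mergeFactor);
        if (toggle) w.setUseCompoundFile(false);
        for (int i = 0; i < docs; i++) w.addDocument();
        w.setUseCompoundFile(true);
        w.optimize();
        return w.compoundWrites();
    }

    public static void main(String[] args) {
        System.out.println("always on: " + compoundWritesFor(100, 10, false));
        System.out.println("toggled:   " + compoundWritesFor(100, 10, true));
    }
}
```

The toy ignores the trade-off raised further down in the thread: with the option off during indexing, every segment keeps its separate files, so the number of open file handles can grow.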


Dmitry Serebrennikov wrote:

> Bernhard Messer wrote:
>> Hi developers,
>> maybe there is a small but effective way to optimize the 
>> SegmentMerger class when the compound file option is enabled, which 
>> is the default since Lucene 1.4.
>> The current implementation creates and writes the compound index file 
>> every time the merge() method is called. Since I/O operations are 
>> expensive and time-consuming, it would be nice to write the compound 
>> index file only when optimizing the index. The change itself wouldn't 
>> be a big deal: add a boolean parameter, as in 
>> SegmentMerger.merge(boolean finalize). Only if finalize==true and the 
>> compound option is enabled will the compound file be created. To 
>> complete the implementation, the same parameter could be added to 
>> mergeSegments(int minSegment, boolean finalize) within IndexWriter. 
>> When mergeSegments is called from flushRamSegments() or 
>> maybeMergeSegments(), finalize is set to false. Only when it is called 
>> from optimize() will finalize be set to true and the compound file be 
>> written.
>> The downside is having to explain to developers that if they do not 
>> optimize the index before closing it, the compound file option has no 
>> effect. The other issue is that we might run into the problem of too 
>> many open files, which was sometimes reported before the compound 
>> option was introduced.
> Yeah, that was kind of the point of having the compound files - to 
> avoid too many file handles, especially during indexing. I hear you on 
> inefficient use of disk IO, though.
>> The downside could be addressed by making the optimization 
>> optional through IndexWriter, so developers using Lucene 
>> could decide for themselves whether they want to use the "single 
>> compound write" option or not.
> One could do that today. Just call setUseCompoundFile(false) during 
> indexing and call setUseCompoundFile(true) before the final optimize. 
> Would that do the trick?


>> If you would like to see the patch, leave me a note and 
>> I'll create it.
>> Best regards,
>> Bernhard
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: