lucene-dev mailing list archives

From Dmitry Serebrennikov <dmit...@earthlink.net>
Subject Re: possible SegmentMerger optimization
Date Sat, 07 Aug 2004 20:10:26 GMT
Bernhard Messer wrote:

> hi developers,
>
> There may be a small but effective way to optimize the
> SegmentMerger class when the compound file option is enabled, which
> has been the default since Lucene 1.4.
>
> The current implementation creates and writes the compound index file
> every time the merge() method is called. Because I/O operations are
> expensive and time-consuming, it would be better to write the compound
> index file only when optimizing the index. The change itself wouldn't
> be a big deal: add a boolean parameter, giving
> SegmentMerger.merge(boolean finalize). The compound file is created
> only if finalize==true and the compound option is enabled. To complete
> the implementation, the same parameter could be added to
> mergeSegments(int minSegment, boolean finalize) within IndexWriter.
> When mergeSegments is called from flushRamSegments() or
> maybeMergeSegments(), finalize is set to false. Only when it is called
> from optimize() is finalize set to true and the compound file written.
>
> The downside is having to explain to developers that if they do not
> optimize the index before closing, the compound file option has no
> effect. The other issue is that we might run into the problem of too
> many open files, which was sometimes reported before the compound
> option was introduced.

Yeah, that was kind of the point of having the compound files: to avoid
too many open file handles, especially during indexing. I hear you on
the inefficient use of disk I/O, though.

>
> The downside could be addressed by making the optimization
> optionally available through IndexWriter. Developers using Lucene
> could then decide for themselves whether they want to use the
> "single compound write" option or not.

One could do that today. Just call setUseCompoundFile(false) during
indexing and setUseCompoundFile(true) before the final optimize. Would
that do the trick?
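
To make the suggestion concrete, here is a rough, self-contained sketch of
the toggle pattern. SketchWriter is a hypothetical stand-in, not Lucene's
IndexWriter: it assumes a merge fires after every mergeFactor documents and
only counts how often a compound file would be written. In Lucene 1.4 the
corresponding switch is, as far as I know, IndexWriter.setUseCompoundFile(boolean);
the names indexAll and CompoundToggleSketch are made up for illustration.

```java
// Hypothetical stand-in for an index writer. NOT Lucene's IndexWriter;
// it only models "each merge writes a compound file if the flag is on".
class SketchWriter {
    private boolean useCompoundFile = true; // compound files on by default
    private final int mergeFactor;          // assumed: merge after this many docs
    private int bufferedDocs = 0;
    int merges = 0;                         // number of merges performed
    int compoundWrites = 0;                 // merges that also wrote a compound file

    SketchWriter(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    void setUseCompoundFile(boolean value) {
        useCompoundFile = value;
    }

    void addDocument() {
        bufferedDocs++;
        if (bufferedDocs >= mergeFactor) {
            mergeSegments();
        }
    }

    // Final merge, standing in for optimize().
    void optimize() {
        mergeSegments();
    }

    private void mergeSegments() {
        bufferedDocs = 0;
        merges++;
        if (useCompoundFile) {
            compoundWrites++;
        }
    }
}

class CompoundToggleSketch {
    // Indexes `docs` documents and returns {merges, compoundWrites}.
    // With toggle == true, compound files are switched off while adding
    // documents and switched back on just before the final optimize().
    static int[] indexAll(int docs, boolean toggle) {
        SketchWriter writer = new SketchWriter(10);
        if (toggle) {
            writer.setUseCompoundFile(false);
        }
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        if (toggle) {
            writer.setUseCompoundFile(true);
        }
        writer.optimize();
        return new int[] { writer.merges, writer.compoundWrites };
    }

    public static void main(String[] args) {
        int[] byDefault = indexAll(100, false);
        int[] toggled = indexAll(100, true);
        System.out.println("default: merges=" + byDefault[0]
                + " compoundWrites=" + byDefault[1]);
        System.out.println("toggled: merges=" + toggled[0]
                + " compoundWrites=" + toggled[1]);
    }
}
```

In this toy model, 100 documents give 11 merges either way, with 11 compound
writes by default and 1 with the toggle. The cost is the one mentioned above:
while compound files are off, every segment keeps its separate files, so more
file handles are in use during indexing.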

Dmitry.

>
> If there is interest and you would like to see the patch, let me
> know and I'll create it.
>
> best regards
> Bernhard
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>