lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: flushRamSegments possible perf improvement?
Date Thu, 19 Oct 2006 14:00:50 GMT
Interesting idea... that would extend to doing an optimize in one
swoop instead of in groups of mergeFactor segments.
There would be a certain amount of increased memory usage.
I also don't know if there are any negative performance implications
of merging segments with sizes an order of magnitude apart.

It should be relatively easy to test different scenarios by
manipulating mergeFactor and maxBufferedDocs at the right time.
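
For example, something like this throwaway harness (an untested sketch;
the path and constants are arbitrary, using the current 2.0-era API):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class FlushTimingTest {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/tmp/flushtest",
        new StandardAnalyzer(), true);
    writer.setMaxBufferedDocs(1000);  // N: ram segments held before a flush
    writer.setMergeFactor(10);

    for (int i = 0; i < 100000; i++) {
      Document doc = new Document();
      doc.add(new Field("body", "test text " + i,
          Field.Store.NO, Field.Index.TOKENIZED));
      writer.addDocument(doc);
    }

    long start = System.currentTimeMillis();
    writer.close();  // triggers the final flush plus any cascaded merges
    System.out.println("close took "
        + (System.currentTimeMillis() - start) + " ms");
  }
}

Varying maxBufferedDocs and mergeFactor between runs (or mid-run, before
the last batch of adds) and timing close() should show whether the extra
write/read of the N-doc segment matters.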

-Yonik

On 10/18/06, Doron Cohen <DORONC@il.ibm.com> wrote:
> Currently IndexWriter.flushRamSegments() always merges all RAM segments to
> disk. Later it may merge more, depending on the maybe-merge algorithm. This
> happens when closing the index and when the number of (1-doc) RAM segments
> exceeds max-buffered-docs.
>
> Can there be a performance penalty for always merging to disk first?
>
> Assume the following merges take place:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs) into
> _a (N docs)
>   merging segments _a (N docs) _6 (M docs) _7 (K docs) _8 (L docs) into _b
> (N+M+K+L docs)
>
> Alternatively, we could tell (compute) that this is going to happen, and
> have a single merge:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs)
>                    _6 (M docs) _7 (K docs) _8 (L docs) into _b (N+M+K+L
> docs)
>
> This would save writing the segment of size N to disk and reading it back
> again. For large enough N, is there a real potential saving here?
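>
> Roughly, something like this in flushRamSegments() (a sketch only,
> loosely patterned on the current code; the second loop's test is just a
> stand-in for the real mergeFactor cascade criterion):
>
>   private final void flushRamSegments() throws IOException {
>     // Walk back over the buffered RAM segments, as the current code does.
>     int minSegment = segmentInfos.size() - 1;
>     int docCount = 0;
>     while (minSegment >= 0
>            && segmentInfos.info(minSegment).dir == ramDirectory) {
>       docCount += segmentInfos.info(minSegment).docCount;
>       minSegment--;
>     }
>
>     // NEW: keep walking back over trailing disk segments that the
>     // maybe-merge pass would cascade into anyway, so they join this
>     // same merge instead of re-reading the flushed segment.
>     while (minSegment >= 0
>            && segmentInfos.info(minSegment).docCount <= docCount) {
>       docCount += segmentInfos.info(minSegment).docCount;
>       minSegment--;
>     }
>
>     minSegment++;
>     if (minSegment >= segmentInfos.size())
>       return;                             // nothing to merge
>     mergeSegments(minSegment);
>   }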


