lucene-dev mailing list archives

From "Ning Li" <ning.li...@gmail.com>
Subject Re: flushRamSegments possible perf improvement?
Date Fri, 20 Oct 2006 01:48:03 GMT
There is, however, an opportunity to reduce the number of merges for disk segments.

Assume maxBufferedDocs is 10 and mergeFactor is 3, and assume the segment
sizes are 90, 30, 30, 10, 10. When a new disk segment of size 10 is added,
two merges are triggered. First, the 3 segments of size 10 are merged and
the segment sizes become 90, 30, 30, 30. Then the 3 segments of size 30 are
merged and the segment sizes become 90, 90. In other words, the addition of
one disk segment triggers multiple merges on consecutive merge levels.
Let's call these cascade merges.
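
To make the cascade concrete, here is a small standalone simulation. This
is only a sketch of the example above, not Lucene's actual merge code; the
class name, the segment sizes, and the simplification that "same level"
means "same doc count" are all mine.

    import java.util.ArrayList;
    import java.util.List;

    // Toy simulation of cascade merges with mergeFactor = 3.
    // Segment sizes are doc counts, newest segment last.
    public class CascadeMergeDemo {
        public static void main(String[] args) {
            final int mergeFactor = 3;
            List<Integer> segments = new ArrayList<>(List.of(90, 30, 30, 10, 10));
            segments.add(10);  // newly flushed disk segment

            // Keep merging the trailing mergeFactor segments while they
            // share a level (same size in this toy model).
            while (segments.size() >= mergeFactor) {
                int n = segments.size();
                int last = segments.get(n - 1);
                boolean sameLevel = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (!segments.get(i).equals(last)) { sameLevel = false; break; }
                }
                if (!sameLevel) break;
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);  // one merge per level
                System.out.println("after merge: " + segments);
            }
            // Prints:
            // after merge: [90, 30, 30, 30]
            // after merge: [90, 90]
        }
    }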

Alternatively, we could avoid cascade merges by doing a single merge of the
segments from the consecutive levels that would otherwise be merged anyway,
in this case (30, 30, 10, 10, 10), producing one segment of size 90 directly.
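
A rough sketch of how that selection could work follows. This is a
hypothetical helper, not an existing Lucene API: it simulates the cascade
only to decide how many trailing segments it would eventually fold
together, and returns that count so one combined merge can replace the
cascade. The maxSegments parameter is my own addition, anticipating the
memory concern below by capping how many segments a single merge may touch.

    import java.util.ArrayList;
    import java.util.List;

    class CombinedMergeSketch {
        // Sizes are doc counts, newest segment last. Returns how many
        // trailing segments to merge in one pass (0 if no merge is due).
        static int countSegmentsForCombinedMerge(List<Integer> sizes,
                                                 int mergeFactor,
                                                 int maxSegments) {
            List<Integer> sim = new ArrayList<>(sizes);  // simulate, don't mutate input
            int taken = 0;                               // trailing input segments consumed
            while (sim.size() >= mergeFactor) {
                int n = sim.size();
                int last = sim.get(n - 1);
                boolean sameLevel = true;                // trailing run must share a level
                for (int i = n - mergeFactor; i < n; i++) {
                    if (!sim.get(i).equals(last)) { sameLevel = false; break; }
                }
                if (!sameLevel) break;
                // The first simulated merge consumes mergeFactor input segments;
                // each later one consumes mergeFactor - 1 more, since it reuses
                // the result of the previous simulated merge.
                int consumed = (taken == 0) ? mergeFactor : mergeFactor - 1;
                if (taken + consumed > maxSegments) break;
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += sim.remove(sim.size() - 1);
                }
                sim.add(merged);
                taken += consumed;
            }
            return taken;
        }

        public static void main(String[] args) {
            // 90, 30, 30, 10, 10 plus a newly flushed 10 -> merge the last 5.
            List<Integer> sizes = List.of(90, 30, 30, 10, 10, 10);
            System.out.println(countSegmentsForCombinedMerge(sizes, 3, 10)); // prints 5
        }
    }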

There are some concerns:
1. Increased memory usage. We could set a limit on the number of segments
to be merged at once.
2. Is the intermediate state useful to any applications? In this case,
segment sizes = 90, 30, 30, 30.
3. Occasionally, when we expect cascade merges to occur, they don't happen
because of deletes. We may over-merge a bit in that case.

Overall, if no applications really need the intermediate state of
cascade merges, this should be a good optimization to have. Opinions?


