lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: Concurrent merge
Date Tue, 20 Feb 2007 19:30:34 GMT
What about a queue of segments to merge? addDocument would add
segments to the queue; if the queue contains too many segments, it
blocks.

Another thread reads the segments from the queue and merges them.

This would effectively block document additions at times, but that
is no different from what happens now.
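
Roughly, something like this (just a sketch, assuming Java 5's
java.util.concurrent is available; Segment and Merger are placeholder
types here, not the real IndexWriter classes):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// addDocument() flushes a ram segment and puts it on a bounded queue;
// put() blocks the caller when the queue is full, which throttles
// additions much like the current single-threaded merge does.
class SegmentMergeQueue {
  static class Segment { /* stand-in for an on-disk segment handle */ }
  interface Merger { void merge(List<Segment> segments); }

  private final BlockingQueue<Segment> pending =
      new ArrayBlockingQueue<Segment>(32);   // the "too many segments" bound

  // called from the adding thread after a segment is flushed to disk
  void enqueue(Segment s) throws InterruptedException {
    pending.put(s);                          // blocks while the queue is full
  }

  // body of the background merge thread
  void mergeLoop(Merger merger) throws InterruptedException {
    List<Segment> batch = new ArrayList<Segment>();
    while (true) {
      batch.add(pending.take());             // wait for at least one segment
      pending.drainTo(batch);                // grab whatever else is queued
      merger.merge(batch);                   // merge without blocking adders
      batch.clear();
    }
  }
}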

On Feb 20, 2007, at 1:22 PM, Ning Li wrote:

> I think it's possible for another version of IndexWriter to have
> a concurrent merge thread so that disk segments could be merged
> while documents are being added or deleted.
>
> This would be beneficial not only because it would improve indexing
> performance when there are enough system resources, but more
> importantly because disk segment merges would no longer block
> document additions or deletions.
>
> I'd like to get feedback on this idea, and once we agree on the best
> design I can submit a full patch.
>
> I have an initial implementation based on an earlier version of
> Lucene (but with deletes via IndexWriter). The basic idea is to
> separate a merge process into three steps:
>  1 select disk segments to merge
>  2 merge selected segments into one segment
>  3 apply any document deletions committed during the merge and
>    replace the selected segments with the resulting segment
> The merge process is carried out in the merge thread. Steps 1 and
> 3 are executed in the critical section, but step 2, in which most
> time is spent, is not.
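
For illustration, the merge thread's loop might look like the outline
below (a sketch only; the class and method names are invented here, not
taken from the actual patch):

import java.util.ArrayList;
import java.util.List;

// Only steps 1 and 3 run inside the critical section; step 2, where
// most of the time is spent, holds no lock, so additions and deletions
// can proceed concurrently.
class ConcurrentMergeOutline {
  static class Segment { }

  private final Object writerLock = new Object();     // guards the segment list
  private final List<Segment> segments = new ArrayList<Segment>();

  void mergeOnce() {
    List<Segment> selected;
    synchronized (writerLock) {                        // step 1: select segments
      selected = selectSegmentsToMerge();
    }

    Segment merged = mergeSegments(selected);          // step 2: long-running, no lock

    synchronized (writerLock) {                        // step 3: apply deletes committed
      applyDeletes(merged);                            // during the merge, then swap in
      segments.removeAll(selected);                    // the result segment
      segments.add(merged);
    }
  }

  // placeholders so the outline is self-contained; the real logic is in the patch
  private List<Segment> selectSegmentsToMerge() { return new ArrayList<Segment>(segments); }
  private Segment mergeSegments(List<Segment> in) { return new Segment(); }
  private void applyDeletes(Segment merged) { }
}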
>
> There are three main challenges in enabling concurrent merge:
>  1 a robust merge policy
>  2 detecting when the merge lags behind document additions/deletions
>  3 slowing down document additions/deletions (and amortizing
>    the cost) when the merge falls behind
>
> Because new disk segments (flushed from ram) can continue to be
> produced while a disk merge is going on, it is difficult to maintain
> the two invariants guaranteed by the current IndexWriter. Thus it
> is important and challenging to detect when the merge starts to lag
> behind and to slow down document additions/deletions properly.
>
> Several merge strategies are possible. In the initial implementation,
> I adopted one similar to the merge policy of the current IndexWriter.
> Two limits on the total number of disk segments are used to detect
> when the merge lags and to slow down document additions/deletions: a soft
> limit and a hard limit. When the number of disk segments reaches
> the soft limit, a document addition/deletion will be slowed down
> for time T. As the number of disk segments continues to grow, the
> time for which an addition/deletion is slowed down will increase.
> When the number of disk segments reaches the hard limit, document
> additions/deletions will be blocked until the number falls below
> the hard limit. With proper slowdown, the hard limit is almost
> never reached.
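
A rough illustration of the soft/hard limit scheme (the limit values,
the linear back-off, and all the names below are guesses for
illustration, not the actual implementation):

// Between the soft and hard limits each addition/deletion waits for a
// time that grows with the number of disk segments; at the hard limit
// the caller blocks until the merge thread catches up.
class MergeBackoff {
  private final int softLimit = 20;        // start slowing down here
  private final int hardLimit = 60;        // block here
  private final long baseDelayMs = 10;     // slow-down time T at the soft limit

  private int diskSegmentCount;            // maintained by the flush/merge code

  // called by addDocument()/deleteDocuments() before doing work
  synchronized void maybeThrottle() throws InterruptedException {
    while (diskSegmentCount >= hardLimit) {
      wait();                              // hard limit: block until notified
    }
    if (diskSegmentCount >= softLimit) {
      long delay = baseDelayMs * (diskSegmentCount - softLimit + 1);
      wait(delay);                         // soft limit: back off proportionally
    }
  }

  // called by the flush and merge code whenever the segment count changes
  synchronized void onSegmentCountChanged(int newCount) {
    diskSegmentCount = newCount;
    notifyAll();                           // wake any throttled adders
  }
}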
>
> Other ideas are most welcome!
>
> I also experimented with a concurrent flush thread, which flushes
> ram segments into a disk segment, and multiple disk merge threads.
> The flush thread provides limited additional benefit when the ram
> size (buffered documents) is not too big, and multiple disk merge
> threads require significant system resources to add benefit.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

