lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] Commented: (LUCENE-983) Enable IndexReader to merge tail segments on demand, in RAM, when opening
Date Fri, 17 Aug 2007 07:24:32 GMT


Michael Busch commented on LUCENE-983:

I like this idea. Merging small segments in memory is probably fast
and only necessary during open()/reopen(), and it will improve search
performance.

LUCENE-743 will become a bit more difficult. We'll have to keep a
list of the segments that are part of the merged index in the
RAMDirectory. During reopen() we have to check whether any of those
segments changed. If so, we have to empty the RAMDirectory and
load/merge the small segments again. Otherwise we just add the new
segments to the RAMDirectory if the buffer size permits.
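
A minimal sketch of that bookkeeping, in plain Java with no Lucene
dependency. RamTailTracker, Segment, the per-segment version counter,
and the byte budget are hypothetical names for illustration only, not
Lucene API; a real implementation would hold a RAMDirectory rather
than a map of segment names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model (not Lucene API) of the segments merged into RAM. */
class RamTailTracker {
    /** Assumed descriptor: the version is bumped on any change to the segment. */
    static final class Segment {
        final String name;
        final long version;
        final long sizeBytes;

        Segment(String name, long version, long sizeBytes) {
            this.name = name;
            this.version = version;
            this.sizeBytes = sizeBytes;
        }
    }

    enum Action { REBUILT, APPENDED, UNCHANGED }

    private final long bufferBytes;
    // Segment name -> version that was merged into the RAM index.
    private final Map<String, Long> mergedVersions = new LinkedHashMap<>();
    private long usedBytes = 0;

    RamTailTracker(long bufferBytes) {
        this.bufferBytes = bufferBytes;
    }

    /** Called on open()/reopen() with the tail segments currently on disk. */
    Action reopen(List<Segment> currentTail) {
        Map<String, Segment> current = new LinkedHashMap<>();
        for (Segment s : currentTail) {
            current.put(s.name, s);
        }

        // Did any segment already merged in RAM change or disappear?
        boolean stale = false;
        for (Map.Entry<String, Long> e : mergedVersions.entrySet()) {
            Segment s = current.get(e.getKey());
            if (s == null || s.version != e.getValue()) {
                stale = true;
                break;
            }
        }
        if (stale) {
            // Empty the RAM index; everything is loaded and merged again below.
            mergedVersions.clear();
            usedBytes = 0;
        }

        // Add segments that are not in RAM yet, as long as the buffer permits.
        boolean added = false;
        for (Segment s : currentTail) {
            if (mergedVersions.containsKey(s.name)) {
                continue;
            }
            if (usedBytes + s.sizeBytes > bufferBytes) {
                continue; // does not fit; this segment stays on disk
            }
            mergedVersions.put(s.name, s.version);
            usedBytes += s.sizeBytes;
            added = true;
        }

        if (stale) {
            return Action.REBUILT;
        }
        return added ? Action.APPENDED : Action.UNCHANGED;
    }

    List<String> mergedSegmentNames() {
        return new ArrayList<>(mergedVersions.keySet());
    }
}
```

The sketch rebuilds everything as soon as one merged segment changes,
which is the simple strategy described above; it makes no attempt to
patch the merged index in place.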

Hmm, we could even do more sophisticated things, e.g., if only the
deleted bits of a segment changed we could map them to the merged
RAM index, so we could avoid opening/merging the small segments
again during reopen(). But probably the small performance gain is 
not even worth the extra code complexity, as the segments should be 
quite small.
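
For illustration, mapping a segment's deleted bits onto the merged RAM
index could look roughly like the sketch below. It assumes that the
in-RAM merge keeps every document, so that a merged doc ID is simply
the segment's base offset plus the local doc ID; if the merge dropped
already-deleted documents, a per-segment doc ID map would be needed
instead. MergedDeletes and its methods are hypothetical names, not
Lucene API.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not Lucene API): deleted bits of a merged RAM index. */
class MergedDeletes {
    private final Map<String, Integer> docBase = new LinkedHashMap<>();
    private final Map<String, Integer> docCount = new LinkedHashMap<>();
    private final BitSet deleted = new BitSet();
    private int maxDoc = 0;

    /** Registers a segment in merge order; its docs occupy [base, base + numDocs). */
    void addSegment(String name, int numDocs) {
        docBase.put(name, maxDoc);
        docCount.put(name, numDocs);
        maxDoc += numDocs;
    }

    /** Replaces the merged deleted bits for one segment with its current ones. */
    void applyDeletes(String name, BitSet segmentDeletes) {
        Integer base = docBase.get(name);
        if (base == null) {
            throw new IllegalArgumentException("segment not in merged index: " + name);
        }
        int count = docCount.get(name);
        // Reset this segment's range, then copy the bits over, shifted by the base.
        deleted.clear(base, base + count);
        for (int d = segmentDeletes.nextSetBit(0);
             d >= 0 && d < count;
             d = segmentDeletes.nextSetBit(d + 1)) {
            deleted.set(base + d);
        }
    }

    boolean isDeleted(int mergedDoc) {
        return deleted.get(mergedDoc);
    }

    int maxDoc() {
        return maxDoc;
    }
}
```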

> Enable IndexReader to merge tail segments on demand, in RAM, when opening
> -------------------------------------------------------------------------
>                 Key: LUCENE-983
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
> Spinoff from LUCENE-845.
>
> In LUCENE-845, the IndexWriter must pay a high cost (O(N^2) merge
> cost) for keeping the number of segments "always small" in the case
> where flushes of very small segments (1 doc as worst case) happen
> frequently.  This happens in "low latency" applications.
>
> This is because IndexWriter must be ready "at every moment" for an
> IndexReader to open the index.
>
> But, if we allow IndexReader to use some RAM (give it a RAM buffer) to
> load the long tail of small segments into a RAMDirectory, and then
> merge them (in RAM), this allows IndexReader to still have good
> performance on the index without IndexWriter paying this high merge
> cost.  This effectively allows us to optimize the tail segments "on
> demand" when a reader needs to use them.
>
> When we combine this with LUCENE-743 (efficient "re-open" of a reader)
> then we should be able to efficiently handle low latency applications.

