lucene-dev mailing list archives

From "Ning Li (JIRA)" <>
Subject [jira] Commented: (LUCENE-528) Optimization for IndexWriter.addIndexes()
Date Fri, 20 Oct 2006 07:01:38 GMT
Ning Li commented on LUCENE-528:

We want a robust algorithm for the version of addIndexes() which
does not call optimize().

The robustness can be expressed as two invariants that the merge
policy guarantees when adding documents (provided mergeFactor M does
not change and no segment's doc count reaches maxMergeDocs):
      Let B be maxBufferedDocs and define the level of a segment with
      n docs as f(n) = ceil(log_M(ceil(n/B))).
      1: If i (left*) and i+1 (right*) are two consecutive segments with doc
          counts x and y, then f(x) >= f(y).
      2: The number of committed segments on the same level (f(n)) is <= M.
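
To make the two invariants concrete, here is a minimal sketch that computes the level f(n) with integer arithmetic and checks both invariants over a list of segment doc counts ordered left to right. The class and method names (MergeInvariants, level, invariantsHold) are hypothetical and are not part of IndexWriter; segments are modeled as plain doc counts.

```java
import java.util.List;

// Illustrative only: segments are modeled as doc counts, leftmost first.
class MergeInvariants {

    // f(n) = ceil(log_M(ceil(n/B))), computed without floating point.
    // B is maxBufferedDocs, M is mergeFactor (assumed >= 2).
    static int level(long n, int B, int M) {
        long buffers = (n + B - 1) / B; // ceil(n / B)
        int level = 0;
        long capacity = 1;              // M^level
        while (capacity < buffers) {
            capacity *= M;
            level++;
        }
        return level;
    }

    // Returns true if both invariants hold for the given sequence.
    static boolean invariantsHold(List<Integer> docCounts, int B, int M) {
        int prevLevel = Integer.MAX_VALUE;
        int sameLevelCount = 0;
        for (int n : docCounts) {
            int lv = level(n, B, M);
            if (lv > prevLevel) {
                return false; // invariant 1: levels must not increase left to right
            }
            // Invariant 1 makes same-level segments contiguous, so a run count suffices.
            sameLevelCount = (lv == prevLevel) ? sameLevelCount + 1 : 1;
            if (sameLevelCount > M) {
                return false; // invariant 2: at most M segments per level
            }
            prevLevel = lv;
        }
        return true;
    }
}
```

With B = 10 and M = 10, the doc counts 1000, 100, 100, 10, 10 have levels 2, 1, 1, 0, 0 and satisfy both invariants, while 10, 1000 violates invariant 1 because a level-0 segment sits to the left of a level-2 segment.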

For reference, see LUCENE-565 and LUCENE-672.

addIndexes() can be viewed as adding a sequence of segments S to
a sequence of segments T. Segments in T follow the invariants, but
segments in S may not, since they could come from multiple indexes.
Here is the merge algorithm for addIndexes():

1. Flush RAM segments.

2. Consider a combined sequence with segments from T followed
    by segments from S (same as current addIndexes()).

3. Let h be the highest level among the segments in S. Call maybeMergeSegments(),
    but instead of starting with lowerBound = -1 and upperBound = maxBufferedDocs,
    start with lowerBound = -1 and upperBound = the upper bound of level h.
    After this, the invariants are guaranteed except for the last < M segments
    whose levels are <= h.

4. If the invariants hold for the last < M segments whose levels are <= h, done.
    Otherwise, simply merge those segments. If the merge results in
    a segment of level <= h, done. Otherwise, the result is of level h+1; call
    maybeMergeSegments() starting with upperBound = the upper bound of level h+1.
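
The quantities used in steps 3 and 4 can be sketched as follows. This is not the proposed patch: it only computes h, the starting upperBound, and the level a merged tail would land on. It assumes the upper bound of level k is B * M^k (which follows from f(n), since level 0 is bounded by maxBufferedDocs) and that a merged segment's doc count is the sum of its inputs, i.e. no deleted docs. The names AddIndexesBounds, upperBoundOfLevel, highestLevel and levelAfterMerge are hypothetical.

```java
import java.util.List;

// Illustrative only: computes the bounds that steps 3 and 4 refer to.
class AddIndexesBounds {

    // f(n) = ceil(log_M(ceil(n/B))), with B = maxBufferedDocs and M = mergeFactor.
    static int level(long n, int B, int M) {
        long buffers = (n + B - 1) / B; // ceil(n / B)
        int level = 0;
        long capacity = 1;              // M^level
        while (capacity < buffers) {
            capacity *= M;
            level++;
        }
        return level;
    }

    // Largest doc count that still belongs to the given level: B * M^level.
    static long upperBoundOfLevel(int level, int B, int M) {
        long bound = B;
        for (int i = 0; i < level; i++) {
            bound *= M;
        }
        return bound;
    }

    // Step 3: h is the highest level among the incoming segments S.
    static int highestLevel(List<Integer> docCountsOfS, int B, int M) {
        int h = 0;
        for (int n : docCountsOfS) {
            h = Math.max(h, level(n, B, M));
        }
        return h;
    }

    // Step 4: level of the segment produced by merging the trailing segments,
    // assuming the merged doc count is the sum of the inputs (no deletions).
    static int levelAfterMerge(List<Integer> tailDocCounts, int B, int M) {
        long total = 0;
        for (int n : tailDocCounts) {
            total += n;
        }
        return level(total, B, M);
    }
}
```

For example, with B = 10 and M = 10, an S with doc counts 5, 250, 40 has levels 0, 2, 1, so h = 2 and step 3 would start maybeMergeSegments() with upperBound = 1000. If step 4 then merges three trailing segments of 400 docs each (all level 2), the 1200-doc result is level 3 = h+1, which is the case where maybeMergeSegments() is called again with the upper bound of level h+1.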


> Optimization for IndexWriter.addIndexes()
> -----------------------------------------
>                 Key: LUCENE-528
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Tamm
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>         Attachments: AddIndexes.patch
> One big performance problem with IndexWriter.addIndexes() is that it has to optimize
> the index both before and after adding the segments.  When you have a very large index, to
> which you are adding batches of small updates, these calls to optimize make using addIndexes()
> impossible.  It makes parallel updates very frustrating.
> Here is an optimized function that helps out by calling mergeSegments only on the newly
> added documents.  It will try to avoid calling mergeSegments until the end, unless you're
> adding a lot of documents at once.
> I also have an extensive unit test that verifies that this function works correctly if
> people are interested.  I gave it a different name because it has very different performance
> characteristics which can make querying take longer.
