lucene-dev mailing list archives

From "Michael McCandless" <>
Subject Re: Realtime Search
Date Fri, 09 Jan 2009 13:59:51 GMT
Marvin Humphrey <> wrote:

> The goal is to improve worst-case write performance.
> ...
> In between the time when the background merge writer starts up and the time it finishes
> consolidating segment data, we assume that the primary writer will have modified the index.
>   * New docs have been added in new segments.
>   * Tombstones have been added which suppress documents in segments which didn't even
>     exist when the background merge writer started up.
>   * Tombstones have been added which suppress documents in segments which existed when
>     the background merge writer started up, but were not merged.
>   * Tombstones have been added which suppress documents in segments which have just been
>     merged.
> Only the last category of deletions matters.
> At this point, the background merge writer acquires an exclusive write lock on the index.
> It examines recently added tombstones, translates the document numbers and writes a tombstone
> file against itself. Then it writes the snapshot file to commit its changes and releases the
> write lock.

OK, now I understand KS's two-writer model.  Lucene has already solved
this with the ConcurrentMergeScheduler -- all segment merges are done
in the BG (by default).

We also have to compute the deletions against the new segment to
include deletions that happened to the merged segments after the merge
kicked off.
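
A minimal sketch of that carry-over step, assuming the merge simply
concatenates the source segments in order and drops docs that were
already deleted when it started.  The class and method names here are
hypothetical, not Lucene's actual internals:

```java
import java.util.BitSet;

// Hypothetical helper, for illustration only; not part of Lucene.
final class MergeDeleteCarryOver {

    // Maps each old doc number to its number in the merged segment,
    // or -1 if the doc was already deleted when the merge started
    // (and so was never copied into the merged segment).
    static int[][] buildDocMaps(int[] maxDocs, BitSet[] deletedAtStart) {
        int[][] maps = new int[maxDocs.length][];
        int next = 0;
        for (int seg = 0; seg < maxDocs.length; seg++) {
            maps[seg] = new int[maxDocs[seg]];
            for (int doc = 0; doc < maxDocs[seg]; doc++) {
                maps[seg][doc] = deletedAtStart[seg].get(doc) ? -1 : next++;
            }
        }
        return maps;
    }

    // Finds deletions that arrived after the merge kicked off and
    // translates them into the merged segment's doc numbers.
    static BitSet carryOver(int[][] docMaps, BitSet[] deletedAtStart, BitSet[] deletedNow) {
        BitSet mergedDeletes = new BitSet();
        for (int seg = 0; seg < docMaps.length; seg++) {
            BitSet late = (BitSet) deletedNow[seg].clone();
            late.andNot(deletedAtStart[seg]); // only deletes made during the merge
            for (int doc = late.nextSetBit(0); doc >= 0; doc = late.nextSetBit(doc + 1)) {
                int newDoc = docMaps[seg][doc];
                if (newDoc >= 0) {
                    mergedDeletes.set(newDoc);
                }
            }
        }
        return mergedDeletes;
    }
}
```

The snapshot of deletions taken at merge start is what makes the diff
cheap: only the bits set since then need translating.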

Still, it's not a panacea since often the IO system has horrible
degradation in performance while a merge is running.  If only we could
mark all IO (reads & writes) associated with merging as low priority
and have the OS actually do the right thing...

> It's true that we are decoupling the process of making logical changes to the index from
> the process of internal consolidation. I probably wouldn't describe that as being done from
> the reader's standpoint, though.

Right, we have a different problem in Lucene (because we must warm a
reader before using it): after a large merge, warming the new
IndexReader that includes that segment can be costly (though that cost
is going down with LUCENE-1483, and eventually column-stride fields).

But we can solve this by allowing a reopened reader to use the old
segments, until the new segment is warmed.
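
Roughly, the segment choice at reopen time could look like the sketch
below.  It assumes the source segments' files are kept around until
the merged segment has been warmed; SegmentChooser and its methods are
made-up names for illustration, not an existing Lucene API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, for illustration only; not part of Lucene.
final class SegmentChooser {

    // Names of merged segments whose warming has completed.
    private final Set<String> warmed = ConcurrentHashMap.newKeySet();

    // Called by whatever background thread finishes warming a merged segment.
    void markWarmed(String segment) {
        warmed.add(segment);
    }

    // Picks the segments a reopened reader should search: the merged
    // segment if it is warm, otherwise the old segments it replaces,
    // followed in either case by segments written since the merge.
    List<String> choose(String merged, List<String> sources, List<String> newer) {
        List<String> view = new ArrayList<>();
        if (warmed.contains(merged)) {
            view.add(merged);
        } else {
            view.addAll(sources);
        }
        view.addAll(newer);
        return view;
    }
}
```

Either view covers the same documents, so searches stay correct while
the warming cost is paid off the search path.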

