lucene-java-user mailing list archives

From Michael McCandless
Subject Re: MergePolicy public but SegmentInfos package protected?
Date Fri, 27 Mar 2009 19:59:09 GMT
On Fri, Mar 27, 2009 at 1:12 PM, Marvin Humphrey wrote:

>> Why must merge policy be made public for realtime search? [In Lucy]
> Because real-time search under Lucy needs to be able to operate using multiple
> write processes, since threads will not always be available.
> You need to be able to tell one indexer *not* to merge anything when
> performing fast updates, and you need to be able to tell another indexer what
> to merge when performing background consolidation.

Is this because you don't want to swamp the IO system?  I.e., you're
emulating IO prioritization.  (Which I think makes sense, but it's more
of an optimization than strictly necessary for realtime search.)

The prototype near-realtime search in Lucene (LUCENE-1516) is fully
independent of the merge policy (though, yes, a smarter merge policy
can reduce turnaround times).

>> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
>> > isn't obscenely expensive any more -- just expensive.  Right?
>> We load deleted docs on init (1 bit per doc = fast), terms index (=
>> a lot of stuff every 128 terms = maybe slow), norms on the first search
>> that hits that field (1 byte per doc = probably OK), and FieldCache on
>> first search that uses it.  So "it depends" I guess?
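
To put rough numbers on the costs quoted above, here is a minimal
back-of-envelope sketch in Java. Only the per-doc figures (1 bit for
deleted docs, 1 byte for norms) and the 128-term interval come from the
message; the class name, method names, and the document and term counts
are hypothetical, and the terms-index estimate counts entries only, since
the size of each entry is not stated.

```java
// Back-of-envelope estimate of what a reader loads for one segment.
// All names here are hypothetical; this is not Lucene API.
final class ReaderLoadEstimate {

    // Deleted docs: 1 bit per doc, rounded up to whole bytes.
    static long deletedDocsBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Norms: 1 byte per doc for each field that a search actually hits.
    static long normsBytes(long maxDoc, int searchedFields) {
        return maxDoc * searchedFields;
    }

    // Terms index: one entry per `indexInterval` terms, rounded up.
    static long termsIndexEntries(long uniqueTerms, int indexInterval) {
        return (uniqueTerms + indexInterval - 1) / indexInterval;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;      // hypothetical segment size
        long uniqueTerms = 50_000_000L; // hypothetical term count

        System.out.println("deleted docs bytes: " + deletedDocsBytes(maxDoc));
        System.out.println("norms bytes (2 fields): " + normsBytes(maxDoc, 2));
        System.out.println("terms index entries: " + termsIndexEntries(uniqueTerms, 128));
    }
}
```

The deleted-docs and norms costs scale with the doc count alone, while
the terms-index cost scales with the number of unique terms, which is
why it is the part flagged as "maybe slow".
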
> For the purposes of MergePolicy, all you would need are the doc counts and the
> delcounts, and optionally other stuff in SegmentInfos.  In theory you could
> lazy load the other stuff like the term dictionary index.  Obviously that
> would be an unacceptable behavioral change, but it's worth noting.
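
The point that a merge decision needs only doc counts and delcounts can
be sketched as follows. This is an illustration under stated assumptions,
not Lucene's or Lucy's MergePolicy API: SegmentStats, ToyMergePolicy,
NoMerges, ConsolidateSmall, and both thresholds are hypothetical. It also
shows the two roles described earlier in the thread, one indexer that
never merges and one that consolidates in the background.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-segment metadata: just the counts, nothing loaded from the index files.
final class SegmentStats {
    final String name;
    final int docCount;
    final int delCount;

    SegmentStats(String name, int docCount, int delCount) {
        this.name = name;
        this.docCount = docCount;
        this.delCount = delCount;
    }

    int liveDocs() {
        return docCount - delCount;
    }
}

// Hypothetical policy interface: returns the segments to merge, or an empty list for "do nothing".
interface ToyMergePolicy {
    List<SegmentStats> selectMerge(List<SegmentStats> segments);
}

// For the indexer doing fast updates: never merge.
final class NoMerges implements ToyMergePolicy {
    public List<SegmentStats> selectMerge(List<SegmentStats> segments) {
        return Collections.<SegmentStats>emptyList();
    }
}

// For the background indexer: pick segments that are small or heavily deleted.
final class ConsolidateSmall implements ToyMergePolicy {
    private final int maxLiveDocs;
    private final double minDeletedRatio;

    ConsolidateSmall(int maxLiveDocs, double minDeletedRatio) {
        this.maxLiveDocs = maxLiveDocs;
        this.minDeletedRatio = minDeletedRatio;
    }

    public List<SegmentStats> selectMerge(List<SegmentStats> segments) {
        List<SegmentStats> picked = new ArrayList<>();
        for (SegmentStats s : segments) {
            double deletedRatio = s.docCount == 0 ? 0.0 : (double) s.delCount / s.docCount;
            if (s.liveDocs() <= maxLiveDocs || deletedRatio >= minDeletedRatio) {
                picked.add(s);
            }
        }
        // This sketch only merges when at least two segments qualify.
        return picked.size() >= 2 ? picked : Collections.<SegmentStats>emptyList();
    }
}

final class MergePolicySketch {
    public static void main(String[] args) {
        List<SegmentStats> segments = new ArrayList<>();
        segments.add(new SegmentStats("_0", 1_000_000, 10)); // large, few deletes
        segments.add(new SegmentStats("_1", 500, 0));        // small
        segments.add(new SegmentStats("_2", 800, 600));      // mostly deleted

        for (SegmentStats s : new ConsolidateSmall(1000, 0.5).selectMerge(segments)) {
            System.out.println("would merge " + s.name);
        }
    }
}
```

Neither policy touches anything beyond the two counts, so in this sketch
the term dictionary index and other per-segment data never need to be
loaded to make the decision.
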


