lucene-java-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: MergePolicy public but SegmentInfos package protected?
Date Fri, 27 Mar 2009 17:12:31 GMT
On Fri, Mar 27, 2009 at 12:39:05PM -0400, Michael McCandless wrote:

> Why must merge policy be made public for realtime search? [In Lucy]

Because real-time search under Lucy needs to be able to operate using multiple
write processes, since threads will not always be available.

You need to be able to tell one indexer *not* to merge anything when
performing fast updates, and you need to be able to tell another indexer what
to merge when performing background consolidation.

Looking down from a high level, what I think will work is to supply an
"IndexManager" argument to the indexer's constructor which controls all
merge-related behavior, and to provide FastUpdateManager and
BackgroundMergeManager classes which implement the desired policies.
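
As a rough sketch of that shape, written in Java for readers of this list: only
the names IndexManager, FastUpdateManager and BackgroundMergeManager come from
the paragraph above. The SegmentSummary type, the segmentsToMerge() method, the
Indexer class and the size threshold are all hypothetical, invented purely for
illustration, and are not Lucy's or Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment summary (assumption, not a real Lucy/Lucene type).
final class SegmentSummary {
    final String name;
    final int docCount;
    final int delCount;

    SegmentSummary(String name, int docCount, int delCount) {
        this.name = name;
        this.docCount = docCount;
        this.delCount = delCount;
    }
}

// Hypothetical interface: the object handed to the indexer's constructor
// decides all merge-related behavior.
interface IndexManager {
    List<SegmentSummary> segmentsToMerge(List<SegmentSummary> segments);
}

// Fast updates: never merge anything, so the write finishes quickly.
final class FastUpdateManager implements IndexManager {
    @Override
    public List<SegmentSummary> segmentsToMerge(List<SegmentSummary> segments) {
        return new ArrayList<>();
    }
}

// Background consolidation: merge all segments at or below a size threshold.
// The threshold rule is an assumption chosen to keep the example small.
final class BackgroundMergeManager implements IndexManager {
    private final int maxDocsToMerge;

    BackgroundMergeManager(int maxDocsToMerge) {
        this.maxDocsToMerge = maxDocsToMerge;
    }

    @Override
    public List<SegmentSummary> segmentsToMerge(List<SegmentSummary> segments) {
        List<SegmentSummary> picked = new ArrayList<>();
        for (SegmentSummary s : segments) {
            if (s.docCount <= maxDocsToMerge) {
                picked.add(s);
            }
        }
        // Merging a single segment with itself accomplishes nothing here.
        return picked.size() > 1 ? picked : new ArrayList<>();
    }
}

// Hypothetical indexer that delegates every merge decision to its manager.
final class Indexer {
    private final IndexManager manager;

    Indexer(IndexManager manager) {
        this.manager = manager;
    }

    List<SegmentSummary> planMerge(List<SegmentSummary> segments) {
        return manager.segmentsToMerge(segments);
    }
}
```

One write process would then construct its indexer with a FastUpdateManager
and another with a BackgroundMergeManager, so the two never need to share
threads to divide the work.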

> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
> > isn't obscenely expensive any more -- just expensive.  Right?
> 
> We load deleted docs on init (1 bit per doc = fast), terms index (=
> a lot of stuff every 128 terms = maybe slow), norms on the first search
> that hits that field (1 byte per doc = probably OK), and FieldCache on
> first search that uses it.  So "it depends" I guess?

For the purposes of MergePolicy, all you would need are the doc counts and the
deletion counts, plus optionally other data in SegmentInfos.  In theory you
could lazy-load everything else, such as the term dictionary index.  Obviously
that would be an unacceptable behavioral change, but it's worth noting.
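
To make the idea concrete, here is a minimal sketch in Java, under stated
assumptions: LazySegment, DeletionRatioPolicy and the deleted-ratio rule are
hypothetical names and logic made up for this example, and the int[] array
merely stands in for an expensive structure such as the term dictionary index.
It is not how Lucene's SegmentReader or MergePolicy actually work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical segment handle: counts are available immediately, while the
// expensive structure is loaded only on first access.
final class LazySegment {
    final int docCount;
    final int delCount;
    private final Supplier<int[]> termsIndexLoader; // stand-in for a costly load
    private int[] termsIndex;                       // null until first requested

    LazySegment(int docCount, int delCount, Supplier<int[]> termsIndexLoader) {
        this.docCount = docCount;
        this.delCount = delCount;
        this.termsIndexLoader = termsIndexLoader;
    }

    boolean termsIndexLoaded() {
        return termsIndex != null;
    }

    int[] termsIndex() {
        if (termsIndex == null) {
            termsIndex = termsIndexLoader.get();
        }
        return termsIndex;
    }

    double deletedRatio() {
        return docCount == 0 ? 0.0 : (double) delCount / docCount;
    }
}

// Hypothetical policy that consults only doc counts and deletion counts.
final class DeletionRatioPolicy {
    static List<LazySegment> select(List<LazySegment> segments, double minRatio) {
        List<LazySegment> picked = new ArrayList<>();
        for (LazySegment s : segments) {
            if (s.deletedRatio() >= minRatio) {
                picked.add(s);
            }
        }
        return picked;
    }
}
```

Because select() touches only the two counts, choosing merge candidates in
this sketch never triggers the stand-in terms index load.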

Marvin Humphrey
