lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: MergePolicy public but SegmentInfos package protected?
Date Fri, 27 Mar 2009 16:39:05 GMT
On Fri, Mar 27, 2009 at 12:13 PM, Marvin Humphrey
<> wrote:

>> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
> I had thought making SegmentReader public was at least under consideration as
> part of the implementation for segment-centric sorted search, but I guess it
> turned out not to be necessary.  Still, you have
> IndexReader.getSequentialSubReaders().  That might be enough -- at least for
> this part of the problem.  :)

Yes, enough for now I suppose.  Though we have LUCENE-831 up next
(fixing FieldCache API).

>> > As for the actual implementation of MergePolicy, I haven't prototyped that out
>> > yet.  Right now in KS, the infrastructure is reasonably primitive:
>> > IndexManager has a method called SegReaders_To_Merge() which accepts a
>> > PolyReader as an argument and returns an array of SegReaders representing
>> > content that should be merged.
>> KS does the fibonacci merge policy right?
> Yes.
> SegReaders_To_Merge is overridden in certain parts of the test suite, but it's
> not yet public.  However, control over merging policy will soon *have* to be
> made public somehow in order to support real-time indexing, so working out an
> API is on my near-term agenda.

Why must merge policy be made public for realtime search?

>> >> Even though Lucy's SegmentReader is lighter weight, it still seems
>> >> like you shouldn't be opening them in the writer (except for realtime
>> >> search)?
>> >
>> > I don't see why not.
>> But it still ties up resources?
> Not enough to worry about, I believe.

Hmm OK.

>> EG mmap uses up chunks of your address space (possibly important on 32 bit
>> machines,
> This is an important concern, but I believe that design-wise, we have a
> solution[1] -- on 32-bit systems, we only mmap sliding windows rather than
> whole files.


> Furthermore, mmap is called with the MAP_SHARED flag, so IndexReaders across
> multiple processes hitting the same exact memory segment get to share it.
> (This is more important under 64-bit systems, where we do map the whole file
> straightaway.)


>> opening files takes time & descriptors, etc.
> Launching an IndexReader is still plenty fast.
> Actually, if you're not warming sort caches, launching a Lucene IndexReader
> isn't obscenely expensive any more -- just expensive.  Right?

We load deleted docs on init (1 bit per doc = fast), terms index (=
alot of stuff every 128 terms = maybe slow), norms on the first search
that hits that field (1 byte per doc = probably OK), and FieldCache on
first search that uses it.  So "it depends" I guess?

> [1] At least on Unixen.  I believe we can support all of this using Windows
>    MapViewOfFile and friends, and I had a crude prototype working before, but
>    right now Windows is still using the old-school load-into-process-memory
>    style.



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message