lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: MergePolicy public but SegmentInfos package protected?
Date Fri, 27 Mar 2009 16:39:05 GMT
On Fri, Mar 27, 2009 at 12:13 PM, Marvin Humphrey
<marvin@rectangular.com> wrote:

>> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
>
> I had thought making SegmentReader public was at least under consideration as
> part of the implementation for segment-centric sorted search, but I guess it
> turned out not to be necessary.  Still, you have
> IndexReader.getSequentialSubReaders().  That might be enough -- at least for
> this part of the problem.  :)

Yes, enough for now I suppose.  Though we have LUCENE-831 up next
(fixing FieldCache API).

>> > As for the actual implementation of MergePolicy, I haven't prototyped that out
>> > yet.  Right now in KS, the infrastructure is reasonably primitive:
>> > IndexManager has a method called SegReaders_To_Merge() which accepts a
>> > PolyReader as an argument and returns an array of SegReaders representing
>> > content that should be merged.
>>
>> KS uses the Fibonacci merge policy, right?
>
> Yes.
>
> SegReaders_To_Merge is overridden in certain parts of the test suite, but it's
> not yet public.  However, control over merging policy will soon *have* to be
> made public somehow in order to support real-time indexing, so working out an
> API is on my near-term agenda.

Why must merge policy be made public for realtime search?

>> >> Even though Lucy's SegmentReader is lighter weight, it still seems
>> >> like you shouldn't be opening them in the writer (except for realtime
>> >> search)?
>> >
>> > I don't see why not.
>>
>> But it still ties up resources?
>
> Not enough to worry about, I believe.

Hmm OK.

>> E.g., mmap uses up chunks of your address space (possibly important on 32-bit
>> machines,
>
> This is an important concern, but I believe that design-wise, we have a
> solution[1] -- on 32-bit systems, we only mmap sliding windows rather than
> whole files.

Nice!
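
For anyone following along in Java, here is a minimal sketch of the
sliding-window idea: map only a fixed-size window of the file at a time and
remap when a read falls outside it. This is not Lucy's or Lucene's code; the
class name SlidingWindowReader and the windowSize parameter are made up for
illustration, and only the standard FileChannel.map call is relied on.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader that maps one window of a file at a time. */
final class SlidingWindowReader implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;
    private final long windowSize;      // assumed tuning knob, in bytes
    private long windowStart = -1;      // file offset of the mapped window
    private MappedByteBuffer window;    // currently mapped region, if any

    SlidingWindowReader(Path file, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize=" + windowSize);
        }
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.windowSize = windowSize;
    }

    /** Reads the byte at an absolute file offset, remapping if needed. */
    byte readByte(long pos) throws IOException {
        if (pos < 0 || pos >= fileSize) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        // Windows are aligned to multiples of windowSize.
        long start = (pos / windowSize) * windowSize;
        if (start != windowStart) {
            // The last window may be shorter than windowSize.
            long len = Math.min(windowSize, fileSize - start);
            window = channel.map(FileChannel.MapMode.READ_ONLY, start, len);
            windowStart = start;
        }
        return window.get((int) (pos - start));
    }

    @Override
    public void close() throws IOException {
        window = null;   // Java has no explicit unmap; the GC releases it
        channel.close();
    }
}
```

At most one window is referenced at a time, so live address-space use stays
near windowSize regardless of file size, at the cost of a remap whenever
reads jump between windows. A real implementation would also need
multi-byte reads that straddle a window boundary, which this sketch skips.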

> Furthermore, mmap is called with the MAP_SHARED flag, so IndexReaders across
> multiple processes hitting the same exact memory segment get to share it.
> (This is more important under 64-bit systems, where we do map the whole file
> straightaway.)

Great.

>> opening files takes time & descriptors, etc.
>
> Launching an IndexReader is still plenty fast.
>
> Actually, if you're not warming sort caches, launching a Lucene IndexReader
> isn't obscenely expensive any more -- just expensive.  Right?

We load deleted docs on init (1 bit per doc = fast), the terms index (a
lot of stuff for every 128th term = maybe slow), norms on the first search
that hits that field (1 byte per doc = probably OK), and FieldCache on
the first search that uses it.  So "it depends", I guess?

> [1] At least on Unixen.  I believe we can support all of this using Windows
>    MapViewOfFile and friends, and I had a crude prototype working before, but
>    right now Windows is still using the old-school load-into-process-memory
>    style.

Excellent!

Mike
