lucene-java-user mailing list archives

From Michael McCandless
Subject Re: MergePolicy public but SegmentInfos package protected?
Date Fri, 27 Mar 2009 12:08:20 GMT
Actually, how will Lucy do this?

Even though Lucy's SegmentReader is lighter weight, it still seems
like you shouldn't be opening them in the writer (except for realtime
search)?  What's your plan?  Are you going to simply make the segment
metadata public?


On Thu, Mar 26, 2009 at 9:51 PM, Marvin Humphrey wrote:
> On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:
>> We'd need to add a few methods to IndexReader,
> Eep.  IndexReader's too big as it is.
>> eg querying whether
>> compound file format is in use, whether separate norms are stored,
>> "get me total size in bytes of all files" (or maybe just "get me all
>> files", plus utility method somewhere to add up the sizes), so this
>> approach seems doable.
> Do you really need all that?  I think the crucial info is already available:
>  * The number of docs in each segment.
>  * The number of deletions in each segment, allowing you to calculate the
>    deletion percentage.
> I think it's reasonable to assume an average distribution of document sizes
> across segments.  Sure, that'll be wrong at the long tail of the curve, but
> most of the time it will be right -- and even when it's not, it won't cause
> big problems.
>> But: we don't yet have IndexWriter holding open a reader for every
>> segment.  We are working on realtime search (LUCENE-1516), but even
>> then, if you don't ask for a realtime reader from IndexWriter, it
>> won't hold open SegmentReaders for all segments.
> Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
> a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
> model messes that up -- IndexWriter has traditionally been a fast class to
> open.
> Marvin Humphrey
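
To make the quoted suggestion concrete, here is a minimal sketch of the two numbers Marvin says are enough for a merge decision: per-segment doc count and deletion count, with live doc count standing in for segment size on the assumption that documents are of roughly average size everywhere. The class and method names are hypothetical, made up for illustration. They are not part of the Lucene or Lucy APIs, and this is not how either project's merge policy is implemented.

```java
// Hypothetical helper, not a Lucene or Lucy class.
final class SegmentSizeEstimate {

    /** Percentage of a segment's documents that are marked deleted. */
    public static double deletionPercent(int docCount, int delCount) {
        if (docCount <= 0) {
            return 0.0; // empty segment: nothing to reclaim
        }
        return 100.0 * delCount / docCount;
    }

    /**
     * Live document count, used here as a stand-in for segment size.
     * Assumption (from the thread): document sizes are distributed about
     * evenly across segments, so doc counts approximate relative byte sizes.
     */
    public static int liveDocs(int docCount, int delCount) {
        return docCount - delCount;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: {docCount, delCount} per segment.
        int[][] segments = { {1000, 250}, {200, 0}, {50, 40} };
        for (int[] seg : segments) {
            System.out.printf("docs=%d dels=%d live=%d deleted=%.1f%%%n",
                    seg[0], seg[1],
                    liveDocs(seg[0], seg[1]),
                    deletionPercent(seg[0], seg[1]));
        }
    }
}
```

A policy built on these numbers would go wrong where the even-size assumption fails, for example a segment holding a few very large documents. That is the "long tail" case the quoted message already concedes.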
