lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Actual min and max-value of NumericField during codec flush
Date Mon, 17 Feb 2014 15:58:15 GMT
On Mon, Feb 17, 2014 at 8:33 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
>>
>> Well, this will change your scores?  MultiReader will sum up all term
>> statistics across all SegmentReaders "up front", and then scoring per
>> segment will use those top-level weights.
>
>
> Our app needs to do only matching and sorting. In fact, it would be fully
> OK to bypass scoring. But I feel scoring must be blazing fast, so there
> should be no gain from avoiding it. Can you please confirm whether this is
> the case?

You should avoid it if in fact you don't use it.  What are you sorting
by?  If you sort by field, and don't ask for scores, then scores won't
be computed.
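
To make the quoted point about top-level term statistics concrete, here is a
minimal plain-Java sketch. It does not call Lucene: TermStats, combine and idf
are hypothetical names, and the idf formula (1 + ln(numDocs / (docFreq + 1)))
is assumed to be the classic TF-IDF one. It only shows why scores shift when
several segments are searched through one top-level reader.

```java
import java.util.List;

public class TopLevelStats {
    /** Hypothetical per-segment statistics for a single term. */
    public static final class TermStats {
        public final int docFreq; // documents in the segment containing the term
        public final int maxDoc;  // total documents in the segment

        public TermStats(int docFreq, int maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Sums the statistics of all segments "up front", before any scoring. */
    public static TermStats combine(List<TermStats> segments) {
        int docFreq = 0;
        int maxDoc = 0;
        for (TermStats s : segments) {
            docFreq += s.docFreq;
            maxDoc += s.maxDoc;
        }
        return new TermStats(docFreq, maxDoc);
    }

    /** Assumed classic idf: 1 + ln(numDocs / (docFreq + 1)). */
    public static double idf(TermStats s) {
        return 1.0 + Math.log(s.maxDoc / (double) (s.docFreq + 1));
    }
}
```

The idf of the combined statistics generally differs from the idf of any
single segment, which is the score change mentioned in the quoted text. None
of this matters when sorting by field without asking for scores.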

>> Which addIndexes method are you using?  The one taking Directory[]
>> does file-level copies, assigning sequential segment names (but this
>> is not guaranteed), and the one taking IndexReader[] merges all the
>> incoming indices into a single segment.
>
>
> I am planning to use the IndexReader[] to merge out-of-order segments,
> which should make timestamp-based merges easier

OK.

>> You may need to just impl a custom MergePolicy that sorts all segments in
>> the index by timestamp and picks the merge order accordingly...
>
>
> Yes, this is what I think I will do, with a SortingMP wrapper. I hope
> merges will work fine, after accumulating considerable data over a period
> of time.

OK good luck and have fun :)
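
A rough sketch of the ordering step discussed above, in plain Java rather than
against Lucene's MergePolicy API. SegmentStats, its fields, pickMerges and the
mergeFactor parameter are all hypothetical names for illustration. A real
policy would need to obtain each segment's min/max timestamp itself and return
merges in whatever form the MergePolicy interface expects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimestampMergeOrder {
    /** Hypothetical summary of one segment and the timestamp range it holds. */
    public static final class SegmentStats {
        public final String name;
        public final long minTimestamp;
        public final long maxTimestamp;

        public SegmentStats(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Returns a copy sorted by minTimestamp, then maxTimestamp. */
    public static List<SegmentStats> sortByTimestamp(List<SegmentStats> segments) {
        List<SegmentStats> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((SegmentStats s) -> s.minTimestamp)
                .thenComparingLong(s -> s.maxTimestamp));
        return sorted;
    }

    /**
     * Groups time-adjacent segments into merges of up to mergeFactor segments.
     * A trailing single segment is left unmerged.
     */
    public static List<List<SegmentStats>> pickMerges(List<SegmentStats> segments,
                                                      int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<SegmentStats> sorted = sortByTimestamp(segments);
        List<List<SegmentStats>> merges = new ArrayList<>();
        // Stop once fewer than two segments remain to be grouped.
        for (int i = 0; i + 1 < sorted.size(); i += mergeFactor) {
            int end = Math.min(i + mergeFactor, sorted.size());
            merges.add(new ArrayList<>(sorted.subList(i, end)));
        }
        return merges;
    }
}
```

Merging only time-adjacent segments keeps each merged segment's timestamp
range as narrow as the inputs allow, which is the property a timestamp-sorted
index relies on.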

Mike McCandless

http://blog.mikemccandless.com
