lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Actual min and max-value of NumericField during codec flush
Date Fri, 14 Feb 2014 20:23:53 GMT
On Fri, Feb 14, 2014 at 12:14 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:

> Early query termination quits by throwing an exception, right? Is it OK to
> search each SegmentReader individually and then break off, instead of
> using a MultiReader, especially when the order is known before the search
> begins?

Well, this will change your scores.  MultiReader sums up all term
statistics across all SegmentReaders "up front", and scoring per
segment then uses those top-level weights.  If you search each
SegmentReader on its own, each segment is scored with only its own
statistics.
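
To illustrate why the scores change, here is a minimal sketch in plain Java.
It does not use Lucene classes: SegmentStats and the method names are
hypothetical, and the idf formula (1 + ln(numDocs / (docFreq + 1))) is
assumed here as a classic TF-IDF style weight for illustration only.

```java
import java.util.List;

/** Illustrative only: not Lucene code. All names here are hypothetical. */
final class SegmentStatsSketch {

    /** Hypothetical per-segment statistics for a single term. */
    static final class SegmentStats {
        final long docFreq; // documents in this segment that contain the term
        final long numDocs; // total documents in this segment

        SegmentStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Assumed classic-style idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** Sums the statistics over all segments first, then computes one idf. */
    static double globalIdf(List<SegmentStats> segments) {
        long docFreq = 0;
        long numDocs = 0;
        for (SegmentStats s : segments) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }
}
```

With one segment where the term is rare and another where it is common, the
idf computed from the summed statistics falls between the two per-segment
values, so hits scored segment by segment are not comparable with hits scored
through the top-level reader.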

> The reason I insisted on time-stamp based merging is that there is
> a possibility of an out-of-order segment being added via an addIndex(...) call.
> That segment can have any older time-stamp [a month ago, a year ago, etc.],
> although this is extremely rare. Should I worry about it during merges, or just
> handle overlaps during search?

Which addIndexes method are you using?  The one taking Directory[]
does file-level copies, assigning sequential segment names (though that
ordering is not guaranteed), and the one taking IndexReader[] merges all
the incoming indices into a single segment.

You may just need to implement a custom MergePolicy that sorts all segments
in the index by timestamp and picks the merge order accordingly...
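
A rough sketch of the ordering logic such a policy might use, in plain Java
rather than against the Lucene API. The Segment class, its minTimestamp
field, and planMerges are hypothetical; how each segment's timestamp is
obtained is assumed to be solved elsewhere. A real policy would extend
MergePolicy and build its merges in findMerges instead of returning lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: not a Lucene MergePolicy. All names are hypothetical. */
final class TimestampMergeOrderSketch {

    /** Hypothetical segment descriptor carrying its smallest timestamp. */
    static final class Segment {
        final String name;
        final long minTimestamp;

        Segment(String name, long minTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
        }
    }

    /**
     * Sorts a copy of the segments by timestamp (oldest first) and groups
     * neighbours into merges of exactly mergeFactor segments. Trailing
     * segments that do not fill a whole group are left unmerged.
     */
    static List<List<Segment>> planMerges(List<Segment> segments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.minTimestamp));

        List<List<Segment>> merges = new ArrayList<>();
        for (int i = 0; i + mergeFactor <= sorted.size(); i += mergeFactor) {
            merges.add(new ArrayList<>(sorted.subList(i, i + mergeFactor)));
        }
        return merges;
    }
}
```

Because the sort happens on every call, a segment that arrives late with an
old timestamp is grouped with its time-wise neighbours rather than with the
segments created around the same moment it was added.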

Mike McCandless

http://blog.mikemccandless.com
