lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: NRT + static rank based sorting
Date Mon, 15 Jul 2013 11:43:58 GMT
Also, it's generally not a good idea to check for an IndexReader reopen on
every search request: that could be far too often if you suddenly hit high
search load, and if it's a big reopen (a large segment merge just completed)
you slow down that one unlucky search too much. It's better to have a
background thread that does so periodically.

The SearcherManager class simplifies this for you ...
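
To make the background-thread idea concrete, here is a minimal, JDK-only sketch of the pattern. It is not Lucene code: PeriodicRefresher and its reopenIfChanged callback are hypothetical names standing in for the reader and for DirectoryReader.openIfChanged, which returns null when nothing changed. SearcherManager packages the same idea (maybeRefresh() called from a background thread, acquire()/release() around each search) and also handles reference counting, which this sketch leaves out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical helper: keeps the current "view" and refreshes it off the search path. */
final class PeriodicRefresher<T> implements AutoCloseable {
    private final AtomicReference<T> current;
    // Assumed contract: returns a new view, or null if nothing changed.
    private final UnaryOperator<T> reopenIfChanged;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-refresher");
                t.setDaemon(true);
                return t;
            });

    PeriodicRefresher(T initial, UnaryOperator<T> reopenIfChanged) {
        this.current = new AtomicReference<>(initial);
        this.reopenIfChanged = reopenIfChanged;
    }

    /** Schedule refresh attempts; search threads never pay the reopen cost. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Swallow so one failed attempt does not cancel later runs.
            }
        }, period, period, unit);
    }

    /** One refresh attempt; returns true if the view was swapped. */
    boolean refreshOnce() {
        T fresh = reopenIfChanged.apply(current.get());
        if (fresh == null) {
            return false;
        }
        // NOTE: a real reader must not be closed while searches still use it;
        // this sketch simply drops the old view.
        current.set(fresh);
        return true;
    }

    /** Called by search threads: a cheap read of the latest view. */
    T get() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A search thread would only ever call get(); the refresh interval passed to start() then bounds how stale the view can be, independent of search load.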

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 12, 2013 at 3:55 PM, Sriram Sankar <sankar@gmail.com> wrote:
> Thanks!
>
>
> On Tue, Jul 9, 2013 at 2:13 PM, Adrien Grand <jpountz@gmail.com> wrote:
>
>> Hi Sriram,
>>
>> On Tue, Jul 9, 2013 at 5:06 AM, Sriram Sankar <sankar@gmail.com> wrote:
>> > I've finally got something running and will send you some performance
>> > numbers as promised shortly.  In the meantime, I have a question regarding
>> > the use of real time indexing along with ordering by static rank.  Before
>> > each search, I do the reopen as follows:
>> >
>> >     public void refresh() throws IOException {
>> >         DirectoryReader r = DirectoryReader.openIfChanged(reader);
>> >         if (r != null) {
>> >             reader.close();
>> >             reader = r;
>> >             this.live = SortingAtomicReader.wrap(
>> >                     new SlowCompositeReaderWrapper(reader),
>> >                     new StaticRankSorter());
>> >         }
>> >     }
>> >
>> > This works fine.  However, I believe the index is re-sorted every time I
>> > reopen the index.  Ideally, it would be nice to do the sort more
>> > incrementally each time a new document gets added.  I assume that this is
>> > not easy - but just in case you have ideas, I'd like to hear them.
>>
>> I think a good trade-off could be to fully collect the small segments
>> that come from incremental updates. Since they are small, collecting
>> them will be fast anyway. On the other hand, the bottleneck is likely
>> the collection of large segments. This is why we chose to tackle the
>> problem of online sorting using a merge policy (SortingMergePolicy).
>> Segments are only sorted when merging, meaning that small NRT
>> (flushed) segments won't be sorted but large (merged) segments will
>> be.
>>
>> Then computing the top hits is just a matter of computing the best
>> hits on every segment and merging them into a single hit list:
>>  - for flushed segments, you need to fully collect them like Lucene
>> does by default,
>>  - for sorted segments, you can early-terminate collection on a
>> per-segment basis when enough matches have been collected.
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
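
The per-segment strategy Adrien describes in the quoted message (fully collect flushed segments, early-terminate on sorted ones, then merge) can be sketched in plain Java. This is only an illustration under simplifying assumptions, not Lucene's collector API: SegmentTopK and Segment are hypothetical names, a segment is modeled as the static ranks of its matching documents in index order, and a "sorted" segment is assumed to store them in descending rank order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of per-segment top-k collection with early termination. */
final class SegmentTopK {

    /** Stand-in for a segment: static ranks of matching docs, in index order. */
    static final class Segment {
        final long[] ranks;
        final boolean sorted; // true for merged segments (descending rank order)

        Segment(long[] ranks, boolean sorted) {
            this.ranks = ranks;
            this.sorted = sorted;
        }
    }

    /** Number of documents actually visited, to show the savings. */
    int visited = 0;

    /** Returns the k best ranks across all segments, best first. */
    List<Long> topK(List<Segment> segments, int k) {
        // Min-heap holding the best k ranks seen so far across segments.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (Segment segment : segments) {
            int collectedHere = 0;
            for (long rank : segment.ranks) {
                // Sorted segment: its first k matches are its best k, so stop.
                if (segment.sorted && collectedHere >= k) {
                    break;
                }
                visited++;
                collectedHere++;
                heap.offer(rank);
                if (heap.size() > k) {
                    heap.poll(); // evict the current worst hit
                }
            }
        }
        List<Long> hits = new ArrayList<>(heap);
        hits.sort(Collections.reverseOrder());
        return hits;
    }
}
```

With one small unsorted segment and one large sorted segment, only k documents of the large segment are visited, which is where the savings come from when most documents live in merged segments.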


