lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: What does "out of order" mean?
Date Fri, 27 Nov 2009 13:49:07 GMT
On Fri, Nov 27, 2009 at 8:13 AM, Stefan Trcek <wzzelfzzel@abas.de> wrote:
> On Friday 27 November 2009 12:07:07 Michael McCandless wrote:
>> Re: What does "out of order" mean?
>>
>> It refers to the order in which the docIDs are delivered to your
>> Collector.
>>
>> "Normally" they are always delivered in increasing order.
>>
>> However, some queries (well, currently only certain BooleanQuery
>> cases) can achieve substantial search speedup if they are allowed to
>> deliver docIDs to your collector out of order.  In this case, docs
>> are processed in batches (chunks of 1024 docIDs at once), and within
>> a batch you may receive docIDs out of order.
>>
>> Many collectors don't mind getting docIDs out of order, and so it's
>> important to return "true" from your acceptDocsOutOfOrder method so
>> Lucene can allow BooleanQuery to run faster.
>
> Can this paragraph go to the docs?

OK I just committed this to the javadocs.  Thanks!

> May be I missed it, but I stumpled upon "out of order" and "in order"
> several times and wasn't sure what will be the consequence of the
> decision. Not even sure what will be the "don't care" case.
>
> I like "don't care" options like "Version.LUCENE_CURRENT" very much.
> It allows the library to do the best if I don't care.

Right, this is in general an important effect of the
Version.LUCENE_CURRENT option -- you give Lucene the freedom to 1) fix
bugs from past versions and 2) improve defaults for settings for
better out-of-the-box performance.

But if precise back compat is important to your app, so important that
you want newer versions of Lucene to emulate the bugs of past
releases, then you set the Version to a specific release (eg
Version.LUCENE_24).

For this particualr setting (in- vs out-of-order docIDs during
collection), Lucene's core collectors (that sort by relevance score,
and by field values) are carefully picked depending on whether the
query itself would like to score docIDs out of order.  We do this
because there's a small performance gain for these collectors if they
know the docIDs will arrive in order.

So the "don't care" equivalent here is to use IndexSearcher's normal
search APIs (ie, we don't use Version to switch this on or off).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message