lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness
Date Wed, 17 Jun 2009 02:52:07 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720477#action_12720477
] 

Shai Erera commented on LUCENE-1630:
------------------------------------

Thanks for the review Mike. Answers below. The comments which I did not answer will be fixed.

bq. Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great
(I'd rather subclass think about the question).

In general, I tried to avoid it since that would require changing all core Collectors. There
aren't many, but still ...

This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core Weights to
impl the method w/ "return false". I actually think that many Weights/Scorers do score documents
in-order, hence the default impl.

bq. If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to
return docs out of order?

When you deal with a Scorer which returns out-of-order, you can only call scorer.score(Collector).
If you're going to iterate, you're going to have to create a Scorer in-order, and that's what
IndexSearcher does. I'll spell it out clearly in the javadocs.

bq. Should scoresOutOfOrder() move from QueryWeight --> Scorer?

We've discussed it few posts up. When this information in in Scorer, I should first ask for
a Scorer, and only then I can create a Collector. If I'll use the Scorer immediately, then
that'll be ok. However, that's not the case in IndexSearcher, and results in a bug in Spatial,
and unless we want to uglify IndexSearcher code, it seemed that this can sit in QueryWeight.

But I do think it's a problematic method in QW too, since if it returns false by default,
I'll create a Collector which expects docs in-order, but then I'd lose the optimization in
BooleanWeight which may return an out-of-order superior Scorer. If I return true, I'll create
a Collector which expects out-of-order, and the Scorer (again, an example from BW) may be
actually in-order, and I've wasted unnecessary 'if doc > topDoc' cycles.

So I don't know what's better: make IndexSearcher code more complicated or sacrifice a potential
loss of this optimization?

bq. Actually I think the way to factor the static setting in is backwards? Shouldn't it be
scoreDocsInOrder |= !allowDocsOutOfOrder?

Yes, nice catch :)

bq. Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing
changes to it)

I wrote that above too - I don't think we can declare and execute right in 2.9 that Searchable
can be changed unexpectedly. So I added a NOTE to its javadocs and thought to do the change
post 2.9, when we remove Weight. We'd be forced to change these methods to QueryWeight, and
fix RemoteSearchable too. And it will be consistent w/ our back-compat policy (at least the
part where we declare on an upcoming change before it happens).

But if you think otherwise, I don't mind deprecating and adding new methods (I've got used
to it already, I almost do it blindly :) ).

I'll fix the other comments, and post a patch back after we resolve the remaining open issues.

> Mating Collector and Scorer on doc Id orderness
> -----------------------------------------------
>
>                 Key: LUCENE-1630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1630
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1630.patch, LUCENE-1630.patch
>
>
> This is a spin off of LUCENE-1593. This issue proposes to expose appropriate API on Scorer
and Collector such that one can create an optimized Collector based on a given Scorer's doc-id
orderness and vice versa. Copied from LUCENE-1593, here is the list of changes:
> # Deprecate Weight and create QueryWeight (abstract class) with a new scorer(reader,
scoreDocsInOrder), replacing the current scorer(reader) method. QueryWeight implements Weight,
while score(reader) calls score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder)
is defined abstract.
> #* Also add QueryWeightWrapper to wrap a given Weight implementation. This one will also
be deprecated, as well as package-private.
> #* Add to Query variants of createWeight and weight which return QueryWeight. For now,
I prefer to add a default impl which wraps the Weight variant instead of overriding in all
Query extensions, and in 3.0 when we remove the Weight variants - override in all extending
classes.
> # Add to Scorer isOutOfOrder with a default to false, and override in BS to true.
> # Modify BooleanWeight to extend QueryWeight and implement the new scorer method to return
BS2 or BS based on the number of required scorers and setAllowOutOfOrder.
> # Add to Collector an abstract _acceptsDocsOutOfOrder_ which returns true/false.
> #* Use it in IndexSearcher.search methods, that accept a Collector, in order to create
the appropriate Scorer, using the new QueryWeight.
> #* Provide a static create method to TFC and TSDC which accept this as an argument and
creates the proper instance.
> #* Wherever we create a Collector (TSDC or TFC), always ask for out-of-order Scorer and
check on the resulting Scorer isOutOfOrder(), so that we can create the optimized Collector
instance.
> # Modify IndexSearcher to use all of the above logic.
> The only class I'm worried about, and would like to verify with you, is Searchable. If
we want to deprecate all the search methods on IndexSearcher, Searcher and Searchable which
accept Weight and add new ones which accept QueryWeight, we must do the following:
> * Deprecate Searchable in favor of Searcher.
> * Add to Searcher the new QueryWeight variants. Here we have two choices: (1) break back-compat
and add them as abstract (like we've done with the new Collector method) or (2) add them with
a default impl to call the Weight versions, documenting these will become abstract in 3.0.
> * Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend Searcher.
That's the part I'm a little bit worried about - Searchable implements java.rmi.Remote, which
means there could be an implementation out there which implements Searchable and extends something
different than UnicastRemoteObject, like Activeable. I think there is very small chance this
has actually happened, but would like to confirm with you guys first.
> * Add a deprecated, package-private, SearchableWrapper which extends Searcher and delegates
all calls to the Searchable member.
> * Deprecate all uses of Searchable and add Searcher instead, defaulting the old ones
to use SearchableWrapper.
> * Make all the necessary changes to IndexSearcher, MultiSearcher etc. regarding overriding
these new methods.
> One other optimization that was discussed in LUCENE-1593 is to expose a topScorer() API
(on Weight) which returns a Scorer that its score(Collector) will be called, and additionally
add a start() method to DISI. That will allow Scorers to initialize either on start() or score(Collector).
This was proposed mainly because of BS and BS2 which check if they are initialized in every
call to next(), skipTo() and score(). Personally I prefer to see that in a separate issue,
following that one (as it might add methods to QueryWeight).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message