lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator
Date Sun, 24 May 2009 12:44:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712549#action_12712549 ]

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

I hate to say this (heaping yet more limitations on our back-compat
"constraints"), but... it makes me nervous to make a runtime-only
semantic change to an API (DocIdSetIterator, aka DISI, in this case),
even in 3.0.

Likewise, "doc() returns -1 before next/advance have been called"
would be a runtime-only change.

If we did these, you could upgrade to 2.9, fix all deprecations, then
upgrade to 3.0, recompile just fine, and hit weird problems since
Lucene is suddenly expecting different behavior from your DISI.doc().

Such "semantics-only" changes invite subtle bugs.  I'd much prefer to
find a migration path that's based on static checking, i.e., you get
catastrophic compilation errors if you've failed to migrate.

If external code is iterating through a Lucene DISI, these
semantics-only changes are harmless, since we are only defining
behavior "outside" the bounds of what's currently defined.  But if
Lucene is interacting with an external DISI, then we are in trouble,
since Lucene would be relying on behavior that the external
implementation never promised.
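
To make the concern concrete, here is a minimal sketch, not Lucene code. OldContractIterator and SemanticsProbe are hypothetical names, and the iterator is only an assumption about how an external implementation written against the old contract might reasonably behave: doc() is meaningful only after next() has returned true, so it never yields -1 or NO_MORE_DOCS.

```java
/**
 * Hypothetical iterator written against the old contract, where doc() is
 * only defined after next() has returned true. Not a Lucene class.
 */
final class OldContractIterator {
    private final int[] docs; // ascending doc IDs
    private int pos = -1;
    private int doc;          // defaults to 0 and is never set to a sentinel

    OldContractIterator(int... docs) { this.docs = docs; }

    boolean next() {
        if (pos + 1 >= docs.length) {
            return false;     // exhausted: doc keeps its last value
        }
        doc = docs[++pos];
        return true;
    }

    int doc() { return doc; }
}

/** Hypothetical caller that assumes the proposed runtime-only semantics. */
final class SemanticsProbe {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static boolean looksUnpositioned(OldContractIterator it) {
        return it.doc() == -1;
    }

    static boolean looksExhausted(OldContractIterator it) {
        return it.doc() == NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        OldContractIterator it = new OldContractIterator(3, 7);
        System.out.println(looksUnpositioned(it)); // prints false: doc() is 0, not -1
        while (it.next()) {
            // drain the iterator
        }
        System.out.println(looksExhausted(it));    // prints false: doc() is still 7
    }
}
```

Both probes compile cleanly against the old-style iterator and both give the "wrong" answer at runtime, which is the kind of silent failure described above.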

However, it's not clear to me what the best way is to make this
migration "catastrophic" ... maybe we add DISI.document(), with
the new semantics, and with a default impl in DISI that overlays our
new semantics?  (And deprecate doc()).  We could do this for 2.9.
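
One way the document() idea could look, as a rough sketch under stated assumptions rather than a proposal for the actual patch: only the name document() comes from the suggestion above. SketchDocIdSetIterator, LegacyArrayIterator, and the choice to track the position in a private field updated by nextDoc()/advance() are all hypothetical, and checked exceptions are omitted for brevity.

```java
/** Hypothetical base class; a sketch of the overlay idea, not Lucene's DocIdSetIterator. */
abstract class SketchDocIdSetIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Position as seen through the new semantics: -1 before the first
    // nextDoc()/advance() call, NO_MORE_DOCS once exhausted.
    private int current = -1;

    // Old-contract methods. In the proposal these would be deprecated, so
    // that removing them later turns unmigrated code into compile errors.
    abstract int doc();
    abstract boolean next();
    abstract boolean skipTo(int target);

    // Default implementations that overlay the new semantics on the old methods.
    int nextDoc() {
        current = next() ? doc() : NO_MORE_DOCS;
        return current;
    }

    int advance(int target) {
        current = skipTo(target) ? doc() : NO_MORE_DOCS;
        return current;
    }

    /** New accessor: never consults doc() outside the range where it is defined. */
    int document() {
        return current;
    }
}

/** Hypothetical external iterator that only implements the old contract. */
final class LegacyArrayIterator extends SketchDocIdSetIterator {
    private final int[] docs; // ascending doc IDs
    private int pos = -1;
    private int doc;

    LegacyArrayIterator(int... docs) { this.docs = docs; }

    @Override int doc() { return doc; }

    @Override boolean next() {
        if (pos + 1 >= docs.length) {
            return false;
        }
        doc = docs[++pos];
        return true;
    }

    @Override boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (doc < target);
        return true;
    }
}
```

With this shape, code inside the library would call only nextDoc(), advance() and document(), so an unmigrated external iterator still behaves sensibly through the overlay, while its own doc() is free to keep the old, partially defined behavior.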


> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel
> NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator is exhausted. Due to backward-compatibility
> issues, we couldn't implement those semantics in doc(). Therefore this issue, which can only be
> introduced in 3.0, will:
> # Implement the new semantics in all extending classes, such that doc() returns NO_MORE_DOCS
> when the iterator is exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer
> and operating under the assumption that NO_MORE_DOCS (Integer.MAX_VALUE) is larger than any doc ID.
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the
> performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer. Change this:
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}
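
The sentinel-based ConjunctionScorer loop quoted above can be exercised in isolation with a small sketch. This is not Lucene code: ArrayIterator and ConjunctionSketch are hypothetical names, the constructor's "position everything, then sort by current doc" step is an assumption about the setup the loop needs, and doNext() returns the doc ID rather than a boolean. The constant is called NO_MORE_DOCS here, following the issue description.

```java
/** Hypothetical array-backed iterator using the sentinel convention. Not a Lucene class. */
final class ArrayIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // ascending doc IDs, all below NO_MORE_DOCS
    private int pos = -1;
    private int doc = -1;     // -1 until first positioned

    ArrayIterator(int... docs) { this.docs = docs; }

    int doc() { return doc; }

    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }

    /** Simplified: moves forward only while the current doc is below target. */
    int advance(int target) {
        while (doc < target) {
            nextDoc();
        }
        return doc;
    }
}

/** Hypothetical conjunction over one or more iterators, with no 'more' flag. */
final class ConjunctionSketch {
    private final ArrayIterator[] scorers;
    private ArrayIterator lastScorer;
    private int first = 0;
    private boolean started = false;

    ConjunctionSketch(ArrayIterator... iterators) {
        scorers = iterators.clone();
        for (ArrayIterator it : scorers) {
            it.nextDoc(); // position every iterator on its first doc
        }
        // Ascending by current doc, so the last entry holds the largest doc.
        java.util.Arrays.sort(scorers, (a, b) -> Integer.compare(a.doc(), b.doc()));
        lastScorer = scorers[scorers.length - 1];
    }

    /** Returns the next doc present in every iterator, or NO_MORE_DOCS. */
    int nextDoc() {
        if (started) {
            lastScorer.nextDoc();
        }
        started = true;
        return doNext();
    }

    // Same shape as the quoted loop: an exhausted iterator sits at
    // NO_MORE_DOCS, which drags all the others there too.
    private int doNext() {
        ArrayIterator firstScorer;
        int lastDoc;
        while ((firstScorer = scorers[first]).doc() < (lastDoc = lastScorer.doc())) {
            firstScorer.advance(lastDoc);
            lastScorer = firstScorer;
            first = (first == scorers.length - 1) ? 0 : first + 1;
        }
        return lastDoc;
    }

    public static void main(String[] args) {
        ConjunctionSketch c = new ConjunctionSketch(
            new ArrayIterator(1, 3, 5, 7, 9),
            new ArrayIterator(3, 4, 5, 9, 12),
            new ArrayIterator(0, 3, 9, 10));
        for (int d = c.nextDoc(); d != ArrayIterator.NO_MORE_DOCS; d = c.nextDoc()) {
            System.out.println(d); // prints 3, then 9
        }
    }
}
```

Because NO_MORE_DOCS compares greater than every real doc ID, the loop needs no separate exhaustion flag: it terminates once all iterators agree, whether on a real doc or on the sentinel.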

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



