lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator
Date Mon, 25 May 2009 16:28:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712745#action_12712745 ]

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

bq. I'm not sure about it. In 3.0, we'll make nextDoc() abstract (for sure, since the default
impl calls next()) and probably advance() also. So when you upgrade to 2.9, you can switch
to calling nextDoc() and advance(), but if you implemented DISI, you won't be required to
implement nextDoc() and/or advance(), so when you upgrade to 3.0 your code won't compile.

You're right -- on making nextDoc & advance abstract in 3.0, your code
won't compile on upgrading to 3.0 and you'd have to go fix any custom
DISIs you have.  But: if we leave doc() as is, you wouldn't be forced
to do anything on that.  You just implement nextDoc/advance and think
you're done...

bq. When upgrading, I think we should assume (or even require) users reading CHANGES. When
they notice that DISI has changed and that they need to implement two new methods, they should
also notice the change in semantics of doc().

Relying only on this (seeing CHANGES.txt) is what makes me nervous.

bq. I take it that by "catastrophic" you mean that you're ok with people upgrading to 3.0
and don't compile, since that will force them to read CHANGES or javadocs and understand what
they are now supposed to implement. Therefore if document() documents the new semantics, it
is ok for us to rely on that, and if something fails, it's the user's problem.

Right, that's what I mean by "catastrophic" (note: Marvin used it
first, but I like it ;) ).  But: I want the catastrophe specifically to
apply to doc() as well, so that you are forced to implement it as a
newly named method.  I.e., I'm hoping that the extra step of having a
newly named method is enough to get you to go and understand that we
subtly changed its semantics.
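
A rough sketch of the idea, using made-up classes rather than the real
Lucene ones (SketchDocIdSetIterator and ArrayDocIdSetIterator are
hypothetical, and the -1 "not started" value is my assumption).  The
new-semantics accessor gets a new name, document(), and is abstract,
so a custom iterator that only knows about doc() stops compiling until
its author writes the new method:

```java
// Hypothetical stand-in for DocIdSetIterator; names and shape are assumptions.
abstract class SketchDocIdSetIterator {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** New name, new semantics: returns NO_MORE_DOCS after exhaustion. */
    abstract int document();

    /** Moves to the next doc and returns it, or NO_MORE_DOCS. */
    abstract int nextDoc();

    /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS. */
    abstract int advance(int target);
}

/** A custom iterator over a sorted array of doc IDs. */
final class ArrayDocIdSetIterator extends SketchDocIdSetIterator {
    private final int[] docs;
    private int pos = -1; // -1 means "not started yet"

    ArrayDocIdSetIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    // Deleting this method is a compile error, which is the point:
    // the author has to look at what document() is supposed to return.
    @Override
    int document() {
        if (pos < 0) {
            return -1; // before the first nextDoc()/advance(): an assumption
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    @Override
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return document();
    }

    @Override
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target); // NO_MORE_DOCS is never < target, so this ends
        return doc;
    }
}
```

A linear advance() is used only to keep the sketch short.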

bq. If we add document() (note the longer method name, compared to doc()) we can implement
it following the new semantics and take advantage of that in 2.9 already (I think?).

Exactly, another benefit of this approach (besides bringing
catastrophe) is that we can do all of this in 2.9, including taking
advantage of the new semantics.  Which is great.
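
To illustrate what "taking advantage" could look like, here is a
minimal sketch with a made-up iterator (SentinelLoopSketch, Iter and
collectUntil are hypothetical names, not Lucene code, and I assume the
iterator starts positioned on its first doc).  Because document()
returns NO_MORE_DOCS (= Integer.MAX_VALUE) once exhausted, the
consuming loop needs no separate "done" flag:

```java
import java.util.ArrayList;
import java.util.List;

final class SentinelLoopSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal hypothetical iterator over a sorted array of doc IDs. */
    static final class Iter {
        private final int[] docs;
        private int pos = 0; // assumption: positioned on the first doc

        Iter(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Current doc, or NO_MORE_DOCS once exhausted (the new semantics). */
        int document() {
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the next doc and returns it, or NO_MORE_DOCS. */
        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return document();
        }
    }

    /** Collects every doc below end, starting from the current position. */
    static List<Integer> collectUntil(Iter it, int end) {
        List<Integer> collected = new ArrayList<>();
        int doc = it.document();
        // The sentinel is >= any end value, so exhaustion ends the loop.
        while (doc < end) {
            collected.add(doc);
            doc = it.nextDoc();
        }
        return collected;
    }

    public static void main(String[] args) {
        Iter it = new Iter(1, 4, 7, 12);
        System.out.println(collectUntil(it, 8));
        System.out.println(collectUntil(it, 100));
    }
}
```

The single comparison against end covers both "reached the end of this
window" and "iterator exhausted".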

bq. If this indeed should work, where should I do it - in this issue (I need 1614 to be committed
first) or in 1614?

I think do this as another iteration of the patch on LUCENE-1614?


> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel
> NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator is exhausted. Due to backward-compatibility
> issues, we couldn't implement those semantics in doc(). Therefore this issue, which can be
> introduced in 3.0 only, will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS
> when the iterator is exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer
> and operating under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the
> performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer. Change this:
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}
