lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Mon, 01 Jun 2009 12:00:07 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715070#action_12715070
] 

Shai Erera commented on LUCENE-1614:
------------------------------------

bq. I think we should require that nextDoc/advance not be called again once NO_MORE_DOCS has
already been returned?

So you mean adding something like this to the javadocs: "After NO_MORE_DOCS has been returned,
you should not call this method again, or it may result in unpredictable behavior"?
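
To make the proposed contract concrete, here is a minimal sketch. UncheckedDocs and ContractDemo are hypothetical stand-ins (not Lucene classes), and the sentinel value is an assumption for the example. The iterator has no "already exhausted" guard, so it relies on the caller to stop at the first NO_MORE_DOCS:

```java
/** Hypothetical array-backed iterator; a stand-in, not Lucene's DocIdSetIterator. */
final class UncheckedDocs {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel value assumed for this sketch

  private final int[] docs; // doc IDs in increasing order
  private int pos = -1;

  UncheckedDocs(int... docs) {
    this.docs = docs;
  }

  /**
   * Returns the next doc ID, or NO_MORE_DOCS when exhausted.
   * Must not be called again after NO_MORE_DOCS was returned: there is no
   * "already exhausted" guard, so a further call would index past the array.
   */
  int nextDoc() {
    if (++pos == docs.length) {
      return NO_MORE_DOCS;
    }
    return docs[pos];
  }
}

final class ContractDemo {
  /** Counts docs, stopping at the first NO_MORE_DOCS as the contract requires. */
  static int count(UncheckedDocs it) {
    int n = 0;
    while (it.nextDoc() != UncheckedDocs.NO_MORE_DOCS) {
      n++;
    }
    return n;
  }
}
```

A caller that honors the contract never sees the out-of-bounds access; the implementation saves one check per call in exchange.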

bq. Some things already seem to require this (eg DocIdBitSet's DISI).

You mean that this will remove the d == -1 check?

bq. I think the new code for applying a filter during searching (~line 282 of IndexSearcher)
...

So you mean to change the last line here:

{code}
if (scorerDoc == filterDoc) { // permitted by filter
  collector.collect(scorerDoc);
  if ((filterDoc = filterDocIdIterator.nextDoc()) == DocIdSetIterator.NO_MORE_DOCS) break;
{code}

to

{code}
if ((filterDoc = filterDocIdIterator.nextDoc()) == DocIdSetIterator.NO_MORE_DOCS) {
  break;
} else if ((scorerDoc = scorer.advance(filterDoc)) == DocIdSetIterator.NO_MORE_DOCS) {
  break;
}
{code}

I don't see what that would give us, since after the loop wraps around we check anyway whether
filterDoc > scorerDoc.

But .. I think the while loop can be changed to:

{code}
int filterDoc = filterDocIdIterator.nextDoc();
int scorerDoc = scorer.advance(filterDoc);
if (filterDoc == DocIdSetIterator.NO_MORE_DOCS
    || scorerDoc == DocIdSetIterator.NO_MORE_DOCS) {
  return;
}

collector.setScorer(scorer);
while (true) {
  // scorerDoc >= filterDoc
  if (scorerDoc == filterDoc) { // permitted by filter
    collector.collect(scorerDoc);
    if ((filterDoc = filterDocIdIterator.nextDoc()) == DocIdSetIterator.NO_MORE_DOCS) break;
  } else if ((filterDoc = filterDocIdIterator.advance(scorerDoc)) == DocIdSetIterator.NO_MORE_DOCS) break;

  // The above code may have moved filterDoc beyond scorerDoc, so advance scorerDoc
  if ((scorerDoc = scorer.advance(filterDoc)) == DocIdSetIterator.NO_MORE_DOCS) break;
}
{code}

What do you think?
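
For anyone who wants to experiment with the leapfrog idea outside the patch, here is a standalone sketch. It is a variation, not the loop proposed above: SortedDocs and Leapfrog are hypothetical stand-ins (not Lucene classes), and the sentinel value is an assumption for the example. It only ever advances the iterator that is behind, so it does not depend on what advance(target) does when target equals the current doc:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a doc-ID iterator; not Lucene's DocIdSetIterator. */
final class SortedDocs {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel value assumed for this sketch

  private final int[] docs; // strictly increasing doc IDs
  private int pos = -1;

  SortedDocs(int... docs) {
    this.docs = docs;
  }

  /** Returns the next doc ID, or NO_MORE_DOCS when exhausted. */
  int nextDoc() {
    pos++;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  /** Moves forward to the first doc >= target; callers pass a target beyond the current doc. */
  int advance(int target) {
    int doc;
    do {
      doc = nextDoc();
    } while (doc < target);
    return doc;
  }
}

final class Leapfrog {
  /** Collects the doc IDs present in both iterators. */
  static List<Integer> intersect(SortedDocs filter, SortedDocs scorer) {
    List<Integer> hits = new ArrayList<>();
    int filterDoc = filter.nextDoc();
    int scorerDoc = scorer.nextDoc();
    while (filterDoc != SortedDocs.NO_MORE_DOCS && scorerDoc != SortedDocs.NO_MORE_DOCS) {
      if (scorerDoc == filterDoc) { // permitted by filter
        hits.add(scorerDoc);
        filterDoc = filter.nextDoc();
        scorerDoc = scorer.nextDoc();
      } else if (scorerDoc < filterDoc) { // scorer is behind, catch it up
        scorerDoc = scorer.advance(filterDoc);
      } else { // filter is behind, catch it up
        filterDoc = filter.advance(scorerDoc);
      }
    }
    return hits;
  }
}
```

Neither iterator is called again after it has returned NO_MORE_DOCS, since the loop condition is checked before any further call.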

Also, Mike: the patch you posted is 152KB, while my last patch is 201KB. It's hard to compare
our patches since the order of the classes is different, so before I apply the patch and check
it, I wanted to make sure you included all the changes in the patch. (Our previous two patches
from May 27 were the same size, so this last one looks a bit fishy.)

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch,
>                      LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch,
>                      LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
> for the full discussion. The basic idea is to add variants to those two methods that return
> the current doc they are at, to save successive calls to doc(). If there are no more docs,
> return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing to -1 for improved performance.
> I will post a patch shortly


