lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Tue, 26 May 2009 20:59:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713251#action_12713251
] 

Shai Erera commented on LUCENE-1614:
------------------------------------

I added testEmptyBucketWithMoreDocs in TestBooleanScorer which creates a BooleanScorer that
returns 3000 the first time nextDoc is called (or advance with target <= 3000) and then
NO_MORE_DOCS. It asserts that calling nextDoc() once returns 3000 and the second time returns
NO_MORE_DOCS.

Indeed, with the change I've made above it fails. So it's good that this test is there now.
It makes BS/BS2 one more testcase-proof. That means I cannot remove 'more' ...

bq. The tests for boolean queries are quite comprehensive nowadays. So when all tests pass,
don't worry too much.

Paul, I guess we should always worry :). Anyway, now we can say that sentence with a bit more
confidence :).

bq. Also, I wonder if this can be removed, by init'ing the sub-scorers up front.

How about if I call nextDoc() when addScorer is called? I tried it and all tests pass, which
at least means there is no test case that fails (see above comment). If I do this, I can also
check if the result is NO_MORE_DOCS and don't add it to the sub scorers list in the first
place.

What do you think?

BTW, funny thing just happened - turns out we deprecated all of DISIs methods, but not DISI
itself. This is actually good since it allows us to keep DISI and not deprecate Scorer and
all the public DISIs that extend DISI. And it saves us the effort of finding alternative names
... Just a thought.

Another thought of an optimization. Somewhere up this issue, we discussed adding start() to
DISI, just to get rid of firstTime in DisjunctionMaxScorer and ConjunctionScorer. At least
for DMS, I think I can remove add(Scorer) and add to its ctor an array of Scorer[]. DMS is
used by DMQ, and the number of scorers is knows in advance.
Also, DMS can be changed quite a bit, not using ArrayList but an array. ArrayList get/set
methods do range checks every time you call them, and we're calling them quite a lot.

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead
of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
for the full discussion. The basic idea is to add variants to those two methods that return
the current doc they are at, to save successive calls to doc(). If there are no more docs,
return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
- the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message