lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Wed, 27 May 2009 20:41:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713723#action_12713723
] 

Shai Erera commented on LUCENE-1614:
------------------------------------

Regarding the nocommit in ConjunctionScorer's ctor - I think the problem with moving doNext()
to the end of the ctor is the order of the scorers that will be iterated on. I.e., currently
the scorers are first advanced to their first doc, then sorted based on their docID, then
advanced to the doc ID that all agree on, in the order of the sort.

If you move doNext() to the end of the method, then the scorers are sorted based on their
docID, and then immediately re-sorted (based on their sparseness, which is a heuristic applied
based on their first docID).

If I understand correctly, the problem with calling doNext() after the re-sort is this: assume
you have 5 scorers, whose first docs are 1, 2, 3, 5, 5 respectively. Sorting leaves the array
as is (or changes it to be in that order). If you call doNext() after array.sort, then you
advance all the first scorers to 5 (or a larger doc ID they all agree on). However, if you
do the re-sort, the order will be 5, 3, 2, 1, 5 and then if you call doNext(), it will stop
immediately, since the first scorer's docs equals the last one. So you break an invariant,
that after calling doNext() all scorers are on the same doc ID.

That's at least how I understand it. I hope I didn't write too much to explain the obvious
(to others).

So I'll delete the nocommit tag.

Regarding nocommit in ReqExclScorer, you're right. I'll remove firstTime entirely.

bq. Can you add an "assert scorer.docID() == -1" in IndexSearcher right when it gets the scorer?

Done.

bq. BS2.nextDoc is still needing to check if it's supposed to call initCountingSumScorer?
Can we do this in ctor?

It can, if I delete add() and do all the work in the ctor. I'll need to pass 3 arrays (Scorer[],
boolean[] /*required*/, boolean[] /*prohibited*/) though, or pass the list of Weights, Clauses
and IndexReader. I'm not too fan of either. I check which is the lesser evil.

bq. BooleanScorer2.score(Collector, int) makes me a bit nervous

I didn't think it's such a big deal since it's one 'if' before scoring starts. It's not like
it's called for every score(). Do you think we should resolve it?

bq. Hmm... maybe we permit DISI.docID() to also return NO_MORE_DOCS after the DISI is created?
EG we do this in ConjunctionScorer. (Or, maybe, any docID that's < the first real docID...hmmm.)

I do not fully follow you. By allowing to return -1, we already allow any DISI to return a
doc ID that's < the first real doc ID. And allowing to return NO_MORE_DOCS looks strange
to me .. I mean imagine a code which creates a DISI and calls doc() and gets back NO_MORE_DOCS
.. that'd be strange, won't it?

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead
of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch,
LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
for the full discussion. The basic idea is to add variants to those two methods that return
the current doc they are at, to save successive calls to doc(). If there are no more docs,
return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
- the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message