lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Wed, 27 May 2009 20:59:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713732#action_12713732
] 

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

{quote}
Regarding the nocommit in ConjunctionScorer's ctor - I think the problem with moving doNext()
to the end of the ctor is the order of the scorers that will be iterated on. I.e., currently
the scorers are first advanced to their first doc, then sorted based on their docID, then
advanced to the doc ID that all agree on, in the order of the sort.
If you move doNext() to the end of the method, then the scorers are sorted based on their
docID, and then immediately re-sorted (based on their sparseness, which is a heuristic applied
based on their first docID).

If I understand correctly, the problem with calling doNext() after the re-sort is this: assume
you have 5 scorers, whose first docs are 1, 2, 3, 5, 5 respectively. Sorting leaves the array
as is (or changes it to be in that order). If you call doNext() after array.sort, then you
advance all the first scorers to 5 (or a larger doc ID they all agree on). However, if you
do the re-sort, the order will be 5, 3, 2, 1, 5 and then if you call doNext(), it will stop
immediately, since the first scorer's docs equals the last one. So you break an invariant,
that after calling doNext() all scorers are on the same doc ID.

That's at least how I understand it. I hope I didn't write too much to explain the obvious
(to others).

So I'll delete the nocommit tag.
{quote}

Ahh -- good explanation!  Phew.  This must be what leads to the two
test failures.  So leave it where it is (maybe add a comment
explaining it cannot be moved down, lest anyone else gets tempted).

{quote}
> Can you add an "assert scorer.docID() == -1" in IndexSearcher right when it gets the
scorer?

Done.
{quote}

OK, except we may delete it depending on the relaxing (below)...

{quote}
> BS2.nextDoc is still needing to check if it's supposed to call initCountingSumScorer?
Can we do this in ctor?

It can, if I delete add() and do all the work in the ctor.
{quote}

I think we should (lacking a DISI.start).

bq. I'll need to pass 3 arrays (Scorer[], boolean[] /required/, boolean[] /prohibited/) though,
or pass the list of Weights, Clauses and IndexReader. I'm not too fan of either. I check which
is the lesser evil.

Hmm -- I don't have a stronger lesser-of-evils preference.  I suppose,
instead, we could add a "start" only to BS2, which BQ would call once
it's done adding?  Or, simply call initCountingSumScorer from BQ and
don't do the check per nextDoc/advance.

{quote}
> BooleanScorer2.score(Collector, int) makes me a bit nervous

I didn't think it's such a big deal since it's one 'if' before scoring starts. It's not like
it's called for every score(). Do you think we should resolve it?
{quote}

What specifically made me nervous that it was even necessary to add
this (having to conditionally next wasn't needed before) in the first
place.  If you remove it, does something actually break?  Like what
caused it to be added?  (Because I want to go explore/understand
*that* cause).

{quote}
> Hmm... maybe we permit DISI.docID() to also return NO_MORE_DOCS after the DISI is created?
EG we do this in ConjunctionScorer. (Or, maybe, any docID that's < the first real docID...hmmm.)

I do not fully follow you. By allowing to return -1, we already allow any DISI to return a
doc ID that's < the first real doc ID. And allowing to return NO_MORE_DOCS looks strange
to me .. I mean imagine a code which creates a DISI and calls doc() and gets back NO_MORE_DOCS
.. that'd be strange, won't it?
{quote}

I'm thinking it's OK to call docID before next, and that all we
require is that call return a docID less than its first real docID.

And, if you know up-front you have no docs, returning NO_MORE_DOCS up
front is also OK.

Ie this is a relaxation over "you must return -1 before nextDoc has
been called".

(EG I think the last patch would return NO_MORE_DOCS from docID() in
ConjunctionScorer if it determines in ctor that no docs match).


> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead
of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch,
LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
for the full discussion. The basic idea is to add variants to those two methods that return
the current doc they are at, to save successive calls to doc(). If there are no more docs,
return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
- the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message