lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Thu, 30 Apr 2009 10:04:30 GMT


Michael McCandless commented on LUCENE-1614:

bq. Just to clarify for myself, in the example I gave above, suppose that the scorer is on
"3" and you call check(8).

On check(8), TermScorer would go to 10, stop there, and return false.  (It would not "rewind"
to 3.)  check() can only be called with increasing arguments, so it's not truly "random access".
It's "forward only random access".

bq. You propose this check() so that in case a DISI can save any extra operations it does
in next() (such as reading a payload for example) it will do so. Therefore in the example
you give above with CS, next()'s contract forces it to advance all the sub-scorers, but with
check() it could stop in the middle.


This is important when you have a super-cheap iterator (say a somewhat sparse (<=10%?)
in-memory filter that's represented as a list of docIDs).  It's very fast for such a filter
to iterate over its docIDs.  But when that iterator is AND'd with a Scorer, as is done today
by IndexSearcher, they effectively play "leap frog": first it's the filter's turn to
next(), then it's the Scorer's turn, etc.  For the Scorer, next() can be extremely costly,
only to land on a doc that the filter doesn't accept.  So for such situations it's better to
let the filter drive the search, calling Scorer.check() on each of its docs.
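
A rough sketch of what "let the filter drive" could look like.  Everything here is a toy and
an assumption on my part: ForwardChecker, ToyScorer and intersect() are hypothetical names,
not Lucene classes, and a real Scorer would do far more work per step than walking an int[].

```java
import java.util.ArrayList;
import java.util.List;

class FilterDrivenSketch {
    /** Hypothetical forward-only membership test, standing in for the proposed Scorer.check(). */
    interface ForwardChecker {
        boolean check(int target);
    }

    /** Toy "scorer" over a sorted docID array (an assumption for illustration only). */
    static final class ToyScorer implements ForwardChecker {
        private final int[] docs;
        private int pos = 0;

        ToyScorer(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        public boolean check(int target) {
            // Move forward only; never rewind to an earlier doc.
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length && docs[pos] == target;
        }
    }

    /** The filter's docIDs drive the loop; the scorer is only asked about those docs. */
    static List<Integer> intersect(int[] sortedFilterDocs, ForwardChecker scorer) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : sortedFilterDocs) {
            if (scorer.check(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] filterDocs = {2, 8, 10, 40};
        ToyScorer scorer = new ToyScorer(new int[] {3, 10, 15, 40, 41});
        System.out.println(intersect(filterDocs, scorer)); // [10, 40]
    }
}
```

The point of the shape is that the scorer is never asked to produce its own "next" doc, so
it only does work for docs the filter has already accepted.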

But... once we switch to filter-as-BooleanClause, it's less clear whether check() is worthwhile,
because I think the filter's constraint is more efficiently taken into account.

For filters that support random access (if they are less sparse, say >= 25% or so), we
should push them all the way down to the TermScorers and factor them in just like deletedDocs.

bq. If the default impl in DISI just uses nextDoc() and returns true if the return value
is the requested, we should be safe back-compat-wise, but this is still dangerous and we need
clear documentation.

Yes it does have a good default impl, I think.
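
For concreteness, here is a minimal sketch of such a default, assuming it is built only on
nextDoc() and that -1 signals exhaustion as proposed in the issue.  SketchIterator and
ListDocs are hypothetical stand-ins, not the real DocIdSetIterator, and the actual patch may
well look different.

```java
class CheckSketch {
    /** Hypothetical stand-in for DocIdSetIterator; not the real Lucene class. */
    abstract static class SketchIterator {
        /** Advances to the next doc and returns it, or -1 if there are no more docs. */
        abstract int nextDoc();

        /** Returns the current doc, or -1 if not yet positioned or exhausted. */
        abstract int doc();

        /**
         * Assumed default check(): forward-only. Steps with nextDoc() until the
         * iterator reaches or passes target, then reports whether it landed on it.
         */
        boolean check(int target) {
            int doc = doc();
            if (doc == -1) {
                doc = nextDoc(); // not started yet (or exhausted, which stays -1)
            }
            while (doc != -1 && doc < target) {
                doc = nextDoc();
            }
            return doc == target;
        }
    }

    /** Toy iterator over a sorted docID array (illustration only). */
    static final class ListDocs extends SketchIterator {
        private final int[] docs;
        private int pos = -1;

        ListDocs(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return doc();
        }

        @Override
        int doc() {
            return (pos >= 0 && pos < docs.length) ? docs[pos] : -1;
        }
    }

    public static void main(String[] args) {
        ListDocs it = new ListDocs(new int[] {3, 10, 15});
        it.nextDoc();                    // iterator is now on 3
        System.out.println(it.check(8)); // false
        System.out.println(it.doc());    // 10
    }
}
```

This reproduces the check(8) case above: starting on 3, the iterator stops on 10 and returns
false, and a later check(10) returns true without moving.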

bq. BTW, perhaps a testAndSet-like version can save check(10) followed by a next(10), and
will fit nicer?

Not sure what you mean by "testAndSet-like version"?

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>                 Key: LUCENE-1614
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
> See for the full discussion. The basic idea is to add variants to those two methods that
> return the current doc they are at, to save successive calls to doc(). If there are no more
> docs, return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls next()
> and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
> - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
> to -1 for improved performance.
> I will post a patch shortly
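
To illustrate the description above, a small sketch contrasting the boolean next() plus
doc() loop with the proposed int-returning variant.  ToyIterator is a hypothetical class
made up for this example; the name nextDoc() and the -1 sentinel come from the issue text.

```java
class AdvanceSketch {
    /** Toy iterator over a sorted docID array; not a Lucene class. */
    static final class ToyIterator {
        private final int[] docs;
        private int pos = -1;

        ToyIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Old style: reports only whether another doc exists. */
        boolean next() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length;
        }

        /** Old style: valid only after next() has returned true. */
        int doc() {
            return docs[pos];
        }

        /** Proposed style: returns the doc itself, or -1 when there are no more docs. */
        int nextDoc() {
            return next() ? docs[pos] : -1;
        }
    }

    /** Two calls per doc: next() and then doc(). */
    static int sumOldStyle(ToyIterator it) {
        int sum = 0;
        while (it.next()) {
            sum += it.doc();
        }
        return sum;
    }

    /** One call per doc, using the '(doc = ...) >= 0' idiom from the issue. */
    static int sumNewStyle(ToyIterator it) {
        int sum = 0;
        int doc;
        while ((doc = it.nextDoc()) >= 0) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {3, 10, 15};
        System.out.println(sumOldStyle(new ToyIterator(docs))); // 28
        System.out.println(sumNewStyle(new ToyIterator(docs))); // 28
    }
}
```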
