lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Wed, 27 May 2009 09:38:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713493#action_12713493 ]

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. I added testEmptyBucketWithMoreDocs in TestBooleanScorer which creates a BooleanScorer that returns 3000 the first time nextDoc is called

Excellent!

bq. How about if I call nextDoc() when addScorer is called? I tried it and all tests pass, which at least means there is no test case that fails (see above comment). If I do this, I can also check if the result is NO_MORE_DOCS and not add it to the sub scorers list in the first place.

That sounds reasonable.
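A minimal sketch of that addScorer() change, assuming a simplified iterator with only doc() and nextDoc(), and a NO_MORE_DOCS sentinel equal to Integer.MAX_VALUE (the value Lucene 2.9's DocIdSetIterator ended up using). DocIter, ArrayDocIter and SubScorerList are hypothetical names for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal iterator; not Lucene's actual DocIdSetIterator.
abstract class DocIter {
  // Assumed sentinel, mirroring Lucene 2.9's DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  abstract int doc();     // -1 before the first nextDoc() call
  abstract int nextDoc(); // returns the new doc, or NO_MORE_DOCS
}

// Iterates a sorted array of doc IDs.
class ArrayDocIter extends DocIter {
  private final int[] docs;
  private int pos = -1;

  ArrayDocIter(int... docs) { this.docs = docs; }

  @Override
  int doc() {
    if (pos < 0) return -1;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  @Override
  int nextDoc() {
    if (pos < docs.length) pos++; // stays parked once exhausted
    return doc();
  }
}

// Hypothetical holder for a scorer's sub-scorer list.
class SubScorerList {
  final List<DocIter> subs = new ArrayList<>();

  void addScorer(DocIter sub) {
    // Position the sub-iterator on its first doc right away, so the
    // scoring loop needs no "first time" special case.
    if (sub.nextDoc() != DocIter.NO_MORE_DOCS) {
      subs.add(sub);
    }
    // An iterator that is empty from the start is never added at all.
  }
}
```

With this shape, a sub-scorer whose first doc is 3000 (as in the test above) is simply parked on 3000 when added, and an empty one never enters the list.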

bq. Another thought about an optimization. Earlier in this issue, we discussed adding start() to DISI, just to get rid of firstTime in DisjunctionMaxScorer and ConjunctionScorer.

I think having doc() return -1 before next/advance have been called
was also needed (or maybe just helpful?) for this.

Also, thinking more on this, isn't DISI.start() redundant, since such
initialization could be done in Weight.scorer() when the Scorer is
created?  We're not allowed to reuse a DISI, so...

Oh, I see: it's not redundant for scorers that do incremental
construction (BS, BS2).  But I think we should fix such cases to
accept all sub-queries up front; then DISI.start() is redundant?
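A rough sketch of how a conjunction could drop both start() and firstTime, assuming (as discussed above) that doc() returns -1 until the first nextDoc()/advance() call and that every sub-iterator is handed to the constructor. Disi, ArrayDisi and Conjunction are hypothetical; the leapfrog loop is one common way to intersect sorted doc streams, not necessarily what ConjunctionScorer does.

```java
// Hypothetical iterator API: doc() is -1 before the first nextDoc()/advance().
abstract class Disi {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // assumed sentinel

  abstract int doc();
  abstract int nextDoc();
  /** Moves forward to the first doc >= target. */
  abstract int advance(int target);
}

class ArrayDisi extends Disi {
  private final int[] docs;
  private int pos = -1;

  ArrayDisi(int... docs) { this.docs = docs; }

  @Override
  int doc() {
    if (pos < 0) return -1;
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  @Override
  int nextDoc() {
    if (pos < docs.length) pos++;
    return doc();
  }

  @Override
  int advance(int target) {
    // Linear scan; NO_MORE_DOCS is >= any target, so the loop terminates.
    do {
      nextDoc();
    } while (doc() < target);
    return doc();
  }
}

// All sub-iterators arrive in the constructor: no start(), no firstTime,
// because every sub begins at doc() == -1, below any real target.
class Conjunction extends Disi {
  private final Disi[] subs;
  private int doc = -1;

  Conjunction(Disi... subs) {
    if (subs.length == 0) throw new IllegalArgumentException("need at least one sub-iterator");
    this.subs = subs.clone();
  }

  @Override
  int doc() { return doc; }

  @Override
  int nextDoc() {
    return doc == NO_MORE_DOCS ? doc : advance(doc + 1);
  }

  @Override
  int advance(int target) {
    if (doc == NO_MORE_DOCS) return doc;
    int candidate = target;
    int agreeing = 0; // consecutive subs known to sit on candidate
    for (int i = 0; ; i = (i + 1) % subs.length) {
      int d = subs[i].doc();
      if (d < candidate) d = subs[i].advance(candidate);
      if (d == NO_MORE_DOCS) return doc = NO_MORE_DOCS;
      if (d > candidate) {
        candidate = d; // overshoot: restart agreement from this sub
        agreeing = 0;
      }
      if (++agreeing == subs.length) return doc = candidate;
    }
  }
}
```

Intersecting {1, 3, 5, 7, 9}, {3, 4, 7, 9, 10} and {0, 3, 7, 8} this way yields 3, then 7, then NO_MORE_DOCS, and doc() reports -1 before the first call without any extra flag.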

bq. Also, DMS can be changed quite a bit, using an array instead of an ArrayList.

These sound like good optimizations too.
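As a small illustration of the array idea, here is a hedged sketch of the per-document score combination in a disjunction-max scorer, with sub-scores held in a float[] sized once up front instead of an ArrayList (no boxing, no list growth). It assumes the DisjunctionMaxQuery scoring rule of the max sub-score plus tieBreakerMultiplier times the other sub-scores; DisMaxCombiner and its methods are hypothetical names.

```java
// Hypothetical sketch: per-doc sub-scores live in a float[] sized once,
// rather than an ArrayList<Float> that boxes every score.
final class DisMaxCombiner {
  private final float tieBreakerMultiplier;
  private final float[] subScores; // one slot per sub-scorer, reused per doc
  private int count;               // sub-scorers matching the current doc

  DisMaxCombiner(int numSubScorers, float tieBreakerMultiplier) {
    this.subScores = new float[numSubScorers];
    this.tieBreakerMultiplier = tieBreakerMultiplier;
  }

  /** Call before collecting the sub-scores of a new doc. */
  void startDoc() { count = 0; }

  void add(float score) { subScores[count++] = score; }

  /** max + tieBreakerMultiplier * (sum of the other matching scores). */
  float score() {
    if (count == 0) return 0f;
    float max = subScores[0];
    float sum = subScores[0];
    for (int i = 1; i < count; i++) {
      sum += subScores[i];
      max = Math.max(max, subScores[i]);
    }
    return max + tieBreakerMultiplier * (sum - max);
  }
}
```

For sub-scores 1, 3 and 2 with a multiplier of 0.5, this gives 3 + 0.5 * (1 + 2) = 4.5.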

bq. Also, since BS.add() is called by BS2 only, and BS is package-private and instantiated by BS2 only, I can remove add() and pass the BS ctor two iterators (for prohibited and optional). That will allow us to compute coordFactor up front and remove the check from score() and score(Collector, int).

This sounds good, as well as switching BS2 to take all its sub-queries
up front.
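A minimal sketch of the up-front coord computation, assuming all clauses reach the constructor (lists rather than iterators, for brevity) and assuming a DefaultSimilarity-style coord of overlap / maxOverlap, where only optional clauses count toward the overlap. UpFrontCoordScorer and applyCoord() are hypothetical; the real scoring loop is omitted.

```java
import java.util.List;

// Hypothetical stand-in for BooleanScorer once BS.add() is gone and every
// clause arrives through the constructor. Only coord handling is sketched.
final class UpFrontCoordScorer {
  private final List<?> optional;     // would drive the (omitted) scoring loop
  private final List<?> prohibited;   // would mask docs in the (omitted) loop
  private final float[] coordFactors; // coordFactors[overlap], built once

  UpFrontCoordScorer(List<?> optional, List<?> prohibited) {
    this.optional = optional;
    this.prohibited = prohibited;
    int maxCoord = optional.size(); // prohibited clauses never add to overlap
    coordFactors = new float[maxCoord + 1];
    for (int overlap = 0; overlap <= maxCoord; overlap++) {
      // Assumed coord formula: overlap / maxOverlap (DefaultSimilarity-style).
      coordFactors[overlap] = maxCoord == 0 ? 0f : overlap / (float) maxCoord;
    }
  }

  /** No lazy-init check per call: the table already exists. */
  float applyCoord(float rawScore, int overlap) {
    return rawScore * coordFactors[overlap];
  }
}
```

With four optional clauses, a doc matching two of them keeps half its raw score (coordFactors[2] = 0.5), and no score() call has to test whether the table has been built yet.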


> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html for the full discussion. The basic idea is to add variants to those two methods that return the current doc they are at, to save successive calls to doc(). If there are no more docs, return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing to -1 for improved performance.
> I will post a patch shortly
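A hedged sketch of the original proposal as summarized in the description: int-returning nextDoc()/skipToDoc(int) with default implementations that delegate to the soon-to-be-deprecated boolean methods, -1 as the "no more docs" value, and the `>= 0` loop test. ProposedDisi, LegacyArrayDisi and DisiLoops are hypothetical names, not the committed API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical DISI with the old boolean API plus the proposed int variants.
abstract class ProposedDisi {
  /** Sentinel from the issue description: no more docs. */
  static final int NO_MORE = -1;

  /** Old API, to be deprecated: advance by one doc. */
  @Deprecated
  abstract boolean next();

  /** Old API, to be deprecated: advance to the first doc >= target. */
  @Deprecated
  abstract boolean skipTo(int target);

  abstract int doc();

  /** Proposed default: delegate to next(), then doc() (abstract in 3.0). */
  int nextDoc() {
    return next() ? doc() : NO_MORE;
  }

  /** Proposed default: delegate to skipTo(), then doc(). */
  int skipToDoc(int target) {
    return skipTo(target) ? doc() : NO_MORE;
  }
}

// Legacy-style subclass that implements only the boolean API.
class LegacyArrayDisi extends ProposedDisi {
  private final int[] docs;
  private int pos = -1;

  LegacyArrayDisi(int... docs) { this.docs = docs; }

  @Override
  boolean next() {
    if (pos < docs.length) pos++;
    return pos < docs.length;
  }

  @Override
  boolean skipTo(int target) {
    // Old skipTo contract: call next() until doc() >= target.
    do {
      if (!next()) return false;
    } while (doc() < target);
    return true;
  }

  @Override
  int doc() { return docs[pos]; }
}

final class DisiLoops {
  // Loop shape from the last item: one call per doc, compared against 0.
  static List<Integer> collect(ProposedDisi it) {
    List<Integer> docs = new ArrayList<>();
    int doc;
    while ((doc = it.nextDoc()) >= 0) {
      docs.add(doc);
    }
    return docs;
  }
}
```

The delegating defaults keep existing subclasses working unchanged; the saved doc() call only materializes once a subclass overrides nextDoc() directly. Note that the comment above already refers to a NO_MORE_DOCS constant rather than -1.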

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

