lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From eks dev <eks...@yahoo.co.uk>
Subject Re: Another possible optimization - now in DocIdSetIterator
Date Fri, 24 Apr 2009 22:20:42 GMT
Hi Shai, 
absolutely! 
we have been there, and there are already some micro benchmarks done in LUCENE-1345
just do not forget to use  -1 < doc instead of -1 != doc, trust me, Yonik convinced me
:)

as a side effect, this change would have some positive effects on iterator semantics, prevents,
very  hard to find, "one off" bugs " caused by calling doc() before calling next()". we had
quite a few  of those.  

-1 is good, as it supports "move to the first" in next() without if(initialized), just incrementing
 
. 
cheers, 
eks



From: Shai Erera <serera@gmail.com>
To: java-dev@lucene.apache.org
Sent: Friday, 24 April, 2009 21:26:21
Subject: Another possible optimization - now in DocIdSetIterator


Hi

I think we can make some optimization to DocIdSetIterator. Today, it defines next() and skipTo(int)
which return a boolean. I've checked the code and it looks like almost always when these two
are called, they are followed by a call to doc().

I was thinking that if those two returned the doc Id they are at, instead of boolean, that
will save the call to doc(). Those that use these can:
* Compare doc to a NO_MORE_DOCS constant (set to -1), to understand there are no more docs
in this iterator.
* If skipTo() is called, compare the 'target' to the returned Id, and if they are not the
same, save it so that the next skipTo is requested, they don't perform it if the returned
Id is greater than the target. If it's not possible to save it, they can call doc() to get
that information.

The way I see it, the impls that will still need to call doc() will lose nothing. All we'll
do is change the 'if' from comparing a boolean to comparing ints (even though that's a bit
less efficient than comparing booleans). The impls that call doc() just because all they have
in hand is a boolean, will gain.

Obviously we can't change those methods' signature, so we can deprecate them and intorudce
nextDoc() and skipToDoc(int target). We should still keep doc() around though.

What do you think? If you agree to this change, I can add them to 1593, or create a new issue
(I prefer the latter so that 1593 will be focused on the changes to Collectors).

Shai



      
Mime
View raw message