lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From eks dev <eks...@yahoo.co.uk>
Subject Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet
Date Thu, 07 Sep 2006 08:03:45 GMT
The same effect showed-up on athlon CPU... The only diference between your PerfTest and what
I have been doing is that your loop is as tight as it goes. 
*Maybe* worth of investigating a bit futher as my tests look a bit closer to the real usage
scenarios.
 
I will try to identify cases where OpenBitSet nextSetBit starts being slower ... Loks a lot
like what you said, ntz() vs Long.numberOfTrailingZeros(). The second one is (I guess) native
call, so optimizations from ntz() go in and out of some cache (too deep for me anyhow, just
shooting in the dark). 

Good news, inspite of your doubts, BitSetIterator rocks in skipTo scenario! VInt as well,
even in  medium density cases (memory profitability border is now the only condidtion for
its usage).

on the other note,
the key for really efficiant matchers will be good SmartMatcherFactory that picks the best
representation for given density/"sortednes". 
The cases I've been able to identify so far:

Very Low density - IntList
Low density VIntSortedList
Dense - OpenBitSet/BitSetIterator or such
Sorted - (imagine case where you have an oportunity to sort your index on category field,
quite offten I guess as it does not require absolute "sortedness", it is enough to sort periodicly
without caring for smaller updates). There, one simple interval list can do the magic in just
a few bytes of memory, even in high density cases. 
 
Finding good strategy to find optimal Matcher representation is not all that hard, but really
deciding when to do it will be a bit trickier (in warm-up case is easy, but in dynamic LRU
case it gets complicated as you need as the first efficiently to collect bits in some hits
collector, and than transform it to the optimal representation (SmartBitSet), quickly as otherwise
the benefit of LRU caching will quickly get lost on time spant updateing cache...)

More ideas on this?

----- Original Message ----
From: Yonik Seeley <yonik@apache.org>
To: java-dev@lucene.apache.org
Sent: Wednesday, 6 September, 2006 11:54:14 PM
Subject: Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

On 9/6/06, eks dev <eksdev@yahoo.co.uk> wrote:
> still, DenseBitsMatcher (BitSetIterator warpped in Matcher) works faster than anything
else for this case:
>
>  int skip(Matcher m) throws IOException{
>      int doc=-1, ret = 0;
>      while(m.skipTo(doc+1)){
>         doc = m.doc();
>         ret+=doc;
>       }
>      return ret;
>   }

Huh... that doesn't make much sense.
Maybe it's your pentium-M with it's shorter pipeline and less penalty
for a pipeline flush on a branch prediction miss.  You could try
substituting BitUtil.ntz2 for ntz in OpenBitSet.nextSetBit... it uses
a full binary search down to the byte level, and then an array lookup.

The nice thing about being *open* is that you can write more than one
type of iterator if you really want to:-)

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message