lucene-dev mailing list archives

From eks dev <>
Subject Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Thu, 26 Jul 2007 08:03:31 GMT
>Mark Harwood commented on LUCENE-584:

Hi Mark, we used to use Filters a lot... and concluded that Matcher is great! It just takes
some time to get your head around it, so let me try to help you get there :)

<<<I saw BitSetMatcher etc. and appreciate the motivation behind the design for alternative
implementations. What concerns me with the Matcher API in general is that Matchers have
non-thread-safe state (i.e. the current position required to support next()) and as such aren't
safely cacheable in the same way as BitSets. I see the searcher code uses the safer skipTo() rather
than next(), but there's still the "if(exhausted)" thread safety problem to worry about, which
is why I raised points 1 and 4.>>>

1. "Caching issue": You do not want to cache a Matcher. It is just an "iterator with forward
skipping", so why would one cache iterators? (It could be done by introducing rewind();
maybe not a bad idea?) What you really need to put in the cache is the object that implements
the Matcher interface, or some object from which it is easy to obtain a Matcher.
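
To make point 1 concrete, here is a minimal sketch of "cache the data, not the iterator". It
is not code from the LUCENE-584 patch: the Matcher interface below is a hypothetical
stand-in with next()/skipTo()/doc(), CachedDocSet is an invented name, and java.util.BitSet
is used only to keep the example self-contained. The cached object is never mutated, and
every search asks it for a fresh Matcher that holds its own position and exhausted flag.

```java
import java.util.BitSet;

// Hypothetical minimal iterator API in the next()/skipTo() style discussed here.
// This is an assumption for illustration, not the Matcher class from the patch.
interface Matcher {
    boolean next();              // move to the next matching doc; false when exhausted
    boolean skipTo(int target);  // move to the first match >= target, always advancing
    int doc();                   // current doc id, valid after a call that returned true
}

// The object that goes into the cache: immutable after construction, so it can be
// shared between threads. Iteration state lives only in the Matchers it hands out.
final class CachedDocSet {
    private final BitSet bits;

    CachedDocSet(BitSet bits) {
        this.bits = (BitSet) bits.clone(); // defensive copy, never modified afterwards
    }

    // Each caller gets its own iterator with its own position.
    Matcher matcher() {
        return new Matcher() {
            private int doc = -1;
            private boolean exhausted = false;

            public boolean next() {
                return skipTo(doc + 1);
            }

            public boolean skipTo(int target) {
                if (exhausted) {
                    return false;
                }
                // Never move backwards, even if target is behind the current position.
                int found = bits.nextSetBit(Math.max(target, doc + 1));
                if (found < 0) {
                    exhausted = true;
                    return false;
                }
                doc = found;
                return true;
            }

            public int doc() {
                return doc;
            }
        };
    }
}
```

Under these assumptions two concurrent searches never share a position, because the only
shared object is the read-only CachedDocSet.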

2. "Thread safety issue": I did not get it. What scenario do you see here?

<<<Additionally, combining BitSets using Boolean logic is one method call, whereas
combining heterogeneous Matchers using Boolean logic requires iteration across them and therefore
potentially many method calls (point 3).>>>

3. Lucene core uses next() and skipTo() to combine Filter/Query today; there is no
BitSet.and(BitSet) in Lucene core, and this is not going to change. If you need to combine bit
sets, you can do it easily on the classes that implement Matcher. Imagine you have two
OpenBitSets and they implement Matcher: nothing prevents you from OpenBitSet.and(OpenBitSet)-ing
these implementing objects. You are no less flexible because of Matcher. You can do everything
as before; you are just no longer bound to the memory-hungry, slow BitSet...
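
A small sketch of the two options from point 3, under stated assumptions: the Matcher
interface, SimpleBitSetMatcher and FilterArithmetic are names invented for this example, and
java.util.BitSet stands in for OpenBitSet so that the code runs without Lucene. andDirect()
is the single bulk call on the implementing class; andViaMatchers() is the generic
next()/skipTo() style of intersection that works for any pair of Matchers.

```java
import java.util.BitSet;

// Hypothetical minimal Matcher, assumed for this sketch (not the class from the patch).
interface Matcher {
    boolean next();              // move to the next matching doc; false when exhausted
    boolean skipTo(int target);  // move to the first match >= target, always advancing
    int doc();                   // current doc id, valid after a call that returned true
}

// Illustrative iterator over a BitSet.
final class SimpleBitSetMatcher implements Matcher {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted = false;

    SimpleBitSetMatcher(BitSet bits) {
        this.bits = bits;
    }

    public boolean next() {
        return skipTo(doc + 1);
    }

    public boolean skipTo(int target) {
        if (exhausted) {
            return false;
        }
        int found = bits.nextSetBit(Math.max(target, doc + 1));
        if (found < 0) {
            exhausted = true;
            return false;
        }
        doc = found;
        return true;
    }

    public int doc() {
        return doc;
    }
}

final class FilterArithmetic {
    // Direct arithmetic on the implementing class: one bulk call, inputs untouched.
    static BitSet andDirect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Generic intersection through the iterator API: many small calls, but it
    // works for any two Matchers regardless of what backs them.
    static BitSet andViaMatchers(Matcher a, Matcher b) {
        BitSet result = new BitSet();
        if (!a.next() || !b.next()) {
            return result;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                result.set(a.doc());
                if (!a.next() || !b.next()) {
                    return result;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return result;
                }
            } else {
                if (!b.skipTo(a.doc())) {
                    return result;
                }
            }
        }
    }
}
```

Both methods produce the same set of documents. The result of andDirect() can then be
wrapped in a Matcher for searching, which is the flow described above.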

<<<I haven't benchmarked this but I imagine it to be significantly slower?>>>
Sure, but you do not have to do your Filter arithmetic via Matcher; just do it directly
on your implementing classes.

<<<I use BooleanFilter a lot for security where many large sets are cached and combined
on the fly - caching all the possible combinations as single bitsets would lead to too many
possible combinations. >>>

You can freely keep something like BooleanFilter, and even make it faster with OpenBitSet or
something else that is faster still and more memory-efficient. Then, once you have the Filter
content you'd like to use, just pass it as a Matcher to the search() method and ta-da, you have it.
