lucene-dev mailing list archives

From "paul.elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Wed, 30 Aug 2006 21:18:27 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12431696 ] 
            
paul.elschot commented on LUCENE-584:
-------------------------------------

Yonik, as to your questions:

> It looks like no Filters currently return a matcher, so the current patch just lays the
> groundwork, right?

Right. Only the previous Filter-20060628.patch contains some commented FIXME code to actually
introduce a BitsMatcher in each case where a BitSet is used.

> When some filters do start to return a matcher, it looks like support for the 1.4 BooleanScorer
> needs to be removed, or a check done in IndexSearcher.search() to disable skipping on the scorer
> if it's in use.

IIRC the patch still supports the 1.4 BooleanScorer when a BitSet is returned by Filter. I'd
have to have a look at the patched IndexSearcher to be sure though.
A BitSet is randomly addressable, so it can be used to filter the 1.4 BooleanScorer, which can
score documents out of order. This case can be deprecated completely by also deprecating
the possibility to use the 1.4 BooleanScorer, but that is not in the patch. The patch only
deprecates the Filter.bits() method.
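
To illustrate why random access matters here, a minimal sketch (not code from the patch): because BitSet.get() can be asked about any document number at any time, a collector can filter hits in whatever order the scorer delivers them. The DocCollector interface and the BitSetFilteringCollector class are hypothetical names made up for this example; they only stand in for a HitCollector-style callback.

```java
import java.util.BitSet;

// Hypothetical stand-in for a HitCollector-style callback (assumption, not Lucene's API).
interface DocCollector {
    void collect(int doc, float score);
}

// Hypothetical wrapper: passes a hit on only if its bit is set in the filter.
class BitSetFilteringCollector implements DocCollector {
    private final BitSet bits;
    private final DocCollector delegate;

    BitSetFilteringCollector(BitSet bits, DocCollector delegate) {
        this.bits = bits;
        this.delegate = delegate;
    }

    @Override
    public void collect(int doc, float score) {
        // get() is a random-access lookup, so doc numbers may arrive in any order.
        if (bits.get(doc)) {
            delegate.collect(doc, score);
        }
    }
}
```

A forward-only matcher could not be used this way, since it cannot answer a question about a document number smaller than the one it has already passed.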


> I wonder what the performance impact is... for a dense search with a dense bitset
> filter, it looks like quite a bit of overhead is added (two calls in order to get the
> next doc, use of nextSetBit() instead of get(), checking "exhausted" each time and
> checking for -1 to set exhausted). I suppose one can always drop back to using
> a HitCollector for special cases though.

BitsMatcher could also work without the "exhausted" flag, but then an infinite loop
might occur when trying to continue after next() or skipTo() has first returned false.
Continuing after false was returned is a bug; however, an infinite loop
can be difficult to debug. I'd rather stay on the safe side with the exhausted flag
and wait for an actual profile to show a performance problem.
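
As a rough sketch of the trade-off (this is not the attached BitsMatcher.java; the class name BitsMatcherSketch and its internals are assumptions made for illustration), a BitSet-backed matcher with the flag could look like this:

```java
import java.util.BitSet;

// Hypothetical sketch of a BitSet-backed matcher with an "exhausted" guard.
class BitsMatcherSketch {
    private final BitSet bits;
    private int doc = -1;              // -1 before the first next() and after exhaustion
    private boolean exhausted = false;

    BitsMatcherSketch(BitSet bits) {
        this.bits = bits;
    }

    /** Moves to the next set bit; returns false once no set bit is left. */
    boolean next() {
        return advanceTo(doc + 1);
    }

    /** Moves to the first set bit at or after target, always advancing at least once. */
    boolean skipTo(int target) {
        return advanceTo(Math.max(target, doc + 1));
    }

    int doc() {
        return doc;
    }

    private boolean advanceTo(int from) {
        if (exhausted) {
            return false;              // keep returning false instead of restarting
        }
        doc = bits.nextSetBit(from);   // returns -1 when no further bit is set
        if (doc < 0) {
            exhausted = true;
            return false;
        }
        return true;
    }
}
```

In this sketch, dropping the flag would leave doc at -1 after the last hit, so a further next() would call nextSetBit(0) and start over from the first set bit. A caller that wrongly keeps iterating would then never terminate, which is the case the flag guards against.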

Regards,
Paul Elschot


> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: http://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: BitsMatcher.java, Filter-20060628.patch, HitCollector-20060628.patch,
>                      IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java,
>                      Matcher20060830.patch, Matcher20060830b.patch, Scorer-20060628.patch,
>                      Searchable-20060628.patch, Searcher-20060628.patch, SortedVIntList.java,
>                      TestSortedVIntList.java
>
>
> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead
> of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only
> a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It
> would be desirable to have an alternative BitSet implementation with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not
> designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still
> delegate to =java.util.BitSet=.
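
For illustration only, one possible sparse implementation of the proposed interface could keep just the visible document numbers in a sorted array and answer get() with a binary search. The AbstractBitSet interface is the one from the issue description; the class SortedIntArrayBits is a hypothetical name and is not one of the attachments.

```java
import java.util.Arrays;

// Interface as proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Hypothetical sparse implementation: stores only the set positions.
class SortedIntArrayBits implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntArrayBits(int[] docs) {
        sortedDocs = docs.clone();     // defensive copy, then sort for binary search
        Arrays.sort(sortedDocs);
    }

    @Override
    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the value is present.
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

Storage in this sketch grows with the number of visible documents rather than with the size of the index, at the cost of a logarithmic lookup instead of a constant-time one.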

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

