lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <>
Subject [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Thu, 09 Aug 2007 07:09:48 GMT


Mark Harwood commented on LUCENE-584:

Some further thoughts on the roles and responsibilities of the various components:

Given a blank sheet of paper (a luxury we may not have), my minimum requirements could be
met with the following.
(Note that the words "Matcher", "Filter", etc. have been removed because sets of doc IDs
have applications outside of filtering/querying, e.g. category counts.)

interface DocIdSetFactory {
    DocIdSet getDocIdSet(IndexReader reader);
}
This is more or less equivalent in purpose to the existing "Filter": different implementations
define their own selection criteria and produce a set of matching doc IDs, e.g. the equivalent
of RangeFilter. Each implementation must implement "hashCode" and "equals" methods based on
its criteria so that the factory can be cached and reused (in the same way Query objects are
expected to be). The existing CachedFilterBuilder in the XMLQueryParser provides one example
of a strategy for caching Filters using this facility.
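
The hashCode/equals requirement can be illustrated with a minimal sketch. Everything here beyond the DocIdSetFactory interface itself is an assumption made for illustration: IndexReader is a tiny stand-in for the Lucene class, DocIdSet is reduced to a membership test, and DocIdRangeFactory is a hypothetical implementation, not anything from the patches.

```java
import java.util.Objects;

// Stand-in for Lucene's IndexReader so the sketch compiles on its own (assumption).
interface IndexReader {
    int maxDoc();
}

// Simplified placeholder: the DocIdSet described in this message exposes an iterator instead.
interface DocIdSet {
    boolean contains(int docId);
}

interface DocIdSetFactory {
    DocIdSet getDocIdSet(IndexReader reader);
}

// Hypothetical factory that selects doc IDs in [lower, upper].
final class DocIdRangeFactory implements DocIdSetFactory {
    private final int lower;
    private final int upper;

    DocIdRangeFactory(int lower, int upper) {
        this.lower = lower;
        this.upper = upper;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) {
        final int maxDoc = reader.maxDoc();
        // Doc IDs at or beyond maxDoc do not exist in this reader.
        return docId -> docId >= lower && docId <= upper && docId < maxDoc;
    }

    // Equality depends only on the selection criteria, so equal factories share a cache entry.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DocIdRangeFactory)) return false;
        DocIdRangeFactory other = (DocIdRangeFactory) o;
        return lower == other.lower && upper == other.upper;
    }

    @Override
    public int hashCode() {
        return Objects.hash(lower, upper);
    }
}
```

With this in place, two separately constructed factories with the same bounds resolve to the same key in a HashMap, which is what a factory-level cache would rely on.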

interface DocIdSet {
    DocIdSetIterator getIterator();
}
This interface defines an immutable, thread-safe (and therefore cacheable) collection of doc
IDs. Different implementations provide space-efficient alternatives for sparse or heavily
populated sets, e.g. BitSet, OpenBitSet, SortedVIntList. As an example caching strategy, the
existing CachingWrapperFilter would cache these objects in a WeakHashMap keyed on IndexReader.
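
As a rough sketch of the caching idea, the following shows an immutable set that hands out a fresh iterator on every call, plus a per-reader cache built on a WeakHashMap. SortedIntDocIdSet and PerReaderCache are hypothetical names invented for this example, and the reader type is left generic so the code does not depend on Lucene. It is not how CachingWrapperFilter is actually written.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

interface DocIdSetIterator {
    boolean next();
    int getDoc();
}

interface DocIdSet {
    DocIdSetIterator getIterator();
}

// Hypothetical immutable set backed by a sorted int array (a fit for sparse sets).
final class SortedIntDocIdSet implements DocIdSet {
    private final int[] docs;

    SortedIntDocIdSet(int... docs) {
        this.docs = docs.clone();   // defensive copy keeps the set immutable
        Arrays.sort(this.docs);
    }

    @Override
    public DocIdSetIterator getIterator() {
        // Each caller gets private iteration state, so the set itself can be shared.
        return new DocIdSetIterator() {
            private int pos = -1;

            @Override
            public boolean next() {
                if (pos < docs.length) pos++;
                return pos < docs.length;
            }

            @Override
            public int getDoc() {
                return docs[pos];   // only valid after next() has returned true
            }
        };
    }
}

// Hypothetical cache: one DocIdSet per reader, eligible for removal once the reader is unreachable.
final class PerReaderCache<R> {
    private final Map<R, DocIdSet> cache = new WeakHashMap<>();

    synchronized DocIdSet get(R reader, Function<R, DocIdSet> loader) {
        return cache.computeIfAbsent(reader, loader);
    }
}
```

Because the cached object holds no iteration state, the same instance can be returned to any number of callers, each of which obtains an independent iterator.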

interface DocIdSetIterator {
    boolean next();
    int getDoc();
}
A thread-unsafe, single-use object (probably with only one implementation) that is used to
iterate across any DocIdSet. It is not cacheable and is used by Scorers.
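
To make the single-use contract concrete, here is a minimal sketch of an iterator over a java.util.BitSet and a typical consuming loop. BitSetDocIdIterator and DocIdCounter are hypothetical names for illustration only. The sketch assumes the underlying BitSet is not modified while an iterator is in use.

```java
import java.util.BitSet;

interface DocIdSetIterator {
    boolean next();
    int getDoc();
}

// Hypothetical iterator over a BitSet; holds mutable position state, so it is
// single use and must not be shared between threads.
final class BitSetDocIdIterator implements DocIdSetIterator {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted;

    BitSetDocIdIterator(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean next() {
        if (exhausted) return false;
        if (doc == Integer.MAX_VALUE) {   // avoid overflow in doc + 1
            exhausted = true;
            return false;
        }
        doc = bits.nextSetBit(doc + 1);   // returns -1 when no further bit is set
        exhausted = doc < 0;
        return !exhausted;
    }

    @Override
    public int getDoc() {
        return doc;   // only meaningful after next() has returned true
    }
}

// Hypothetical consumer, e.g. for a category count.
final class DocIdCounter {
    static int count(DocIdSetIterator it) {
        int n = 0;
        while (it.next()) {
            n++;
        }
        return n;
    }
}
```

Once next() has returned false the iterator stays exhausted, so a second pass requires asking the DocIdSet for a new iterator.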

In the existing proposal it feels like DocIdSet and DocIdSetIterator are rolled into one in
the form of the Matcher, which complicates or prevents caching strategies.


> Decouple Filter from BitSet
> ---------------------------
>                 Key: LUCENE-584
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, Matcher1-ground-20070730.patch,
Matcher2-default-20070730.patch, Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch,
Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, Some
> {code}
> package;
> public abstract class Filter implements 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead
of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only
a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It
would be desirable to have an alternative BitSet implementation with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not
designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still
delegate to =java.util.BitSet=.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
