lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Wed, 25 Jul 2007 21:55:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515434 ]

Paul Elschot commented on LUCENE-584:
-------------------------------------

Have a look at BitSetMatcher in the -default patch. It is constructed from a BitSet, and it
has a method getMatcher() that returns a Matcher that acts as a searching iterator over the
BitSet.
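
As a rough sketch of that idea (not the code from the -default patch: DocIterator
and BitSetDocSet are made-up names, and the real Matcher API may well differ), a
BitSet-backed document set that hands out searching iterators could look like this:

```java
import java.util.BitSet;

// Hypothetical stand-in for the patch's Matcher; names and signatures are assumptions.
interface DocIterator {
    boolean next();              // advance to the next matching doc, false when exhausted
    boolean skipTo(int target);  // advance to the first match >= target that lies after the current doc
    int doc();                   // current doc; only meaningful after next()/skipTo() returned true
}

// Hypothetical wrapper playing the role described for BitSetMatcher.
final class BitSetDocSet {
    private final BitSet bits;

    BitSetDocSet(BitSet bits) {
        this.bits = bits;
    }

    // Every call returns a fresh iterator with its own position.
    DocIterator getMatcher() {
        return new DocIterator() {
            private int doc = -1;
            private boolean exhausted = false;

            public boolean next() {
                return skipTo(doc + 1);
            }

            public boolean skipTo(int target) {
                if (exhausted) {
                    return false;
                }
                // Never move backwards: search from the later of target and doc + 1.
                doc = bits.nextSetBit(Math.max(target, doc + 1));
                if (doc < 0) {
                    exhausted = true;
                    return false;
                }
                return true;
            }

            public int doc() {
                return doc;
            }
        };
    }
}
```

The iterator keeps its position privately, so two iterators obtained from the same
set do not disturb each other.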

So that is 1) to 4), at least potentially. A clone() method is currently not implemented, iirc,
but each call to getMatcher() returns a new iterator over the underlying BitSet. And when
guaranteed non-modifiability is needed, a constructor can take a copy of the given document
set, in whatever form.
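
The copying constructor can be sketched as follows. SnapshotDocSet is a hypothetical
name used only for illustration, and the sketch assumes the document set arrives as
a java.util.BitSet:

```java
import java.util.BitSet;

// Hypothetical class: takes a private copy so later changes to the source are not visible.
final class SnapshotDocSet {
    private final BitSet bits;

    SnapshotDocSet(BitSet source) {
        // BitSet.clone() copies the underlying words, so the two sets are independent.
        this.bits = (BitSet) source.clone();
    }

    boolean contains(int doc) {
        return doc >= 0 && bits.get(doc);
    }

    int cardinality() {
        return bits.cardinality();
    }
}
```

The cost is one copy per construction, which buys a set that nobody holding the
original BitSet can change afterwards.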

The point of Matcher is that it allows implementations other than BitSet, like OpenBitSet
and SortedVIntList. Both have the properties that you are looking for. SortedVIntList can
save a lot of memory when compared to (Open)BitSet, and OpenBitSet is somewhat faster than
BitSet.
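
To illustrate where the memory saving comes from, here is a minimal sketch of the
general technique: sorted document numbers stored as variable-length encoded gaps.
VIntDocList is a hypothetical class written for this example, and it is not the
SortedVIntList from the patch, whose details may differ.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical class: stores ascending doc numbers as VInt-encoded deltas.
final class VIntDocList {
    private final byte[] bytes;
    private final int size;

    VIntDocList(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            if (doc < prev) {
                throw new IllegalArgumentException("docs must be non-negative and sorted ascending");
            }
            int delta = doc - prev;
            // Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
        }
        this.bytes = out.toByteArray();
        this.size = sortedDocs.length;
    }

    int sizeInBytes() {
        return bytes.length;
    }

    // Decodes the whole list back into doc numbers.
    int[] toArray() {
        int[] docs = new int[size];
        int pos = 0;
        int doc = 0;
        for (int i = 0; i < size; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += delta;
            docs[i] = doc;
        }
        return docs;
    }
}
```

For a sparse set this needs at most five bytes per document, whereas a
java.util.BitSet needs about one bit per document number up to the highest one set,
regardless of how few are set.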

I'd like to have a skip list version of SortedVIntList, too. This would be slightly larger
than SortedVIntList, but more efficient on skipTo().
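
One possible reading of that, sketched under assumptions: keep the VInt-encoded gaps
and add a single-level skip index that records, every few documents, the last
document number and the byte offset at which decoding can resume. The class name
SkippingVIntDocList, the interval of 4 and the method firstAtOrAfter() are all
invented for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical class: VInt-encoded deltas plus a sparse skip index.
final class SkippingVIntDocList {
    private static final int INTERVAL = 4;  // one skip entry per 4 docs (arbitrary choice)
    private final byte[] bytes;
    private final int[] skipDocs;     // doc number of the last doc before each resume point
    private final int[] skipOffsets;  // byte offset at which decoding resumes

    SkippingVIntDocList(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int entries = sortedDocs.length / INTERVAL;
        skipDocs = new int[entries];
        skipOffsets = new int[entries];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            int doc = sortedDocs[i];
            if (doc < prev) {
                throw new IllegalArgumentException("docs must be non-negative and sorted ascending");
            }
            int delta = doc - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
            if ((i + 1) % INTERVAL == 0) {
                int e = (i + 1) / INTERVAL - 1;
                skipDocs[e] = doc;
                skipOffsets[e] = out.size();
            }
        }
        bytes = out.toByteArray();
    }

    // Returns the first doc >= target, or -1 if there is none.
    int firstAtOrAfter(int target) {
        // Binary search for the last skip entry whose doc is still below target.
        int lo = 0;
        int hi = skipDocs.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (skipDocs[mid] < target) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        int doc = best < 0 ? 0 : skipDocs[best];
        int pos = best < 0 ? 0 : skipOffsets[best];
        // Decode linearly from the resume point; at most INTERVAL docs before a hit.
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += delta;
            if (doc >= target) {
                return doc;
            }
        }
        return -1;
    }
}
```

The two extra int arrays are the "slightly larger" part; in exchange a skip decodes
only a handful of gaps instead of everything from the start of the list.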

But the first thing that is necessary is to have Filter independent from BitSet.

The real pain with that is going to be the code that currently implements Filters
outside the Lucene code base, and a default implementation of a Matcher
should be of help there, just as it is in the -core patch now.

The default implementation will probably need to be improved from its current
state, but that can be done later. For example, one could also use OpenBitSet
in all cases, and even collect the filtered documents directly in that.

Cheers,
Paul Elschot

> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, Matcher-core20070725.patch, Matcher-default20070725.patch, Matcher-ground20070725.patch, Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead
> of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only
> a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It
> would be desirable to have an alternative BitSet implementation with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not
> designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still
> delegate to =java.util.BitSet=.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

