lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Thu, 09 Aug 2007 19:45:43 GMT


Paul Elschot commented on LUCENE-584:


I said: "there is never a threadsafety problem. (See BitSetMatcher.getMatcher() which uses
a local class for the resulting Matcher.)"
That was a mistake: BitSetMatcher is a Matcher constructed from a BitSet, whereas it is
SortedVIntList that has a getMatcher() method. I confused the two.

A Matcher is intended to be used in a single thread, so I don't expect thread safety problems.

The problem for the XML parser is that with this patch, the data structure implementing
a Filter becomes inaccessible from the Filter class, so it cannot be cached from there.
That means that some data structure to be cached will have to be chosen, and one way to do
that is by using class BitSetFilter from the patch. This has a bits() method just like the
current Filter class.
CachingWrapperFilter could then become a cache for BitSetFilter.
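
As a rough illustration of that idea, the sketch below caches one BitSet per reader behind a
bits() method. It does not use the patch's classes: BitSetFilterSketch and
CachingBitSetFilterSketch are hypothetical names, and a plain Object stands in for the
IndexReader so that the example compiles on its own.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for a filter that exposes its bits directly.
// A plain Object plays the role of the IndexReader.
abstract class BitSetFilterSketch {
    abstract BitSet bits(Object reader);
}

// Caches one BitSet per reader and delegates to the wrapped filter on a miss.
class CachingBitSetFilterSketch extends BitSetFilterSketch {
    private final BitSetFilterSketch delegate;
    // Weak keys allow an entry to be dropped once its reader is unreachable.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    CachingBitSetFilterSketch(BitSetFilterSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    synchronized BitSet bits(Object reader) {
        // Compute the bits only the first time a given reader is seen.
        return cache.computeIfAbsent(reader, delegate::bits);
    }
}
```

The cache is keyed on the reader only, so each wrapper instance is meant to wrap exactly one
underlying filter.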

There is indeed no caching of filters in this patch.
The reason for that is that some Filters do not need a cache. For example:
class TermFilter {
  private final Term term;
  TermFilter(Term t) { this.term = t; }
  Matcher getMatcher(IndexReader reader) throws IOException { return new TermMatcher(reader.termDocs(this.term)); }
}
TermMatcher does not exist (yet), but it could be easily introduced by leaving all the
scoring out of the current TermScorer.

As for DocIdSet, as long as this provides a Matcher as an iterator, it can be used to
implement a (caching) filter.
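
A minimal sketch of a document id set that hands out a Matcher as its iterator could look as
follows. The names MatcherSketch and SortedIntDocIdSetSketch are hypothetical, and the
next()/doc()/skipTo() methods are only assumed here, modeled on the iteration style of Scorer;
the Matcher in the patch may differ.

```java
// Hypothetical minimal iterator over matching document ids.
abstract class MatcherSketch {
    abstract boolean next();             // advance to the next doc; false when exhausted
    abstract int doc();                  // current doc; valid only after next() returned true
    abstract boolean skipTo(int target); // advance to the first later doc >= target
}

// Hypothetical document id set backed by a sorted array of doc ids.
class SortedIntDocIdSetSketch {
    private final int[] docs; // ascending, without duplicates

    SortedIntDocIdSetSketch(int... sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    // Every call returns a fresh iterator with its own position, so a
    // Matcher never needs to be shared between threads.
    MatcherSketch getMatcher() {
        return new MatcherSketch() {
            private int pos = -1;

            @Override
            boolean next() {
                if (pos < docs.length) {
                    pos++;
                }
                return pos < docs.length;
            }

            @Override
            int doc() {
                return docs[pos];
            }

            @Override
            boolean skipTo(int target) {
                // Always advances at least once, then stops at the first doc >= target.
                while (next()) {
                    if (docs[pos] >= target) {
                        return true;
                    }
                }
                return false;
            }
        };
    }
}
```

Because only the immutable array is shared, the set itself can be cached while each search
gets its own Matcher.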

I don't think this patch complicates the implementation of caching strategies.
For example one could define:
class CachableFilter extends Filter {
  ... some methods to access the underlying data structure to be cached. ...
}
or write a similar adapter for some subclass of Filter and then write a FilterCache that caches

I did consider defining Matcher as an interface, but I preferred not to do that because
of the default explain() method in the Matcher class of the patch.
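
The trade-off can be sketched as follows. Java interfaces only gained default methods in
Java 8, so before that a shared default implementation required an abstract class. The class
names and the String return type of explain() below are assumptions for illustration only,
not the patch's actual API.

```java
// Hypothetical abstract Matcher: subclasses supply the iteration and
// inherit a default explain() without having to implement it.
abstract class ExplainableMatcherSketch {
    abstract boolean next();
    abstract int doc();

    // Default explanation shared by all subclasses; any subclass may override it.
    String explain(int doc) {
        return "doc " + doc + " examined by " + getClass().getName();
    }
}

// Hypothetical subclass that matches exactly one document and
// relies on the inherited explain().
final class SingleDocMatcherSketch extends ExplainableMatcherSketch {
    private final int only;
    private boolean consumed = false;

    SingleDocMatcherSketch(int only) {
        this.only = only;
    }

    @Override
    boolean next() {
        if (consumed) {
            return false;
        }
        consumed = true;
        return true;
    }

    @Override
    int doc() {
        return only;
    }
}
```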

> Decouple Filter from BitSet
> ---------------------------
>                 Key: LUCENE-584
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, Matcher1-ground-20070730.patch,
> Matcher2-default-20070730.patch, Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch,
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, Some
> {code}
> package;
> public abstract class Filter implements 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface instead
> of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only
> a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It
> would be desirable to have an alternative BitSet implementation with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not
> designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still
> delegate to =java.util.BitSet=.
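
For reference, the proposal quoted above can be sketched as follows. Only the AbstractBitSet
interface with its get(int) method comes from the issue description; JavaUtilBitSetAdapter and
SortedIntBitSet are hypothetical names, and the sorted array is just one possible compact
representation for sparse sets.

```java
import java.util.Arrays;
import java.util.BitSet;

// Interface as proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation that simply delegates to java.util.BitSet.
final class JavaUtilBitSetAdapter implements AbstractBitSet {
    private final BitSet bits;

    JavaUtilBitSetAdapter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

// Hypothetical sparse alternative: stores only the set indexes in a
// sorted array and answers get() with a binary search.
final class SortedIntBitSet implements AbstractBitSet {
    private final int[] setBits;

    SortedIntBitSet(int... indexes) {
        setBits = indexes.clone();
        Arrays.sort(setBits);
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(setBits, index) >= 0;
    }
}
```

With few bits set in a large index, the sorted array needs memory proportional to the number
of set bits, whereas java.util.BitSet needs memory proportional to the highest set index.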

