lucene-dev mailing list archives

From Yonik Seeley <>
Subject Re: constant scoring queries
Date Wed, 11 May 2005 02:42:08 GMT
Hey now... you're going to obsolete all my in-house code and put me
out of a job ;-)

Could you elaborate on the advantage of having, say, a TermQuery that
can be either normal-scoring or constant-scoring, versus two different
Query classes for doing this? They seem roughly equivalent.

> 1. Add two methods to
>    public boolean constantScoring();
>    public void constantScoring(boolean);
>    When constantScoring(), the boost() is the score for matches.

That seems fine.
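
A minimal sketch of the flag proposed in item 1. It uses a hypothetical stand-alone ScoringQuery class rather than Lucene's actual Query. In constant-scoring mode every match scores exactly the boost; otherwise a relevance score computed elsewhere is scaled by the boost.

```java
/** Hypothetical stand-in for a query carrying the proposed constantScoring flag (not Lucene's Query). */
class ScoringQuery {
    private float boost = 1.0f;
    private boolean constantScoring = false;

    public float getBoost() { return boost; }
    public void setBoost(float boost) { this.boost = boost; }

    // The two accessors proposed in the thread.
    public boolean constantScoring() { return constantScoring; }
    public void constantScoring(boolean on) { this.constantScoring = on; }

    /**
     * Score for a matching document. In constant-scoring mode the boost is the score;
     * otherwise the relevance score (computed elsewhere) is multiplied by the boost.
     */
    public float score(float relevanceScore) {
        return constantScoring ? boost : relevanceScore * boost;
    }
}
```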

> 2. Add two methods to
>    public BitSet cachedBitSet(Query) { return null; }
>    public void cacheBitSet(Query, BitSet) {}
>    IndexSearcher overrides these to maintain an LRU cache of bitsets.

Yup, that's what I have.
Things should be extensible and use a caching interface, with an LRU
cache as the default implementation; users could plug in their own
implementations to get LFU behavior or whatever.

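
A sketch of the pluggable cache described above, assuming a hypothetical BitSetCache interface with a default LRU implementation built on java.util.LinkedHashMap in access-order mode. An LFU or other policy would simply be another implementation of the same interface. The names are illustrative, not from Lucene.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pluggable cache: the key could be a Query, the value its matching docs. */
interface BitSetCache<K> {
    BitSet get(K key);              // null if not cached
    void put(K key, BitSet bits);
}

/** Default LRU implementation using LinkedHashMap's access-order mode. */
class LruBitSetCache<K> implements BitSetCache<K> {
    private final Map<K, BitSet> map;

    LruBitSetCache(final int maxEntries) {
        // accessOrder=true: get() moves an entry to the most-recently-used end.
        this.map = new LinkedHashMap<K, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, BitSet> eldest) {
                // Evict the least-recently-used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    public synchronized BitSet get(K key) { return map.get(key); }

    public synchronized void put(K key, BitSet bits) { map.put(key, bits); }

    synchronized int size() { return map.size(); }
}
```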
> 3. Modify BooleanQuery so that, when constantScoring(), TooManyClauses
> is not thrown.

This is good, but not sufficient for RangeQuery. If
RangeQuery.constantScoring() is true, it should not rewrite to a
BooleanQuery at all: depending on the range, merely creating the
BooleanQuery that matches it is too heavyweight.

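
A sketch of what skipping the rewrite could look like. It assumes a toy in-memory term dictionary (a TreeMap from term to doc ids) in place of Lucene's index; ToyIndex and rangeBits are hypothetical names. The range is resolved by walking the terms in order and setting bits directly, so no per-term clause objects are created and no clause limit applies.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory inverted index: term -> doc ids. Not a Lucene API. */
class ToyIndex {
    final NavigableMap<String, int[]> postings = new TreeMap<>();

    void add(String term, int... docs) { postings.put(term, docs); }

    /**
     * Constant-scoring range match over [lower, upper]: visit each term in the
     * range and set its documents in a BitSet, instead of building one clause per term.
     */
    BitSet rangeBits(String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```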
> Also, instead of BitSet we could use an interface:
>    public interface DocIdSet {
>      void add(int docId);
>      boolean contains(int docId);
>      int next(int docId);
>    }
> to permit sparse representations.
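
One way the quoted DocIdSet could admit a sparse representation is a sorted, growable int array. The proposal does not define next(), so the sketch assumes next(docId) returns the smallest member >= docId, or -1 if there is none; SortedArrayDocIdSet is a hypothetical name.

```java
import java.util.Arrays;

/** The interface from the quoted proposal; next()'s contract is an assumption here. */
interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    int next(int docId);    // assumed: smallest member >= docId, or -1 if none
}

/** Hypothetical sparse implementation backed by a sorted, growable int array. */
class SortedArrayDocIdSet implements DocIdSet {
    private int[] docs = new int[4];
    private int size = 0;

    public void add(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        if (pos >= 0) return;                       // already present
        int insertAt = -pos - 1;                    // keep the array sorted
        if (size == docs.length) docs = Arrays.copyOf(docs, size * 2);
        System.arraycopy(docs, insertAt, docs, insertAt + 1, size - insertAt);
        docs[insertAt] = docId;
        size++;
    }

    public boolean contains(int docId) {
        return Arrays.binarySearch(docs, 0, size, docId) >= 0;
    }

    public int next(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        int idx = pos >= 0 ? pos : -pos - 1;        // first index with docs[idx] >= docId
        return idx < size ? docs[idx] : -1;
    }
}
```

Memory grows with the number of matches rather than with maxDoc. Adding doc ids in ascending order avoids the array shifting in add().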

Definitely a DocIdSet. It's called DocSet in my code and has a bitset
implementation and a compact implementation that's an int hash set
(unordered because I just use it as a filter now). Here is the basic
interface:

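import java.util.BitSet;

public interface DocSet {
  public int size();                          // number of docs in the set
  public boolean exists(int docid);           // membership test
  public DocIterator iterator();              // DocIterator: separate interface, not shown
  public BitSet getBits();                    // the set as a java.util.BitSet
  public long memSize();                      // memory used by this set
  public DocSet intersection(DocSet other);
  public int intersectionSize(DocSet other);  // size of the intersection
  public DocSet union(DocSet other);
  public int unionSize(DocSet other);
}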
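
The compact implementation described above (an unordered int hash set) might look roughly like the following. This is an assumed design, not the code from the thread: HashDocSet is a hypothetical name, it uses open addressing with linear probing, and it implements only a subset of the DocSet methods (exists, size, and an intersectionSize against a plain BitSet).

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical compact doc set: open-addressing int hash set with linear probing. */
class HashDocSet {
    private static final int EMPTY = -1;    // doc ids are assumed non-negative
    private final int[] table;
    private final int mask;
    private int size;

    HashDocSet(int[] docs) {
        // Power-of-two capacity of at least 2 * docs.length keeps the load factor <= 0.5.
        int cap = Integer.highestOneBit(Math.max(docs.length, 1) * 2) << 1;
        table = new int[cap];
        Arrays.fill(table, EMPTY);
        mask = cap - 1;
        for (int doc : docs) insert(doc);
    }

    private void insert(int doc) {
        int i = doc & mask;
        while (table[i] != EMPTY) {
            if (table[i] == doc) return;    // duplicate
            i = (i + 1) & mask;             // linear probing
        }
        table[i] = doc;
        size++;
    }

    boolean exists(int doc) {
        int i = doc & mask;
        while (table[i] != EMPTY) {
            if (table[i] == doc) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    int size() { return size; }

    /** Count members that are also set in a bitset-backed set. */
    int intersectionSize(BitSet bits) {
        int count = 0;
        for (int doc : table) {
            if (doc != EMPTY && bits.get(doc)) count++;
        }
        return count;
    }
}
```

Because the table is unordered, scanning it does not yield ascending doc ids, which is fine when the set is only used as a filter.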

I would separate out int next(int docId) into an iterator.  It may be
more efficient to iterate over certain structures if you can maintain
state about where you are (and this may even be true of a BitSet).
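
A sketch of the stateful iteration suggested above. It assumes a plain java.util.BitSet and a hypothetical BitSetDocIterator class. The iterator remembers where it stopped, so callers ask for the next doc (or skip ahead) without tracking positions themselves.

```java
import java.util.BitSet;

/** Hypothetical stateful iterator over a BitSet; names are illustrative, not from the thread. */
class BitSetDocIterator {
    private final BitSet bits;
    private int position = 0;         // where the next scan resumes
    private boolean exhausted = false;

    BitSetDocIterator(BitSet bits) { this.bits = bits; }

    /** Next doc id after the previously returned one, or -1 when exhausted. */
    int nextDoc() {
        if (exhausted) return -1;
        int doc = bits.nextSetBit(position);
        if (doc < 0) {
            exhausted = true;
            return -1;
        }
        position = doc + 1;           // resume after this doc next time
        return doc;
    }

    /** First doc id >= target (never moving backwards), or -1 when exhausted. */
    int advance(int target) {
        position = Math.max(position, target);
        return nextDoc();
    }
}
```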
