lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject constant scoring queries
Date Tue, 10 May 2005 20:39:43 GMT
Background: In http://issues.apache.org/bugzilla/show_bug.cgi?id=34673, 
Yonik Seeley proposes a ConstantScoreQuery based on a Filter.  And in 
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg08007.html 
I proposed a mechanism to promote the use of Filters.  Through all of 
this, Paul Elschot has hinted that there might be a better way.

Here's another proposal, tackling many of the same issues:

1. Add two methods to Query.java:

   public boolean constantScoring();
   public void constantScoring(boolean constantScoring);

   When constantScoring() returns true, the query's boost() is used as 
   the score for every match.

2. Add two methods to Searcher.java:

   public BitSet cachedBitSet(Query query) { return null; }
   public void cacheBitSet(Query query, BitSet bits) {}

   IndexSearcher overrides these to maintain an LRU cache of bitsets.

3. Modify BooleanQuery so that TooManyClauses is not thrown when 
constantScoring() is true.

4. Modify BooleanScorer so that, when constantScoring() is true, it will:
   - check the Searcher for a cached bitset;
   - failing that, create a bitset;
   - evaluate the clauses serially, saving the results in the bitset;
   - cache the bitset;
   - use the bitset to implement doc(), next() and skipTo().

5. TermQuery and PhraseQuery could be modified similarly, so that, when 
constant scoring, bitsets are cached for very common terms (e.g., terms 
matching more than 5% of documents).
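As a rough illustration of item 2, here is a minimal sketch of an LRU bitset cache such as an IndexSearcher might keep behind cachedBitSet()/cacheBitSet(). The class name LruBitSetCache, the maxEntries bound, and the entryCount() helper are hypothetical; the generic key K stands in for Query, which would need consistent equals()/hashCode() for this to work.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of item 2's LRU cache; not IndexSearcher's real code.
// K stands in for Query, which would need consistent equals()/hashCode().
class LruBitSetCache<K> {
    private final Map<K, BitSet> cache;

    LruBitSetCache(final int maxEntries) {
        // accessOrder = true orders entries from least to most recently used,
        // so removeEldestEntry drops the least recently used bitset.
        this.cache = new LinkedHashMap<K, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Corresponds to cachedBitSet(Query): null on a cache miss.
    // Synchronized because get() reorders entries in access-order mode.
    synchronized BitSet cachedBitSet(K query) {
        return cache.get(query);
    }

    // Corresponds to cacheBitSet(Query, BitSet).
    synchronized void cacheBitSet(K query, BitSet bits) {
        cache.put(query, bits);
    }

    synchronized int entryCount() {
        return cache.size();
    }
}
```

LinkedHashMap in access-order mode gives LRU eviction almost for free, which keeps the default Searcher methods trivial and confines the policy to IndexSearcher.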
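The steps in item 4 can be sketched as a small scorer that fills (or reuses) a bitset and then walks it with BitSet.nextSetBit(). This is a sketch under stated assumptions, not Lucene's BooleanScorer: the class name ConstantBitSetScorer is hypothetical, each clause is assumed to arrive as an int[] of matching doc ids, all clauses are treated as optional (OR semantics), and a plain Map keyed by a string stands in for the Searcher's cache.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of item 4; not Lucene's actual BooleanScorer.
// Assumptions: each clause arrives as an int[] of matching doc ids, every
// clause is optional (OR semantics), and the searcher's cache is a Map.
class ConstantBitSetScorer {
    private final BitSet bits;
    private final float score;
    private int doc = -1;
    private boolean exhausted = false;

    ConstantBitSetScorer(String queryKey, List<int[]> clauses, float boost,
                         int maxDoc, Map<String, BitSet> cache) {
        BitSet cached = cache.get(queryKey);        // check for a cached bitset
        if (cached == null) {
            cached = new BitSet(maxDoc);            // failing that, create one
            for (int[] clause : clauses) {          // evaluate clauses serially
                for (int d : clause) {
                    cached.set(d);                  // save each match in the bitset
                }
            }
            cache.put(queryKey, cached);            // cache the bitset
        }
        this.bits = cached;
        this.score = boost;                         // constant score = boost
    }

    int doc() { return doc; }

    float score() { return score; }

    // Advance to the next match; returns false once no matches remain.
    boolean next() { return skipTo(doc + 1); }

    // Advance to the first match that is >= target and beyond the current doc.
    boolean skipTo(int target) {
        if (exhausted) return false;
        int candidate = bits.nextSetBit(Math.max(target, doc + 1));
        if (candidate < 0) {
            exhausted = true;
            return false;
        }
        doc = candidate;
        return true;
    }
}
```

Because every match scores the boost, the scorer never needs per-clause scores, and next() and skipTo() collapse to a single nextSetBit() call.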
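For item 5, the "very common" test reduces to simple arithmetic. A java.util.BitSet occupies roughly maxDoc bits however many documents a term matches, so caching one presumably pays off mainly for frequent terms. The class and method names below are hypothetical; only the 5% figure comes from the proposal.

```java
// Hypothetical caching policy for item 5; the names are illustrative.
class CommonTermPolicy {
    // Cache when docFreq / maxDoc > 5%, i.e. docFreq * 20 > maxDoc,
    // using long arithmetic to avoid int overflow and rounding.
    static boolean shouldCacheBitSet(int docFreq, int maxDoc) {
        return (long) docFreq * 20 > maxDoc;
    }
}
```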

With these changes, WildcardQuery, PrefixQuery, RangeQuery etc., when 
declared to be constant scoring, will operate much faster and never 
throw TooManyClauses.  We can add an option (the default?) to 
QueryParser to make these constant scoring.
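A sketch of why a constant-scoring PrefixQuery need never throw TooManyClauses: rather than rewriting into a BooleanQuery with one clause per expanded term, it can OR each matching term's postings straight into a bitset. The TreeMap term dictionary and the matchPrefix() name are hypothetical stand-ins, not Lucene's TermEnum/TermDocs API.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a constant-scoring prefix match that ORs postings
// directly into a bitset, so no per-term clause (and no clause limit) is
// involved. The in-memory TreeMap stands in for the term dictionary.
class PrefixBitSetSketch {
    static BitSet matchPrefix(TreeMap<String, int[]> termDict,
                              String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // Terms sharing a prefix are contiguous in sorted order: start at
        // the prefix and stop at the first term that does not start with it.
        for (Map.Entry<String, int[]> e : termDict.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

Memory use here is bounded by maxDoc bits regardless of how many terms the prefix expands to, which is what removes the need for a clause limit.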

Also, instead of BitSet, we could use an interface:

   public interface DocIdSet {
     void add(int docId);
     boolean contains(int docId);
     int next(int docId);
   }

to permit sparse representations.
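One possible sparse representation behind the DocIdSet interface is a sorted int array, whose memory grows with the number of members rather than with maxDoc. This is a sketch under an assumption the proposal leaves open: next(docId) is taken to return the smallest member greater than or equal to docId, or -1 if there is none. The class name SortedIntDocIdSet is hypothetical.

```java
import java.util.Arrays;

// Interface from the proposal, repeated so this sketch compiles on its own.
interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    int next(int docId);
}

// Hypothetical sparse implementation backed by a sorted int array.
// Assumption: next(docId) returns the smallest member >= docId, or -1.
class SortedIntDocIdSet implements DocIdSet {
    private int[] docs = new int[4];
    private int size = 0;

    public void add(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        if (pos >= 0) return;                       // already present
        int insert = -pos - 1;                      // insertion point
        if (size == docs.length) docs = Arrays.copyOf(docs, size * 2);
        // Shift the tail right by one to keep the array sorted.
        System.arraycopy(docs, insert, docs, insert + 1, size - insert);
        docs[insert] = docId;
        size++;
    }

    public boolean contains(int docId) {
        return Arrays.binarySearch(docs, 0, size, docId) >= 0;
    }

    public int next(int docId) {
        int pos = Arrays.binarySearch(docs, 0, size, docId);
        int idx = pos >= 0 ? pos : -pos - 1;        // first index >= docId
        return idx < size ? docs[idx] : -1;
    }
}
```

When doc ids are added in increasing order, as they would be while evaluating clauses, each add() is an append and the shift copies nothing.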

Thoughts?

Doug

