lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: constant scoring queries
Date Tue, 10 May 2005 20:55:42 GMT
I did nearly the exact same thing in my "derived" Lucene. But in order
to limit modifications to the Lucene core, I created a QueryCache class, and
have derived versions of Prefix and Range query consult the class, passing
in the IndexReader and query to see if there is a cached result. I also
call QueryCache.clear(IndexReader) when the IndexReader goes out of scope.

Will there be a problem with associating the cache with the IndexSearcher
instances, since it seems that common Lucene code uses code similar to

IndexSearcher searcher = new IndexSearcher(reader);

every time they need to perform a search?

It is REALLY efficient for automatic caching of common range queries and
prefix queries, as I think many users of Lucene use a range query to
look for documents modified in the "last n days". The ONLY overhead is extra
memory usage (since without the cache the query needs to be executed as is),
but the size of the LRU cache can be controlled via a property.
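
For illustration only, here is a minimal sketch of what such a cache might look like. It is not the actual class from my derived Lucene: the property name "querycache.maxEntries", the default of 64, and the use of plain Object for the reader and query (so the sketch compiles without Lucene) are all assumptions. The query used as a key must implement equals() and hashCode().

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical sketch of a per-reader LRU cache of query results. */
final class QueryCache {
    // Assumed property name and default; the message does not name them.
    private static final int MAX_ENTRIES =
        Integer.getInteger("querycache.maxEntries", 64);

    // One LRU map per reader. Weak keys let a reader's entries be
    // collected even if clear() is never called for it.
    private static final Map<Object, Map<Object, BitSet>> CACHES =
        new WeakHashMap<>();

    private QueryCache() {}

    private static Map<Object, BitSet> newLru() {
        // accessOrder = true: the eldest entry is the least recently used.
        return new LinkedHashMap<Object, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, BitSet> eldest) {
                return size() > MAX_ENTRIES;
            }
        };
    }

    /** Returns the cached result for (reader, query), or null if absent. */
    static synchronized BitSet get(Object reader, Object query) {
        Map<Object, BitSet> perReader = CACHES.get(reader);
        return perReader == null ? null : perReader.get(query);
    }

    /** Stores a result, evicting the least recently used entry if full. */
    static synchronized void put(Object reader, Object query, BitSet bits) {
        CACHES.computeIfAbsent(reader, r -> newLru()).put(query, bits);
    }

    /** Drops everything cached for a reader that is going out of scope. */
    static synchronized void clear(Object reader) {
        CACHES.remove(reader);
    }
}
```

Because the cache is keyed by the reader rather than by a searcher, it would survive code that creates a new searcher around the same reader for every search.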

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Tuesday, May 10, 2005 3:40 PM
To: java-dev@lucene.apache.org
Subject: constant scoring queries


Background: In http://issues.apache.org/bugzilla/show_bug.cgi?id=34673,
Yonik Seeley proposes a ConstantScoreQuery, based on a Filter.  And in
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg08007.html
I proposed a mechanism to promote the use of Filters.  Through all of
this, Paul Elschot has hinted that there might be a better way.

Here's another proposal, tackling many of the same issues:

1. Add two methods to Query.java:

   public boolean constantScoring();
   public void constantScoring(boolean);

   When constantScoring(), the boost() is the score for matches.

2. Add two methods to Searcher.java:

   public BitSet cachedBitSet(Query) { return null; }
   public void cacheBitSet(Query, BitSet) {}

   IndexSearcher overrides these to maintain an LRU cache of bitsets.

3. Modify BooleanQuery so that, when constantScoring(), TooManyClauses
is not thrown.

4. Modify BooleanScorer to, if constantScoring(),
   - check Searcher for a cached bitset
   - failing that, create a bitset
   - evaluate clauses serially, saving results in bitset
   - cache the bitset
   - use the bitset to handle doc(), next() and skipTo()

5. TermQuery and PhraseQuery could be similarly modified, so that, when
constant scoring, bitsets are cached for very common terms (e.g., >5% of
documents).
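
As a rough illustration of steps 2 and 4, the flow could be sketched as below. This is not Lucene code: ConstantScoreSketch is a hypothetical name, the cache is an unbounded HashMap standing in for the LRU cache, Object stands in for Query, each clause is modeled as a Supplier of the documents it matches, and all clauses are assumed to be optional (as in a rewritten prefix or range query), so their results are OR-ed together.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of constant-scoring evaluation with a bitset cache. */
final class ConstantScoreSketch {
    // Stand-in for the cache IndexSearcher would maintain (unbounded here).
    private final Map<Object, BitSet> cache = new HashMap<>();

    BitSet cachedBitSet(Object query) {
        return cache.get(query);
    }

    void cacheBitSet(Object query, BitSet bits) {
        cache.put(query, bits);
    }

    /** Returns the matching documents, evaluating clauses only on a miss. */
    BitSet matches(Object query, List<Supplier<BitSet>> clauses) {
        BitSet bits = cachedBitSet(query);   // check for a cached bitset
        if (bits != null) {
            return bits;
        }
        bits = new BitSet();                 // failing that, create a bitset
        for (Supplier<BitSet> clause : clauses) {
            bits.or(clause.get());           // evaluate clauses serially
        }
        cacheBitSet(query, bits);            // cache the bitset
        return bits;
    }

    /** skipTo-style lookup: first match at or after target, or -1 if none. */
    static int skipTo(BitSet bits, int target) {
        return bits.nextSetBit(target);
    }
}
```

Since each clause is folded into a single bitset one at a time, the number of clauses no longer affects memory use, which is why TooManyClauses would not be needed in this mode.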

With these changes, WildcardQuery, PrefixQuery, RangeQuery etc., when
declared to be constant scoring, will operate much faster and never
throw TooManyClauses.  We can add an option (the default?) to
QueryParser to make these constant scoring.

Also, instead of BitSet we could use an interface:

   public interface DocIdSet {
     void add(int docId);
     boolean contains(int docId);
     int next(int docId);
   }

to permit sparse representations.
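
For example, a sparse implementation might look like the sketch below. SparseDocIdSet is a hypothetical name, and the semantics of next(int) are an assumption, since the interface above does not define them: here it returns the smallest stored id at or after docId, or -1 when there is none. A TreeSet is used only for brevity; a sorted int array would be more compact.

```java
import java.util.TreeSet;

interface DocIdSet {
    void add(int docId);
    boolean contains(int docId);
    int next(int docId);
}

/** Hypothetical sparse implementation: memory grows with matches, not index size. */
final class SparseDocIdSet implements DocIdSet {
    private final TreeSet<Integer> ids = new TreeSet<>();

    @Override
    public void add(int docId) {
        ids.add(docId);
    }

    @Override
    public boolean contains(int docId) {
        return ids.contains(docId);
    }

    // Assumed semantics: smallest stored id >= docId, or -1 if none.
    @Override
    public int next(int docId) {
        Integer n = ids.ceiling(docId);
        return n == null ? -1 : n;
    }
}
```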

Thoughts?

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

