lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: constant scoring queries
Date Wed, 11 May 2005 20:54:34 GMT
Would there be anyway to "rewrite" the cached queries as documents are
added?

By this I mean, if a user runs an "expensive" range query that gets cached,
then another user adds a document that should be included in the cached
query, the addDocument() method would "update" the cached query. I think
this is useful when using the "rolling index" method of performing
incremental updates to the index, so cached queries could remain valid after
a roll.

I think a callback interface similar to

updateQuery(Term t,BitSet newdocs);

Then the query if free to ignore the update if the term does not fall inside
the query. This check is obviously query specific.


-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com]
Sent: Tuesday, May 10, 2005 9:42 PM
To: java-dev@lucene.apache.org
Subject: Re: constant scoring queries


Hey now... you're going to obsolete all my in-house code and put me
out of a job ;-)

Could you elaborate on the advantage of having say a TermQuery that
could be either normal-scoring or constant-scoring vs two different
Query classes for doing this?  They seem roughly equivalent.


> 1. Add two methods to Query.java:
>
>    public boolean constantScoring();
>    public void constantScoring(boolean);
>
>    When constantScoring(), the boost() is the score for matches.

That seems fine.

> 2. Add two methods to Searcher.java:
>
>    public BitSet cachedBitSet(Query) { return null; }
>    public void cacheBitSet(Query, BitSet) {}
>
>    IndexSearcher overrides these to maintain an LRU cache of bitsets.

Yup, that's what I have.
Things should be extensible and use a caching interface - the default
implementation being an LRU cache, but users could use their own
implementations to get LFU behavior or whatever.

> 3. Modify BooleanQuery so that, when constantScoring(), TooManyClauses
> is not thrown.

This is good, but not sufficient for RangeQuery.  If
RangeQuery.constantScoring(), then it should not rewrite to a
BooleanQuery at all.  Depending on the RangeQuery, just the creation
of a BooleanQuery that matches it is too heavyweight.

> Also, instead of BitSet we could use an interface:
>
>    public interface DocIdSet {
>      void add(int docId);
>      boolean contains(int docId);
>      int next(int docId);
>    }
>
> to permit sparse representations.

Definitely a DocIdSet.  It's called DocSet in my code and has a bitset
implementation and a compact implementation that's an int hash set
(unordered cause I just use it as a filter now).  Here is the basic
interface:

public interface DocSet {
  public int size();
  public boolean exists(int docid);
  public DocIterator iterator();
  public BitSet getBits();
  public long memSize();
  public DocSet intersection(DocSet other);
  public int intersectionSize(DocSet other);
  public DocSet union(DocSet other);
  public int unionSize(DocSet other);
}


I would separate out int next(int docId) into an iterator.  It may be
more efficient to iterate over certain structures if you can maintain
state about where you are (and this may even be true of a BitSet).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message