From: Paul Elschot
To: lucene-dev@jakarta.apache.org
Subject: FilteringQuery.java
Date: Fri, 30 Jul 2004 23:17:50 +0200

Dear developers,

At the moment IndexSearcher.search(Query, Filter) computes a score for every document matching the query before
checking the filter. With the BitSet.nextSetBit() method one might implement a filter as a required clause in a Query. This would even allow ConjunctionScorer and skipTo() to be used in appropriate circumstances, which currently means when all other clauses are required.

Below is a Query that intends to do this. It compiles against current CVS, but it has not yet been tested. Before I start writing test code I'd like to have some comments.

For very large indexes and relatively small numbers of filtered docs, a similar filter could be used with something sparser than a full BitSet, e.g. a byte array of VInts holding the differences between the document numbers.

Regards,
Paul.

Here it is, FilteringQuery.java, under the Apache 2.0 licence:

package org.apache.lucene.search;

import java.util.BitSet;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public abstract class FilteringQuery extends Query {
  Filter filter;
  String filterName;

  public FilteringQuery(Filter filter, String filterName) {
    this.filter = filter; /* should be non null */
    this.filterName = filterName; /* for explanations */
  }

  protected String getFilterExplanation() {
    return (filterName != null) ? filterName : filter.toString();
  }

  /** Prints this FilteringQuery to a String.
   * @param field Should be null because a FilteringQuery depends on a filter.
   */
  public String toString(String field) {
    String res = "FilteringQuery( " + getFilterExplanation() + ")";
    if (field == null) return res;
    else return res + "(" + field + " ?)";
  }

  /** Prints this query to a string. */
  public String toString() { return toString(null); }

  /** Expert:
   * @return null. No similarity is used for scoring a FilteringQuery.
   */
  public Similarity getSimilarity(Searcher searcher) {return null;}

  /** Expert: Apply the Filter and use the result in another Query which
   * extends BooleanQuery to have ConjunctionScorer used when this Query is required.
   */
  public Query rewrite(IndexReader reader) throws IOException {
    class SkipReaderBitsQuery extends Query {
      /** Prints this to a String.
       * @param field Should be null.
       */
      public String toString(String field) {
        String res = "SkipReaderBitsQuery( " + getFilterExplanation() + ")";
        if (field == null) return res;
        else return res + "(" + field + " ?)";
      }

      /** Expert: Constructs a Weight implementation for this SkipReaderBitsQuery.
       *
       * Only implemented by primitive queries, which re-write to themselves.
       */
      protected Weight createWeight(final Searcher searcher) {
        class FilterWeight implements Weight {
          public float getValue() {return 0.0f;}
          public void normalize(float norm) {}
          public float sumOfSquaredWeights() {return 0.0f;}
          public Query getQuery() {return FilteringQuery.this;}

          public Explanation explain(IndexReader reader, int doc) {
            return new Explanation(getValue(), "weightless " + getFilterExplanation());
          }

          public Scorer scorer(final IndexReader reader) throws IOException {
            class SkipReaderBitsScorer extends Scorer {
              BitSet docNrs;
              int currentDoc;

              SkipReaderBitsScorer(Similarity similarity) throws IOException {
                super(similarity);
                /* CHECKME: ok not to compute the bits earlier? */
                docNrs = FilteringQuery.this.filter.bits(reader);
                currentDoc = -1;
              }

              public int doc() {return currentDoc;}
              public float score() {return 0.0f;}

              /* should not be called after returning false */
              public boolean next() {
                currentDoc = docNrs.nextSetBit(currentDoc + 1); /* -1 when no next bit */
                return currentDoc >= 0;
              }

              /* should not be called after returning false */
              public boolean skipTo(int target) {
                currentDoc = docNrs.nextSetBit((currentDoc < target) ? target : (currentDoc + 1));
                return currentDoc >= 0;
              }

              public Explanation explain(int doc) {
                skipTo(doc);
                return new Explanation(score() /* zero anyway */,
                    "document " + doc + " "
                    + ((currentDoc == doc) ? "matches" : "does not match")
                    + " filter: " + getFilterExplanation());
              }
            }
            return new SkipReaderBitsScorer(getSimilarity(searcher));
          }
        }
        return new FilterWeight();
      }
    }
    return new SkipReaderBitsQuery();
  }
}
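
To make the sparser alternative mentioned above more concrete, here is a rough sketch of a byte array of VInts holding the differences between sorted document numbers, with next() and skipTo() in the same style as the scorer. The class name VIntDocList and all of its methods are hypothetical and not part of Lucene. It is standalone Java and is not wired into Scorer or Filter. It assumes the input is strictly increasing and non-negative, and skipTo() simply scans forward because no skip data is stored.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: sorted document numbers stored as VInt-encoded gaps. */
final class VIntDocList {
  private final byte[] data;    // VInt-encoded differences between successive doc numbers
  private int pos = 0;          // read position in data
  private int currentDoc = -1;  // -1 before the first call to next()

  private VIntDocList(byte[] data) { this.data = data; }

  /** Encodes strictly increasing, non-negative document numbers (an assumption of this sketch). */
  static VIntDocList encode(int[] sortedDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int previous = -1;
    for (int doc : sortedDocs) {
      int gap = doc - previous;            // at least 1 for strictly increasing input
      while ((gap & ~0x7F) != 0) {         // 7 bits per byte, low bits first
        out.write((gap & 0x7F) | 0x80);    // high bit set: more bytes follow
        gap >>>= 7;
      }
      out.write(gap);                      // last byte has the high bit clear
      previous = doc;
    }
    return new VIntDocList(out.toByteArray());
  }

  int sizeInBytes() { return data.length; }

  int doc() { return currentDoc; }

  /** Advances to the next document number; returns false when the list is exhausted. */
  boolean next() {
    if (pos >= data.length) return false;
    int gap = 0;
    int shift = 0;
    byte b;
    do {
      b = data[pos++];
      gap |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    currentDoc += gap;
    return true;
  }

  /** Advances to the first document beyond the current one whose number is >= target. */
  boolean skipTo(int target) {
    do {
      if (!next()) return false;           // linear scan: the gaps carry no skip information
    } while (currentDoc < target);
    return true;
  }
}
```

For the document numbers 3, 7, 300 and 70000 this takes 7 bytes, whereas a BitSet covering them needs at least one bit per document number up to 70000. The price is that skipTo() has to decode every gap on the way, unlike nextSetBit().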