From: Laurens Pit
Date: Sun, 18 Dec 2005 15:44:08 +0100
To: java-user@lucene.apache.org
Subject: Re: Filtering after Query

Hi Paul,

W.r.t. ConstantScoringQuery, it contains a minor bug: it doesn't handle
the case where the Filter.bits method returns null. I think the
ConstantScorer should look like this (with maxDoc a field set from
reader.maxDoc() in the constructor, so that the unfiltered case stops at
the end of the index instead of running on forever):

    public boolean next() throws IOException {
      // bits == null means the filter lets every document through
      doc = (bits == null) ? doc + 1 : bits.nextSetBit(doc + 1);
      return doc >= 0 && doc < maxDoc;
    }

    public boolean skipTo(int target) throws IOException {
      doc = (bits == null) ? target : bits.nextSetBit(target); // requires JDK 1.4
      return doc >= 0 && doc < maxDoc;
    }

Anyway, while the ConstantScoringQuery does offer a great new feature, it
does not do what I want: when the bits method of my custom filter is
called, I still have to create a BitSet of size reader.maxDoc(), whose
value is the total number of documents in the index, and then determine
for each document whether it should be included or excluded. This kills
performance for me. I have the same problem with HitCollector: I'd need
to go through /all/ documents.

After playing around a bit, I think I could solve this by adding
TermQuerys as the last clauses of a BooleanQuery, but that would mean I'd
also have to store certain (security-related) values in separate fields
in the index. I could live with that, if I'm seeing this right: when I set
the boost factor of those query terms to 0, they won't affect the
scoring, right?

Regards,
Cret

Paul Elschot wrote:
>On Saturday 17 December 2005 17:04, Cret Hummin wrote:
>
>>Hi All,
>>
>>When using Searcher.search(Query, Filter) with my own custom filter, it
>>appears I'm presented with /all/ the documents in the index: in the
>>bits(IndexReader reader) method of my custom Filter, the value of
>>reader.maxDoc() is always the number of documents in the index. The same
>>is true when I do Searcher.search(FilteredQuery(Query, Filter)).
>>
>>Is it possible to filter /after/ the query has limited the number of
>>possible documents, /before/ returning a Hits collection?
>
>The easiest way to do this is by adding a required clause to a BooleanQuery.
>You might consider using a ConstantScoringQuery for this clause:
>http://issues.apache.org/jira/browse/LUCENE-383
>
>In case you really want to filter only the documents that match a query,
>you'll need to implement a filtering HitCollector and use it on the
>lower-level search API.
>An easier way to implement such a filtering HitCollector could be
>by adding to it the search methods that return a Hits as an alternative
>to Filter.
>A disadvantage of this approach is that skipTo() cannot be used
>to combine the filter and the query, see also here:
>http://issues.apache.org/jira/browse/LUCENE-330
>
>Regards,
>Paul Elschot
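A minimal, Lucene-free sketch of the null-safe scorer iteration discussed above, assuming only java.util.BitSet. ConstantScorerSketch is a hypothetical stand-in for ConstantScorer, and its maxDoc field stands in for reader.maxDoc(). Treating a null BitSet as "every document matches" is an assumption about what a null from Filter.bits() should mean here.

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in for the ConstantScorer iteration logic.
 * A null BitSet is treated as "every document matches"; maxDoc plays
 * the role of reader.maxDoc() and bounds the iteration.
 */
public class ConstantScorerSketch {
  private final BitSet bits; // may be null, as Filter.bits() may return null
  private final int maxDoc;  // assumed: captured from reader.maxDoc()
  private int doc = -1;      // current document; -1 before the first next()

  public ConstantScorerSketch(BitSet bits, int maxDoc) {
    this.bits = bits;
    this.maxDoc = maxDoc;
  }

  public int doc() {
    return doc;
  }

  /** Advances to the next matching document; returns false once exhausted. */
  public boolean next() {
    doc = (bits == null) ? doc + 1 : bits.nextSetBit(doc + 1);
    return doc >= 0 && doc < maxDoc;
  }

  /** Advances to the first matching document whose number is >= target. */
  public boolean skipTo(int target) {
    doc = (bits == null) ? target : bits.nextSetBit(target); // nextSetBit: JDK 1.4+
    return doc >= 0 && doc < maxDoc;
  }

  public static void main(String[] args) {
    // A filter that lets through documents 2 and 7 out of 10.
    BitSet filterBits = new BitSet(10);
    filterBits.set(2);
    filterBits.set(7);

    ConstantScorerSketch filtered = new ConstantScorerSketch(filterBits, 10);
    while (filtered.next()) {
      System.out.println("filtered match: " + filtered.doc());
    }

    // No filter bits at all: every document up to maxDoc matches.
    ConstantScorerSketch all = new ConstantScorerSketch(null, 3);
    while (all.next()) {
      System.out.println("unfiltered match: " + all.doc());
    }
  }
}
```

The doc < maxDoc bound matters for the null case: doc + 1 is never negative, so a doc >= 0 test alone would never report that the index is exhausted. This sketch also does not skip deleted documents, which a real scorer iterating without a filter would presumably have to do.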