Date: Sat, 7 Feb 2004 23:32:35 +0100
From: Ramy Hardan
To: "Lucene Users List"
Subject: Search Refinement Approaches

Hi,

From reviewing the javadocs and previous posts, it appears that search refinement, or 'search within search', is best done with a Filter.
To fill the Filter's BitSet with the results of a search, a HitCollector is the obvious solution. Unfortunately, when using a HitCollector I have to implement myself all the functionality that the Hits class usually provides. Is there an efficient way to do search refinement, preferably without losing the Hits class?

I can think of the following approaches:

- Don't use Hits: collect all scores and document numbers with a HitCollector and sort them by score after the search. Retrieve the needed documents from the IndexReader via their document numbers.

- Use Hits: briefly examining the source reveals this possibility: subclass BitSet and override the boolean get(int bitIndex) method so that it additionally sets the bit at bitIndex in a second BitSet. Use this subclass in a Filter and initialize it with all ones (in the first search). This way I can tell which documents the IndexSearcher tested against the Filter by examining the second BitSet, and then use that second BitSet as the Filter for the refining search.

Here's a sketch of this for clarification:

    import java.util.BitSet;

    // Records each index that is queried while its bit is set, so the
    // documents the IndexSearcher tested can feed the refining Filter.
    public class FilterBitSet extends BitSet {
        private final BitSet bitsForRefiningFilter = new BitSet();

        @Override
        public boolean get(int bitIndex) {
            boolean result = super.get(bitIndex);
            if (result) bitsForRefiningFilter.set(bitIndex);
            return result;
        }

        public BitSet getBitsForRefiningFilter() {
            return bitsForRefiningFilter;
        }
    }

Is this really possible? (This might be more of a question for the dev list.)

A last question about document numbers: when and how exactly do they change? The javadoc states that they change upon addition and deletion. May I assume that a particular document's number stays stable as long as that document itself is not changed (deleted and re-added), even while other documents are added or deleted, provided optimize() is NOT called? If so, is this likely to change in the foreseeable future?

Thanks in advance,
Ramy
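For comparison, the first approach ("Don't use Hits") might look roughly like the sketch below. To keep it self-contained it does not depend on Lucene: the abstract `Collector` class is a stand-in for Lucene's HitCollector (assumed here to expose a `collect(int doc, float score)` callback), and the toy `search` method loosely mimics a search with a query, an optional filter and a collector. All names (`RefinementSketch`, `RefiningCollector`, `ScoredDoc`, `search`) and the per-document scores are hypothetical. In real code the collected BitSet would presumably be handed back from a Filter subclass's `bits(IndexReader)` method for the refining search.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

public class RefinementSketch {

    /** Stand-in for Lucene's HitCollector (assumed callback: collect(int doc, float score)). */
    abstract static class Collector {
        abstract void collect(int doc, float score);
    }

    /** A document number paired with its score (hypothetical helper type). */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Marks every hit in a BitSet (for the next filter) and keeps its score for ranking. */
    static final class RefiningCollector extends Collector {
        final BitSet bits;
        final List<ScoredDoc> hits = new ArrayList<>();

        RefiningCollector(int maxDoc) {
            bits = new BitSet(maxDoc);
        }

        @Override
        void collect(int doc, float score) {
            bits.set(doc);
            hits.add(new ScoredDoc(doc, score));
        }

        /** Hits by descending score; ties broken by ascending document number. */
        List<ScoredDoc> sortedHits() {
            List<ScoredDoc> sorted = new ArrayList<>(hits);
            sorted.sort(Comparator.comparingDouble((ScoredDoc h) -> h.score)
                    .reversed()
                    .thenComparingInt(h -> h.doc));
            return sorted;
        }
    }

    /**
     * Toy search (hypothetical): doc d matches when scores[d] > 0 and,
     * if a filter is given, its bit is set in that filter.
     */
    static void search(float[] scores, BitSet filter, Collector collector) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0 && (filter == null || filter.get(doc))) {
                collector.collect(doc, scores[doc]);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical per-document scores for two queries over a 6-document index.
        float[] firstQuery = {0.0f, 0.9f, 0.4f, 0.0f, 0.7f, 0.2f};
        float[] secondQuery = {0.8f, 0.3f, 0.0f, 0.6f, 0.5f, 0.0f};

        RefiningCollector first = new RefiningCollector(firstQuery.length);
        search(firstQuery, null, first);

        // Refine: only documents that matched the first query may match now.
        RefiningCollector refined = new RefiningCollector(secondQuery.length);
        search(secondQuery, first.bits, refined);

        // prints "doc 4 score 0.5" then "doc 1 score 0.3"
        for (ScoredDoc h : refined.sortedHits()) {
            System.out.println("doc " + h.doc + " score " + h.score);
        }
    }
}
```

The collector keeps both a BitSet and a list because the two jobs differ: the BitSet is all the next filter needs, while the (doc, score) list, sorted once after the search, takes over the ranking that Hits would otherwise provide. Documents for a page of results could then be fetched individually by number, e.g. through the IndexReader's `document(int)` method.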