From: Paul Elschot
To: "Lucene Developers List"
Subject: Re: Queries with only non required terms: not as OR?
Date: Wed, 3 Mar 2004 21:00:30 +0100

Doug,

On Wednesday 03 March 2004 18:47, Doug Cutting wrote:
> Paul Elschot wrote:
> > I read a bit into the source code and I found this comment at
> > BooleanQuery.scorer():
> >
> >   // Also, at this point a
> >   // BooleanScorer cannot be embedded in a ConjunctionScorer, as the hits
> >   // from a BooleanScorer are not always sorted by document number (sigh)
> >   // and hence BooleanScorer cannot implement skipTo() correctly, which is
> >   // required by ConjunctionScorer.
> >
> > The test function I used assumes that documents will be collected in
> > order.  Could this be the source of the problem?
>
> It could be.

I'll make the test search in the array of doc nrs that it receives now.

> I only realized recently that BooleanScorer does some local reordering
> of document numbers passed to the HitCollector.  There's no easy fix.

I assume it works correctly, so why fix it, except for speed?

> When I get a chance I intend to rewrite BooleanScorer to fix this and to
> correctly implement skipTo().  The result will be somewhat slower for

You might find the previously posted test code to be a test case for that.
It's nice to see a possible real use for this :) even though I was doing
something wrong.

> some queries, especially those with a large number of optional terms,
> but will sometimes be faster when it's nested in other queries, and
> skipTo() can be leveraged.
> I would like to get to this in next few

When the two cases can be distinguished, you might try and leave the current
method in for the large number of optional terms. I like speed, and I guess
I'm not the only one. Also, with the term vectors in CVS one might expect more
queries with optional terms resulting from relevance feedback methods.

> weeks, and then make a 1.4 RC1 release.  The fix will take a few days
> work.  If I can find someone to fund the work it may happen sooner.
> Right now other projects have higher priority for me.

Lucene is moving fast enough for me...

Thanks a lot,
Paul.
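
For reference, a minimal sketch of the kind of order-independent check
mentioned above, where the test no longer assumes that documents arrive
in increasing document number. It does not use any Lucene classes; the
class and method names (CollectedDocsCheck, sameDocsInOrder,
sameDocsAnyOrder) are hypothetical and only illustrate the idea, and the
document numbers in main() are made up.

```java
import java.util.Arrays;

public class CollectedDocsCheck {

    /** Order-sensitive check: passes only if docs were collected exactly in the expected order. */
    static boolean sameDocsInOrder(int[] collected, int[] expected) {
        return Arrays.equals(collected, expected);
    }

    /**
     * Order-independent check: passes if the same document numbers were
     * collected, regardless of the order in which they were delivered.
     * Works on sorted copies so the caller's arrays are left untouched.
     */
    static boolean sameDocsAnyOrder(int[] collected, int[] expected) {
        int[] c = collected.clone();
        int[] e = expected.clone();
        Arrays.sort(c);
        Arrays.sort(e);
        return Arrays.equals(c, e);
    }

    public static void main(String[] args) {
        // Hypothetical data: the expected hits, and the same hits delivered
        // with some local reordering, as a scorer might pass them on.
        int[] expected = {1, 3, 5, 8};
        int[] collected = {3, 1, 5, 8};

        System.out.println("in-order check:  " + sameDocsInOrder(collected, expected));
        System.out.println("any-order check: " + sameDocsAnyOrder(collected, expected));
    }
}
```

With locally reordered input the in-order check fails while the
any-order check still passes, which is the difference the test has to
tolerate.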