From: Christoph Goller
To: Lucene Developers List
Date: Wed, 07 Apr 2004 20:12:40 +0200
Subject: Re: TermDocs.skipTo()
Message-ID: <40744498.9010507@detego-software.de>
In-Reply-To: <4071C4C0.30907@apache.org>

Doug,

Daniel found a bug today, so I reviewed skipTo once again. Here are some
further things to consider:

*) MultiTermDocs.skipTo could easily be optimized too, couldn't it?

*) SegmentTermDocs: skipStream is never closed.

*) SegmentTermPositions: seek(TermInfo) should probably always set
   proxCount = 0;

*) I think that, due to your last changes, SegmentTermDocs makes one skip
   fewer than is required. However, I haven't tested this.

      while (target > skipDoc && skipCount < numSkips) {
        lastSkipDoc = skipDoc;
        lastFreqPointer = freqPointer;
        lastProxPointer = proxPointer;

        if (skipDoc != 0 && skipDoc >= doc)
          numSkipped += skipInterval;

        skipDoc += skipStream.readVInt();
        freqPointer += skipStream.readVInt();
        proxPointer += skipStream.readVInt();

        skipCount++;
      }

      // if we found something to skip, then skip it
      if (lastFreqPointer > freqStream.getFilePointer()) {
        freqStream.seek(lastFreqPointer);
        skipProx(lastProxPointer);

        doc = lastSkipDoc;
        count += numSkipped;
      }

   Consider the case where the while loop exits because
   skipCount == numSkips. Then doc becomes lastSkipDoc, not skipDoc!

*) PhraseScorer.skipTo jumps one doc too far because of its call to sort(),
   which calls next() for each PhrasePosition.
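
To make the SegmentTermDocs point concrete, here is a minimal sketch of the case where the loop exits because skipCount == numSkips. It is a simplified model, not the Lucene code: a hypothetical int[] of doc deltas stands in for skipStream, the freq/prox pointers and the numSkipped bookkeeping are left out, and the names SkipLoopSketch, skipAsQuoted and skipUsingLastEntry are invented for illustration. The second method is only one possible way to use the final skip entry as well, not necessarily the fix that belongs in SegmentTermDocs.

```java
/**
 * Simplified model of the skip loop quoted above. NOT the Lucene code:
 * a hypothetical int[] of doc deltas stands in for skipStream, and the
 * freq/prox pointers and numSkipped bookkeeping are omitted.
 */
class SkipLoopSketch {

    /** Mirrors the quoted loop; returns the doc the reader would skip to. */
    static int skipAsQuoted(int[] skipDocDeltas, int target) {
        int numSkips = skipDocDeltas.length;
        int skipDoc = 0;
        int lastSkipDoc = 0;
        int skipCount = 0;
        while (target > skipDoc && skipCount < numSkips) {
            lastSkipDoc = skipDoc;
            skipDoc += skipDocDeltas[skipCount]; // stands in for skipStream.readVInt()
            skipCount++;
        }
        // As in the quoted code, the position always becomes lastSkipDoc,
        // even when the loop ended because the skip entries ran out.
        return lastSkipDoc;
    }

    /** Hypothetical variant: also uses the final entry if it lies before target. */
    static int skipUsingLastEntry(int[] skipDocDeltas, int target) {
        int skipDoc = 0;
        for (int delta : skipDocDeltas) {
            if (skipDoc + delta >= target) {
                break; // the next entry is at or past target, so stop before it
            }
            skipDoc += delta;
        }
        return skipDoc;
    }

    public static void main(String[] args) {
        int[] deltas = {16, 16, 16}; // skip entries at docs 16, 32, 48

        // Target inside the skip data: both versions agree.
        System.out.println(skipAsQuoted(deltas, 40));        // 32
        System.out.println(skipUsingLastEntry(deltas, 40));  // 32

        // Target beyond the last entry: the quoted loop stops one entry short.
        System.out.println(skipAsQuoted(deltas, 100));       // 32
        System.out.println(skipUsingLastEntry(deltas, 100)); // 48
    }
}
```

In this model both versions still end up before the target, so the difference is one skip interval of extra sequential reading rather than a wrong result. The sketch covers only the SegmentTermDocs point; it does not model the PhraseScorer.skipTo issue from the last item.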

Here is Daniel's test, which demonstrates the PhraseScorer problem:

   import org.apache.lucene.analysis.Analyzer;
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.queryParser.QueryParser;
   import org.apache.lucene.search.Hits;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.search.Query;

   public class DanielBug {

     private final static String DIR = "/tmp/testindex";

     public static void main (String[] args) throws Exception {
       Analyzer a = new StandardAnalyzer();
       IndexWriter iw = new IndexWriter(DIR, a, true);

       Document d = new Document();
       // 0 hits only if this field contains the same value as
       // the same field in the next document:
       d.add(new Field("source", "marketing info", true, true, true));
       iw.addDocument(d);

       d = new Document();
       d.add(new Field("contents", "foobar", true, true, true));
       d.add(new Field("source", "marketing info", true, true, true));
       iw.addDocument(d);

       iw.optimize();
       iw.close();
       System.out.println("Indexing Done");

       IndexSearcher is = new IndexSearcher(DIR);
       Query q = QueryParser.parse("+contents:foobar +source:\"marketing info\"", "", a);
       Hits hits = is.search(q);
       System.out.println("q="+q);
       System.out.println("hits="+hits.length());
     }
   }

With 1.4rc2 this finds 0 hits instead of 1, while 1.3 finds the hit.
I have already committed the necessary change to PhraseScorer, and it
fixes the problem.

Unfortunately, I haven't found the time to restructure the IndexReaders
so far. Hopefully tomorrow :-)

Christoph