Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Message-ID: <40744498.9010507@detego-software.de>
Date: Wed, 07 Apr 2004 20:12:40 +0200
From: Christoph Goller <goller@detego-software.de>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031009
MIME-Version: 1.0
To: Lucene Developers List <lucene-dev@jakarta.apache.org>
Subject: Re: TermDocs.skipTo()
References: <406DA9EE.6090600@detego-software.de> <4071C4C0.30907@apache.org>
In-Reply-To: <4071C4C0.30907@apache.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

Doug,

Daniel found a bug today and therefore I reviewed skipTo once again.
Here are some further things to consider:

*) MultiTermDocs.skipTo could easily be optimized too, couldn't it?

*) SegmentTermDocs: skipStream is never closed.

*) SegmentTermPositions: seek(TermInfo) should probably always set
proxCount = 0.

*) I think that, due to your last changes, SegmentTermDocs makes one skip
fewer than it should. However, I haven't tested this.

while (target > skipDoc && skipCount < numSkips) {
         lastSkipDoc = skipDoc;
         lastFreqPointer = freqPointer;
         lastProxPointer = proxPointer;

         if (skipDoc != 0 && skipDoc >= doc)
           numSkipped += skipInterval;

         skipDoc += skipStream.readVInt();
         freqPointer += skipStream.readVInt();
         proxPointer += skipStream.readVInt();

         skipCount++;
       }

       // if we found something to skip, then skip it
       if (lastFreqPointer > freqStream.getFilePointer()) {
         freqStream.seek(lastFreqPointer);
         skipProx(lastProxPointer);

         doc = lastSkipDoc;
         count += numSkipped;
       }

Consider the case where the while loop exits because skipCount == numSkips.
Then doc becomes lastSkipDoc, not skipDoc!
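
To make the suspected case concrete, here is a minimal stand-alone sketch. It is a simplified model, not the SegmentTermDocs code: the class SkipLoopSketch, both method names, and the skip entries at docs 10, 20 and 30 are invented for illustration, and file pointers and counts are left out. The adjustedLoop variant is only one possible way to use the last entry; it is an assumption, not a tested fix.

```java
public class SkipLoopSketch {

    /** Same shape as the quoted loop: returns the doc we would be positioned on. */
    static int quotedLoop(int[] skipDeltas, int target) {
        int skipDoc = 0, lastSkipDoc = 0, skipCount = 0;
        while (target > skipDoc && skipCount < skipDeltas.length) {
            lastSkipDoc = skipDoc;
            skipDoc += skipDeltas[skipCount]; // stands in for skipStream.readVInt()
            skipCount++;
        }
        return lastSkipDoc;
    }

    /** Hypothetical adjustment: if the entries ran out below target, use the last one read. */
    static int adjustedLoop(int[] skipDeltas, int target) {
        int skipDoc = 0, lastSkipDoc = 0, skipCount = 0;
        while (target > skipDoc && skipCount < skipDeltas.length) {
            lastSkipDoc = skipDoc;
            skipDoc += skipDeltas[skipCount];
            skipCount++;
        }
        // Loop ended with target still beyond skipDoc: all entries were consumed.
        return (target > skipDoc) ? skipDoc : lastSkipDoc;
    }

    public static void main(String[] args) {
        int[] deltas = {10, 10, 10}; // assumed skip entries at docs 10, 20, 30
        System.out.println(quotedLoop(deltas, 25));   // 20: loop stopped by the target test
        System.out.println(quotedLoop(deltas, 35));   // 20: loop stopped by skipCount, entry 30 unused
        System.out.println(adjustedLoop(deltas, 35)); // 30
    }
}
```

In this model the result is still correct, since the remaining docs can be scanned linearly from 20, but the last skip entry is wasted.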

*) PhraseScorer.skipTo jumps one doc too far because of the call to sort(),
which calls next for each PhrasePosition. Here is Daniel's test that
demonstrates this:

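import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class DanielBug {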

   private final static String DIR = "/tmp/testindex";

   public static void main (String[] args) throws Exception {
     Analyzer a = new StandardAnalyzer();
     IndexWriter iw = new IndexWriter(DIR, a, true);

     Document d = new Document();
     // 0 hits only if this field contains the same value as
     // the same field in the next document:
     d.add(new Field("source", "marketing info", true, true, true));
     iw.addDocument(d);

     d = new Document();
     d.add(new Field("contents", "foobar", true, true, true));
     d.add(new Field("source", "marketing info", true, true, true));
     iw.addDocument(d);

     iw.optimize();
     iw.close();
     System.out.println("Indexing Done");

     IndexSearcher is = new IndexSearcher(DIR);
     Query q = QueryParser.parse(
         "+contents:foobar +source:\"marketing info\"", "", a);
     Hits hits = is.search(q);
     System.out.println("q="+q);
     System.out.println("hits="+hits.length());
   }

}

Instead of 1 hit, 0 hits are found with 1.4rc2, while 1.3 finds the hit. I
committed the necessary change to PhraseScorer already and it fixes the problem.
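
The overshoot itself can be illustrated with a toy model. This is not the PhraseScorer code: the Cursor class and both method names are invented, and the extra next() call merely stands in for the advance that sort() performs. Each cursor holds docs 0 and 1, as in Daniel's two-document index.

```java
public class SkipOvershootSketch {

    /** Toy cursor over a sorted doc list; stands in for one phrase term's postings. */
    static final class Cursor {
        private final int[] docs;
        private int pos = -1;
        int doc = -1;

        Cursor(int[] docs) { this.docs = docs; }

        boolean next() {
            pos++;
            if (pos >= docs.length) { doc = Integer.MAX_VALUE; return false; }
            doc = docs[pos];
            return true;
        }

        /** Advances to the first doc beyond the current one that is >= target. */
        boolean skipTo(int target) {
            while (next()) {
                if (doc >= target) return true;
            }
            return false;
        }
    }

    /** skipTo on every cursor, nothing else: cursors rest on the target doc. */
    static int skipOnly(int[] docs, int target) {
        Cursor[] cursors = { new Cursor(docs), new Cursor(docs) };
        for (Cursor c : cursors) c.skipTo(target);
        return cursors[0].doc;
    }

    /** skipTo followed by a step that calls next() again (assumed model of sort()). */
    static int skipThenAdvance(int[] docs, int target) {
        Cursor[] cursors = { new Cursor(docs), new Cursor(docs) };
        for (Cursor c : cursors) c.skipTo(target);
        for (Cursor c : cursors) c.next(); // the extra advance that loses the hit
        return cursors[0].doc;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1};
        System.out.println(skipOnly(docs, 1));        // 1
        System.out.println(skipThenAdvance(docs, 1)); // 2147483647 (exhausted)
    }
}
```

In the toy model the second variant runs past doc 1 and reports the cursor as exhausted, which corresponds to the 0 hits seen above.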

Unfortunately, I haven't found the time to restructure the IndexReaders so far.
Hopefully tomorrow :-)

Christoph

