lucene-dev mailing list archives

From Christoph Goller
Subject Re: TermDocs.skipTo()
Date Wed, 07 Apr 2004 18:12:40 GMT

Daniel found a bug today and therefore I reviewed skipTo once again.
Here are some further things to consider:

*) MultiTermDocs.skipTo could easily be optimized too, couldn't it?

*) SegmentTermDocs: skipStream is never closed.

*) SegmentTermPositions: seek(TermInfo) should probably always set
proxCount = 0;

*) I think that, due to your last changes, SegmentTermDocs makes one skip fewer
than is required. However, I haven't tested this.

       while (target > skipDoc && skipCount < numSkips) {
         lastSkipDoc = skipDoc;
         lastFreqPointer = freqPointer;
         lastProxPointer = proxPointer;

         if (skipDoc != 0 && skipDoc >= doc)
           numSkipped += skipInterval;

         skipDoc += skipStream.readVInt();
         freqPointer += skipStream.readVInt();
         proxPointer += skipStream.readVInt();

         skipCount++;
       }

       // if we found something to skip, then skip it
       if (lastFreqPointer > freqStream.getFilePointer()) {
         ...
         doc = lastSkipDoc;
         count += numSkipped;
       }

Consider the case where the while loop exits because skipCount == numSkips.
Then doc becomes lastSkipDoc, not skipDoc!
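
To make the case concrete, here is a minimal stand-alone sketch of the scan. It is not the Lucene code: the class and method names are made up for illustration, the skip entries are a plain int array of doc deltas, and the freq/prox pointers and numSkipped are left out. The second method is only one possible way to let the last skip entry be used; I have not tested it against the real SegmentTermDocs.

```java
// Hypothetical model of the skip scan; not part of Lucene.
class SkipScanSketch {

    // Mirrors the quoted loop: it stops as soon as every skip entry
    // has been read, even if the last entry is still below the target.
    static int scanAsQuoted(int[] skipDeltas, int target) {
        int skipDoc = 0;
        int lastSkipDoc = 0;
        int skipCount = 0;
        int numSkips = skipDeltas.length;
        while (target > skipDoc && skipCount < numSkips) {
            lastSkipDoc = skipDoc;
            skipDoc += skipDeltas[skipCount]; // stands in for readVInt()
            skipCount++;
        }
        return lastSkipDoc;
    }

    // Assumed variant: record the current entry first, and only then
    // stop when no further entries can be read.
    static int scanUsingLastEntry(int[] skipDeltas, int target) {
        int skipDoc = 0;
        int lastSkipDoc = 0;
        int skipCount = 0;
        int numSkips = skipDeltas.length;
        while (target > skipDoc) {
            lastSkipDoc = skipDoc;
            if (skipCount >= numSkips)
                break;
            skipDoc += skipDeltas[skipCount];
            skipCount++;
        }
        return lastSkipDoc;
    }

    public static void main(String[] args) {
        int[] deltas = {10, 10, 10}; // skip entries at docs 10, 20, 30
        System.out.println(scanAsQuoted(deltas, 35));
        System.out.println(scanUsingLastEntry(deltas, 35));
    }
}
```

With skip entries at docs 10, 20 and 30 and a target of 35, the quoted loop ends on lastSkipDoc = 20 although the entry for doc 30 could have been used. The result is still correct because of the linear scan that follows, but it takes one skip interval longer than necessary.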

*) PhraseScorer.skipTo jumps one doc too far because of the call to sort(),
which calls next() for each PhrasePositions. Here is Daniel's test that
demonstrates this:

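import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class DanielBug {

   private final static String DIR = "/tmp/testindex";

   public static void main (String[] args) throws Exception {
     Analyzer a = new StandardAnalyzer();
     IndexWriter iw = new IndexWriter(DIR, a, true);

     Document d = new Document();
     // 0 hits only if this field contains the same value as
     // the same field in the next document:
     d.add(new Field("source", "marketing info", true, true, true));
     iw.addDocument(d);

     d = new Document();
     d.add(new Field("contents", "foobar", true, true, true));
     d.add(new Field("source", "marketing info", true, true, true));
     iw.addDocument(d);

     // close the writer so the searcher sees both documents
     iw.close();
     System.out.println("Indexing Done");

     IndexSearcher is = new IndexSearcher(DIR);
     Query q = QueryParser.parse(
         "+contents:foobar +source:\"marketing info\"", "", a);
     Hits hits = is.search(q);
     System.out.println("Hits: " + hits.length());
     is.close();
   }
}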


Instead of 1 hit, 0 hits are found with 1.4rc2, while 1.3 finds the hit. I have
already committed the necessary change to PhraseScorer, and it fixes the problem.
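
The effect can also be shown without an index. The following is only a toy model, not the PhraseScorer code: Cursor and phraseSkipTo are made-up names, and a cursor is just a sorted int array of doc numbers. It positions every cursor with skipTo and then optionally calls next() on each one, the way sort() did before the change.

```java
// Hypothetical model of "skipTo followed by a sort that calls next()".
class SkipThenSortSketch {

    /** Toy posting cursor over sorted doc numbers (not a Lucene class). */
    static final class Cursor {
        private final int[] docs;
        private int pos = -1;

        Cursor(int... docs) { this.docs = docs; }

        boolean next() {
            pos++;
            return pos < docs.length;
        }

        boolean skipTo(int target) {
            do {
                if (!next())
                    return false;
            } while (docs[pos] < target);
            return true;
        }

        int doc() { return docs[pos]; }
    }

    /**
     * Positions all cursors at the first doc >= target. If sortCallsNext
     * is true, every cursor is advanced once more afterwards.
     * Returns the doc of the first cursor, or -1 if a cursor is exhausted.
     */
    static int phraseSkipTo(int target, boolean sortCallsNext, Cursor... cursors) {
        boolean more = true;
        for (Cursor c : cursors)
            more = more && c.skipTo(target);
        if (more && sortCallsNext) {
            for (Cursor c : cursors)
                more = c.next() && more; // the extra advance that loses a doc
        }
        return more ? cursors[0].doc() : -1;
    }

    public static void main(String[] args) {
        // both phrase terms occur in docs 0 and 1, as in Daniel's test
        System.out.println(phraseSkipTo(1, true, new Cursor(0, 1), new Cursor(0, 1)));
        System.out.println(phraseSkipTo(1, false, new Cursor(0, 1), new Cursor(0, 1)));
    }
}
```

In Daniel's test both phrase terms occur in docs 0 and 1, and the other clause only matches doc 1, so the phrase has to be advanced to doc 1. With the extra next() calls the model runs past doc 1 and reports no match; without them it stays on doc 1.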

Unfortunately, I haven't found the time to restructure the IndexReaders so far.
Hopefully tomorrow :-)

