lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Confused by writePostings/SegmentTermDocs.next()
Date Tue, 02 Dec 2003 17:24:21 GMT
Simon Cozens wrote:
>     However, I'm a bit stuck at the moment.
> 
> The "document code" is written by writePostings in DocumentWriter:
> 
>         int f = posting.freq;
>         if (f == 1)                               // optimize freq=1
>           freq.writeVInt(1);                      // set low bit of doc num.
>         else {
>           freq.writeVInt(0);                      // the document number
>           freq.writeVInt(f);                      // frequency in doc
>         }
> 
> So that integer with the low bit filed off is *always* going to be zero.
> Which means that the returned set of documents is always going to have the
> IDs set to zero, which is precisely what's happening in my Perl port. But
> I'd rather like it to have the right document ID, which is 1.

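DocumentWriter optimizes a particular case: it only ever writes a single 
document, so the document number is always zero and only the low (freq=1) 
bit of the code varies.  The general case, where document numbers are 
written as deltas from the previous document, is in SegmentMerger.java: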

  int docCode = (doc - lastDoc) << 1;       // use low bit to flag freq=1
  lastDoc = doc;

  int freq = postings.freq();
  if (freq == 1) {
    freqOutput.writeVInt(docCode | 1);      // write doc & freq=1
  } else {
    freqOutput.writeVInt(docCode);          // write doc
    freqOutput.writeVInt(freq);             // write frequency in doc
  }
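
To see how a reader undoes this encoding, here is a minimal round-trip sketch. It is not the actual SegmentMerger or SegmentTermDocs code: the class name DocCodeSketch and the methods encode and decode are hypothetical, and a plain list of ints stands in for the VInt stream, so it only illustrates the doc-code scheme under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; a List<Integer> stands in for the VInt stream.
public class DocCodeSketch {

    // Encode (doc, freq) pairs, with docs in ascending order, using the
    // same scheme as the SegmentMerger snippet: delta << 1, low bit = freq==1.
    static List<Integer> encode(int[] docs, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int lastDoc = 0;
        for (int i = 0; i < docs.length; i++) {
            int docCode = (docs[i] - lastDoc) << 1; // use low bit to flag freq=1
            lastDoc = docs[i];
            if (freqs[i] == 1) {
                out.add(docCode | 1);               // doc delta and freq=1 in one value
            } else {
                out.add(docCode);                   // doc delta
                out.add(freqs[i]);                  // frequency in doc
            }
        }
        return out;
    }

    // Decode the stream back into {doc, freq} pairs.
    static List<int[]> decode(List<Integer> in) {
        List<int[]> result = new ArrayList<>();
        int doc = 0;
        int pos = 0;
        while (pos < in.size()) {
            int docCode = in.get(pos++);
            doc += docCode >>> 1;                   // shift off the flag bit, add the delta
            int freq = (docCode & 1) != 0 ? 1 : in.get(pos++);
            result.add(new int[] {doc, freq});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> stream = encode(new int[] {1, 4, 5}, new int[] {1, 3, 1});
        System.out.println(stream);                 // prints [3, 6, 3, 3]
        for (int[] p : decode(stream)) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

With a single document numbered zero, encode produces just 1 when freq is 1, and 0 followed by the frequency otherwise, which is exactly what the DocumentWriter code writes.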

I hope this makes more sense.

Doug





