lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: Payloads
Date Fri, 22 Dec 2006 19:48:05 GMT

On Dec 22, 2006, at 10:36 AM, Doug Cutting wrote:

> The easiest way to do this would be to have separate files in each  
> segment for each PostingFormat.  It would be better if different  
> posting formats could share files, but that's harder to coordinate.

The approach I'm taking in KinoSearch 0.20 is for each field to get  
its own postings file, named _XXX.pYYY, where "_XXX" is the segment  
name and "YYY" is the field number.  That allows a single decoder to  
be pointed at each file.  _XXX.frq and _XXX.prx have been eliminated.
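
A minimal sketch of the per-field naming scheme described above. The function names are hypothetical, and writing the field number as unpadded decimal is an assumption; the message only specifies the shape _XXX.pYYY.

```python
def postings_filename(segment_name: str, field_num: int) -> str:
    """Build a per-field postings file name of the form _XXX.pYYY."""
    if field_num < 0:
        raise ValueError("field number must be non-negative")
    # Assumption: the field number is plain decimal with no padding.
    return f"{segment_name}.p{field_num}"


def field_num_from_filename(filename: str) -> int:
    """Recover the field number so one decoder can be chosen per file."""
    segment_name, _, extension = filename.rpartition(".")
    is_postings = extension.startswith("p") and extension[1:].isdigit()
    if not segment_name or not is_postings:
        raise ValueError(f"not a per-field postings file: {filename!r}")
    return int(extension[1:])
```

Because the field number is recoverable from the name alone, a reader could map each file to the decoder registered for that field without opening it.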

One file per format would also work.

> Alternately we could force all postings into a single file per  
> segment.  That would simplify the APIs, but prohibit certain file  
> formats, like the one Lucene uses currently.

In theory, we could also have one file per property: doc num, freq,  
positions, boost, payload.  The base Posting object would have only  
document number, and each subclass would add a new property, and a  
new file.
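
The one-file-per-property idea can be sketched as a class hierarchy in which each subclass contributes one property and one file. All class names and file extensions here are hypothetical, chosen only for illustration; neither Lucene nor KinoSearch is claimed to use them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Posting:
    """Base posting: carries only the document number."""
    doc_num: int

    # Hypothetical extension of the file storing this class's own property.
    own_file = "doc"

    @classmethod
    def files(cls) -> List[str]:
        """Files needed to decode this posting type, base class first."""
        return [
            klass.own_file
            for klass in reversed(cls.__mro__)
            if "own_file" in vars(klass)
        ]


@dataclass
class FreqPosting(Posting):
    """Adds within-document term frequency, stored in its own file."""
    freq: int = 1
    own_file = "freq"


@dataclass
class PositionPosting(FreqPosting):
    """Adds term positions, stored in a further file."""
    positions: List[int] = field(default_factory=list)
    own_file = "pos"
```

Under this sketch, `PositionPosting.files()` lists three files, while a consumer that needs only document numbers would open just the first.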

I'm not sure that's better, as it precludes optimizations such as the  
even/odd trick currently used in _XXX.frq, but it merits mention as  
the conceptual opposite of having one file per format.
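
For reference, a sketch of the even/odd trick as it is described in the Lucene file format documentation: the document delta is shifted left one bit, and the low bit is set when the frequency is exactly one, so the frequency VInt can be omitted in that common case. The function names are hypothetical, and the sketch ignores skip data and everything else in the real file.

```python
from typing import List, Tuple


def write_vint(value: int, out: bytearray) -> None:
    """Append a variable-length int: 7 bits per byte, high bit = more."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_vint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one VInt starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def encode_postings(postings: List[Tuple[int, int]]) -> bytes:
    """Encode (doc_num, freq) pairs, sorted by ascending doc_num."""
    out = bytearray()
    last_doc = 0
    for doc, freq in postings:
        delta = doc - last_doc
        last_doc = doc
        if freq == 1:
            write_vint((delta << 1) | 1, out)  # odd: freq of 1 is implied
        else:
            write_vint(delta << 1, out)        # even: freq follows
            write_vint(freq, out)
    return bytes(out)


def decode_postings(data: bytes) -> List[Tuple[int, int]]:
    """Inverse of encode_postings."""
    postings, pos, doc = [], 0, 0
    while pos < len(data):
        code, pos = read_vint(data, pos)
        doc += code >> 1
        if code & 1:
            freq = 1
        else:
            freq, pos = read_vint(data, pos)
        postings.append((doc, freq))
    return postings
```

The saving depends on the document delta and the frequency sharing one integer, which is why splitting them into separate per-property files would rule the trick out.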

Matchers would be happy with that scheme no matter what.

> So the ideal solution would permit both different formats to either  
> share a file, or to use their own file(s).  Is it worth the  
> complexity this would add to the API?  Or should we jettison the  
> notion of multiple posting files per segment?

Does punting on this issue have any drawbacks other than an unknown  
performance impact?  Can we design the API so that it leaves open the  
option of letting the user specify multiple files, should that prove  
advantageous later?

Marvin Humphrey
Rectangular Research
