lucene-dev mailing list archives

From Doug Cutting
Subject Re: Payloads
Date Fri, 22 Dec 2006 18:36:40 GMT
Ning Li wrote:
> The draft proposal seems to suggest the following (roughly):
>  A dictionary entry is <Term, FilePointer>.

Perhaps this ought to be <Term, TermInfo>, where TermInfo contains a 
FilePointer and perhaps other information (e.g., frequency data).
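
To make that concrete, here is a rough Java sketch of such an entry. The
names (TermInfoSketch, postingsPointer, docFreq, TermDictionarySketch) are
hypothetical and chosen only for illustration; they are not taken from the
draft proposal, and the in-memory sorted map is just an assumption to keep
the example short.

```java
import java.util.TreeMap;

// Hypothetical per-term metadata: a pointer into a postings file plus
// frequency data. Illustration only, not an existing class.
final class TermInfoSketch {
    final long postingsPointer; // byte offset where this term's postings start
    final int docFreq;          // number of documents containing the term

    TermInfoSketch(long postingsPointer, int docFreq) {
        this.postingsPointer = postingsPointer;
        this.docFreq = docFreq;
    }
}

// Hypothetical in-memory dictionary holding <Term, TermInfo> entries.
final class TermDictionarySketch {
    // Sorted map, so terms can also be enumerated in order.
    private final TreeMap<String, TermInfoSketch> entries = new TreeMap<>();

    void add(String term, long postingsPointer, int docFreq) {
        entries.put(term, new TermInfoSketch(postingsPointer, docFreq));
    }

    // Returns null if the term is not in the dictionary.
    TermInfoSketch lookup(String term) {
        return entries.get(term);
    }
}
```

Keeping the pointer inside a TermInfo-like object means more per-term
fields can be added later without changing the shape of the dictionary
entry itself.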

>  A posting entry for a term in a document is <Doc, PostingContent>.
> Classes which implement PostingFormat decide the format of PostingContent.
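
One way to read this, sketched in Java under my own assumptions: the
interface below and its two implementations are hypothetical (the draft
does not spell out these signatures), and they only cover the
PostingContent part of the entry, not the document number.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface: each format owns the byte layout of PostingContent.
interface PostingFormatSketch<C> {
    void writeContent(DataOutput out, C content) throws IOException;
    C readContent(DataInput in) throws IOException;
}

// Content is just the within-document term frequency.
final class FreqOnlyFormat implements PostingFormatSketch<Integer> {
    public void writeContent(DataOutput out, Integer freq) throws IOException {
        out.writeInt(freq);
    }
    public Integer readContent(DataInput in) throws IOException {
        return in.readInt();
    }
}

// Content is the list of positions, stored as a count followed by the values.
final class PositionsFormat implements PostingFormatSketch<int[]> {
    public void writeContent(DataOutput out, int[] positions) throws IOException {
        out.writeInt(positions.length);
        for (int p : positions) {
            out.writeInt(p);
        }
    }
    public int[] readContent(DataInput in) throws IOException {
        int[] positions = new int[in.readInt()];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = in.readInt();
        }
        return positions;
    }
}

// Helper that encodes content to bytes and decodes it again with the same format.
final class PostingRoundTrip {
    static <C> C roundTrip(PostingFormatSketch<C> format, C content) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            format.writeContent(new DataOutputStream(bytes), content);
            return format.readContent(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, PostingRoundTrip.roundTrip(new FreqOnlyFormat(), 7) writes
one int and reads it back, while the positions format writes a
variable-length record; the caller does not need to know which.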


> Is it a good idea to allow PostingFormat to decide whether and how to
> store posting content in multiple files?

Ideally, yes.  The easiest way to do this would be to have separate 
files in each segment for each PostingFormat.  It would be better if 
different posting formats could share files, but that's harder to 

Alternatively, we could force all postings into a single file per segment.
That would simplify the APIs, but prohibit certain file formats, like
the one Lucene uses currently.

So the ideal solution would permit different formats either to share a
file or to use their own file(s).  Is it worth the complexity this would
add to the API?  Or should we jettison the notion of multiple posting
files per segment?
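
To give a feel for the API cost, here is a small Java sketch of the
flexible option. Everything in it is an assumption made for illustration:
the PostingFilesSketch interface, the SegmentFilesSketch helper, and the
idea that a format declares its files by extension. Two formats that
name the same extension would share that file; the sketch does not
address how they would coordinate offsets within it.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical: each format names the per-segment file extensions it uses.
// Formats that return the same extension share that file.
interface PostingFilesSketch {
    List<String> fileExtensions();
}

final class SegmentFilesSketch {
    // Collects the distinct posting file names for one segment,
    // in the order the formats declare them.
    static Set<String> postingFiles(String segmentName,
                                    List<PostingFilesSketch> formats) {
        Set<String> names = new LinkedHashSet<>();
        for (PostingFilesSketch format : formats) {
            for (String extension : format.fileExtensions()) {
                names.add(segmentName + "." + extension);
            }
        }
        return names;
    }
}
```

A format that keeps frequencies and positions in separate files would
return two extensions, a single-file format one. Forcing one postings
file per segment amounts to dropping this method and hard-wiring a
single name.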

