lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: Payloads
Date Fri, 22 Dec 2006 16:35:56 GMT

On Dec 21, 2006, at 1:58 PM, Ning Li wrote:

> Storing all the posting content, e.g. frequencies and positions, in a
> single file greatly simplifies things. However, this could cause some
> performance penalty. For example, boolean query 'Apache AND Lucene'
> would have to paw through positions. But position indexing for Apache
> and Lucene is necessary to support phrase query '"Apache Lucene"'.

Precision would be enhanced if boolean scoring took position into  
account, and could be further enhanced if each position were assigned  
a boost.  For that purpose, having everything in one file is an  
advantage, as it cuts down disk seeks.  Turn off freqs, positions,  
and boosts, and you have only doc_nums, which is ideal for matching  
rather than scoring, yielding a performance gain.
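To make the trade-off concrete, here is a rough sketch of a single posting stream in which positions (with their implied freq) and per-position boosts can be switched off. Everything in it is an assumption made for illustration: the names `Posting`, `encode_postings`, `decode_doc_nums`, and `intersect`, the variable-length integer encoding, and the one-byte boost are hypothetical, and none of it reflects the actual Lucene or KinoSearch file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Posting:
    doc_num: int
    positions: List[int] = field(default_factory=list)
    boosts: List[int] = field(default_factory=list)  # hypothetical: one byte (0..255) per position


def encode_vint(n: int) -> bytes:
    """Variable-length int: 7 data bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_postings(postings, with_positions=True, with_boosts=True) -> bytes:
    """Write all posting content for one term into a single stream.

    With both flags off, the stream holds nothing but doc_num deltas.
    """
    out = bytearray()
    last_doc = 0
    for p in postings:
        out += encode_vint(p.doc_num - last_doc)
        last_doc = p.doc_num
        if with_positions:
            out += encode_vint(len(p.positions))  # freq
            last_pos = 0
            for i, pos in enumerate(p.positions):
                out += encode_vint(pos - last_pos)
                last_pos = pos
                if with_boosts:
                    out.append(p.boosts[i])
    return bytes(out)


def decode_doc_nums(data: bytes) -> List[int]:
    """Decode a stream written with positions and boosts turned off."""
    docs, doc, shift, value = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            doc += value
            docs.append(doc)
            shift = value = 0
    return docs


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Match 'A AND B' by merging two sorted doc_num lists; no scoring involved."""
    i = j = 0
    hits = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits


apache = [Posting(1, [0, 5], [10, 12]), Posting(4, [2], [8]), Posting(9, [1], [3])]
lucene = [Posting(4, [3], [7]), Posting(7, [0], [1]), Posting(9, [2], [2])]

full = encode_postings(apache)
docs_only = encode_postings(apache, with_positions=False, with_boosts=False)
matches = intersect(
    decode_doc_nums(docs_only),
    decode_doc_nums(encode_postings(lucene, with_positions=False, with_boosts=False)),
)
```

In this toy layout, the doc_nums-only stream for 'Apache' is a fraction of the size of the full stream, and 'Apache AND Lucene' can be matched without reading any position data. With everything turned on, a scorer would instead get freqs, positions, and boosts for a document from one sequential read.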

What's being considered doesn't really speak to the motivation of  
improving existing core functionality, though.  It's more about  
expanding the API to allow new applications.

Marvin Humphrey
Rectangular Research
