lucene-dev mailing list archives

From karl wettin <>
Subject Re: boosting fields
Date Thu, 27 Apr 2006 07:49:20 GMT

On 26 Apr 2006, at 19:18, Doug Cutting wrote:

> karl wettin wrote:
>>>> How about refactoring fields to something like:
>>>> [Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>----
>>>> {0..*}  -> [FieldValue +store +index +termVector]
> If you think you have a simple, back-compatible way to do this,  
> please submit a patch.  Perhaps it is simpler than I imagined.
>>> Long-term, an API which supports per token boosting will  
>>> probably  prove useful, as a part of #11 on http:// 
>>> lucene/Lucene2Whiteboard.
>> I've wanted that feature a few times. Let me know if there is   
>> something I can do to help when the time is right.
> The time will be right as soon as someone decides they want to  
> implement this!  Ideally every part of the index would be  
> pluggable, but the most important is postings, so probably we  
> should start there.
> My idea is that the logic of DocumentWriter

I would prefer to leave persistence and deprecation out of the
discussion until the rest is solved, as I spend all my spare brain
time on the InstanciatedIndex-thingy.

> and also probably a no-positions version, a no-freqs version and a  
> weight-per-position version.  TermFreqs and TermPositions should be  
> replaced with a generic Postings API.  Applications can then  
> downcast the Postings instance based on the FieldInfo.

This is much more interesting from my point of view. Let's start here.

I might be wrong and I really don't know why it is so bad, but I
think casting based on FieldInfo would break the Liskov
substitution principle in a big way.
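
To make the concern concrete, here is a minimal sketch in Java. All the
type and method names (Postings, DocsOnly, WithFreqs, freqByDowncast,
freqPolymorphic) are hypothetical and made up for illustration; none of
them are Lucene classes, and the boolean flag only stands in for
whatever FieldInfo would report.

```java
// Hypothetical types for illustration only; not part of Lucene.
final class PostingsSketch {

    /** Generic postings; by default a matching document counts once. */
    interface Postings {
        int[] docs();
        // Assumes doc is one of docs(); the no-freqs case has nothing better to report.
        default int freq(int doc) { return 1; }
    }

    /** No-freqs version: only document ids are recorded. */
    static final class DocsOnly implements Postings {
        private final int[] docs;
        DocsOnly(int[] docs) { this.docs = docs; }
        public int[] docs() { return docs; }
    }

    /** Version that also records a frequency per document. */
    static final class WithFreqs implements Postings {
        private final int[] docs;
        private final int[] freqs;
        WithFreqs(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }
        public int[] docs() { return docs; }
        @Override public int freq(int doc) {
            for (int i = 0; i < docs.length; i++) {
                if (docs[i] == doc) return freqs[i];
            }
            return 0; // the document does not contain the term
        }
    }

    /** Downcast style: the caller picks the subtype from a per-field flag. */
    static int freqByDowncast(Postings p, boolean fieldHasFreqs, int doc) {
        if (fieldHasFreqs) {
            // Throws ClassCastException if the flag and the instance disagree.
            return ((WithFreqs) p).freq(doc);
        }
        return 1;
    }

    /** Substitutable style: any Postings can be passed in, no flag needed. */
    static int freqPolymorphic(Postings p, int doc) {
        return p.freq(doc);
    }
}
```

In the downcast version the caller has to know the concrete type behind
the Postings reference, so one implementation cannot be swapped for
another without touching the caller. In the polymorphic version every
implementation answers the same question in its own way.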

My own immediate thought is to compromise by allowing a boost per term
in a document. Simply remove the norms methods from the IndexReader,
add a new one to the TermEnum, and fall back on the field boost. How
would the value be picked up by the scorer?
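
A rough sketch of just the fallback rule, in Java. The class
TermBoostSketch and its methods are hypothetical; this is not how
IndexReader or TermEnum work today, it leaves out how the values would
be stored in the index, and it does not answer the scorer question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the boosts of one field in one document; not a Lucene class.
final class TermBoostSketch {
    private final float fieldBoost;
    private final Map<String, Float> termBoosts = new HashMap<>();

    TermBoostSketch(float fieldBoost) {
        this.fieldBoost = fieldBoost;
    }

    /** Record an explicit boost for one term of this field in this document. */
    void setTermBoost(String term, float boost) {
        termBoosts.put(term, boost);
    }

    /** Per-term boost if one was set, otherwise fall back on the field boost. */
    float boostFor(String term) {
        return termBoosts.getOrDefault(term, fieldBoost);
    }
}
```

With a field boost of 1.5f and a single call to
setTermBoost("lucene", 2.0f), boostFor("lucene") gives the term's own
boost and any other term gets the field boost.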

Boost per position, etc., sounds very expensive.
