lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: boosting fields
Date Wed, 26 Apr 2006 17:18:55 GMT
karl wettin wrote:
>> karl wettin wrote:
>>
>>> This could lead me to believe I can use different boosts for fields
>>> with the same name within one document.
>>
>> You can.  The values are multiplied to produce the final boost value
>> for the field.
> 
> It's not really the same thing as I tried to describe though.

No, it's not, you're right.
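
To make the current behaviour concrete, here is a minimal sketch of the multiplication described above. It does not use Lucene's classes: BoostedField and FieldBoosts are hypothetical stand-ins invented for illustration, and the sketch assumes only that every instance of a same-named field contributes its own boost to the product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one field instance; this is not Lucene's Field class.
final class BoostedField {
    final String name;
    final float boost;

    BoostedField(String name, float boost) {
        this.name = name;
        this.boost = boost;
    }
}

// Hypothetical helper showing how per-instance boosts collapse into one value.
final class FieldBoosts {
    // Returns the product of the boosts of every instance of fieldName in doc.
    static float combinedBoost(List<BoostedField> doc, String fieldName) {
        float product = 1.0f;
        for (BoostedField f : doc) {
            if (f.name.equals(fieldName)) {
                product *= f.boost;
            }
        }
        return product;
    }
}
```

With two "title" instances boosted 2.0 and 1.5, the field ends up with a single boost of 3.0, so the individual values are not kept apart, which is why this is not the same as what Karl describes.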

>>> How about refactoring fields to something like:
>>> [Document](fieldName)<#>---- {0..1} ->[Field +boost]<#>---- {0..*}
>>> -> [FieldValue +store +index +termVector]
>>
>>
>> That would be a big, incompatible change to one of Lucene's primary  
>> APIs, no?
> 
> Not if I got it right in my head. Then it's really just a matter of
> handling deprecation. The field-methods in Document could be the same.

If you think you have a simple, back-compatible way to do this, please 
submit a patch.  Perhaps it is simpler than I imagined.
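
For reference, one possible reading of the proposed structure is sketched below. All three class names (SketchDocument, SketchField, SketchFieldValue) are hypothetical and nothing here is existing Lucene API. The sketch assumes the diagram means: a document holds at most one field per name, the field carries the boost, and each field holds any number of values that carry their own store, index and termVector flags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one value of a field; the flags live on the value, not the field.
final class SketchFieldValue {
    final String text;
    final boolean store;
    final boolean index;
    final boolean termVector;

    SketchFieldValue(String text, boolean store, boolean index, boolean termVector) {
        this.text = text;
        this.store = store;
        this.index = index;
        this.termVector = termVector;
    }
}

// Hypothetical: one field per name, with a single boost and 0..* values.
final class SketchField {
    final String name;
    float boost = 1.0f;
    final List<SketchFieldValue> values = new ArrayList<>();

    SketchField(String name) {
        this.name = name;
    }
}

// Hypothetical: a document keyed by field name, 0..1 field per name.
final class SketchDocument {
    private final Map<String, SketchField> fields = new LinkedHashMap<>();

    // Returns the field with this name, or null if the document has none.
    SketchField getField(String name) {
        return fields.get(name);
    }

    // Adds a value, creating the named field on first use.
    void add(String name, SketchFieldValue value) {
        fields.computeIfAbsent(name, SketchField::new).values.add(value);
    }
}
```

Whether the existing Document methods can be mapped onto something like this without breaking callers is exactly the open question.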

>> Long-term, an API which supports per token boosting will probably
>> prove useful, as a part of #11 on
>> http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard.
> 
> I've wanted that feature a few times. Let me know if there is something
> I can do to help when the time is right.

The time will be right as soon as someone decides they want to implement 
this!  Ideally every part of the index would be pluggable, but the most 
important is postings, so probably we should start there.

My idea is that the logic of DocumentWriter.invertDocument() remain much 
the same, and that DocumentWriter.addPosition() is replaced with a 
method on a pluggable class.  So invertDocument() would keep a 
FieldIndexer for each field and call a method like addPosition() for 
each token found.  (We might add a boost field to Token that's passed 
into this method.)  Then, at the end, invertDocument() would flush all 
of the FieldIndexers.  SegmentMerger would need to be changed 
similarly.  Implementing FieldIndexers that can sensibly share output 
files may be tricky.  We should implement FieldIndexers that are 
back-compatible with the existing index format, and also probably a 
no-positions version, a no-freqs version and a weight-per-position 
version.  TermFreqs and TermPositions should be replaced with a generic 
Postings API.  Applications can then downcast the Postings instance 
based on the FieldInfo.
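
A rough sketch of the shape this could take follows. None of these types exist in Lucene: the FieldIndexer interface, its method signatures, SketchToken, SketchInverter and both implementations are assumptions made for illustration, and a list of strings stands in for the real output files. It shows only the control flow described above (one FieldIndexer per field, addPosition() per token, flush at the end) and does not attempt the shared-output-file problem or the SegmentMerger changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token that carries the proposed per-token boost.
final class SketchToken {
    final String term;
    final int position;
    final float boost;

    SketchToken(String term, int position, float boost) {
        this.term = term;
        this.position = position;
        this.boost = boost;
    }
}

// Hypothetical pluggable postings writer, one instance per field.
interface FieldIndexer {
    void addPosition(SketchToken token);

    // Stand-in for writing to index files: appends entries to out.
    void flush(List<String> out);
}

// Records every term together with its position.
final class PositionsIndexer implements FieldIndexer {
    private final List<String> buffered = new ArrayList<>();

    public void addPosition(SketchToken token) {
        buffered.add(token.term + "@" + token.position);
    }

    public void flush(List<String> out) {
        out.addAll(buffered);
        buffered.clear();
    }
}

// No-positions flavour: keeps only a frequency per term.
final class FreqsOnlyIndexer implements FieldIndexer {
    private final Map<String, Integer> freqs = new LinkedHashMap<>();

    public void addPosition(SketchToken token) {
        freqs.merge(token.term, 1, Integer::sum);
    }

    public void flush(List<String> out) {
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            out.add(e.getKey() + " x" + e.getValue());
        }
        freqs.clear();
    }
}

final class SketchInverter {
    // Feeds each field's tokens to that field's indexer, then flushes them all.
    static List<String> invert(Map<String, List<SketchToken>> fieldTokens,
                               Map<String, FieldIndexer> indexers) {
        for (Map.Entry<String, List<SketchToken>> e : fieldTokens.entrySet()) {
            FieldIndexer indexer = indexers.get(e.getKey());
            if (indexer == null) {
                continue; // fields without an indexer are skipped in this sketch
            }
            for (SketchToken token : e.getValue()) {
                indexer.addPosition(token);
            }
        }
        List<String> out = new ArrayList<>();
        for (FieldIndexer indexer : indexers.values()) {
            indexer.flush(out);
        }
        return out;
    }
}
```

Swapping PositionsIndexer for FreqsOnlyIndexer on a field changes what is recorded for it without touching the inversion loop, which is the point of making postings pluggable. A weight-per-position version would use the token's boost in addPosition() in the same way.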

Doug
