lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: indexing fields with multiplicity
Date Wed, 29 Aug 2007 19:53:07 GMT

29 aug 2007 kl. 21.37 skrev Tim Sturge:

> That's exactly my question. I feel like
>
> for (i = 0 ; i < XXXX ; i++) {
> document.add(new Field("anchor","USA"));
> }
>
> is exactly equivalent to
>
> field = new Field("anchor","USA"));
> field.setBoost(YYYY);
> document.add(field);
>
> but I don't know the function that relates XXXX and YYYY. I feel  
> like there's a correct information-theorectical answer and I'd like  
> to know what it is.

You would have to refactor norm(t,d) in this computation:

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/ 
org/apache/lucene/search/Similarity.html

However, field boost is merged in to the document boost, so it might  
not translate that easy as you want. Perhaps payloads and  
BoostingTermQuery fits your needs better.


-- 
karl



>
> Tim
>
> Karl Wettin wrote:
>>
>> 29 aug 2007 kl. 19.13 skrev Tim Sturge:
>>
>>> I'm looking for a boost when the anchor text is more commonly  
>>> associated with one topic than another. For example the United  
>>> States of America
>>> is called "USA" by a lot of people. The United Space Alliance is  
>>> also called "USA" but by many less people.
>>>
>>> If I just index them both with "USA" once, they will rank  
>>> equally. I want the United States of America to rank higher.
>>
>> Why not use Field#setBoost(float)?
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message