lucene-dev mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: highlight - scoring fragments with more of the same token
Date Tue, 26 Sep 2006 07:11:12 GMT
If repeated terms were to be scored, I suspect the repetitions would have 
to score lower than the first occurrence - otherwise f2 could be selected 
as a better fragment than f3 for the query q1 in your example, even though 
only f3 matches both query terms.
Each repetition of a term in a fragment could be scored as a very small 
fraction of the score given to its first occurrence. This would at least 
rank f2 higher than f1 for query q2.
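
To make the idea concrete, here is a minimal standalone sketch of damped repetition scoring applied to the fragments and queries from the quoted example. It is not the Lucene QueryScorer: the class name DampedFragmentScorer, the weight of 1.0 for a first occurrence and the REPEAT_WEIGHT of 0.01 are all assumptions chosen for illustration, and tokens are simply split on whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of damped repetition scoring; not part of Lucene. */
final class DampedFragmentScorer {
    /** Assumed weight for each repeat of a query term already seen in the fragment. */
    static final double REPEAT_WEIGHT = 0.01;

    /** First occurrence of a query term scores 1.0, each repeat scores REPEAT_WEIGHT. */
    static double score(String fragment, Set<String> queryTerms) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String token : fragment.trim().split("\\s+")) {
            if (!queryTerms.contains(token)) {
                continue; // tokens that are not query terms contribute nothing
            }
            // Set.add returns true only the first time the term is seen
            total += seen.add(token) ? 1.0 : REPEAT_WEIGHT;
        }
        return total;
    }

    /** Returns the highest-scoring fragment; on a tie the earlier fragment wins. */
    static String best(List<String> fragments, Set<String> queryTerms) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String fragment : fragments) {
            double s = score(fragment, queryTerms);
            if (s > bestScore) {
                bestScore = s;
                best = fragment;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
            "aa 11 bb 33 cc",   // f1
            "aa 11 bb 11 cc",   // f2
            "aa 11 bb 22 cc");  // f3
        Set<String> q1 = new HashSet<>(Arrays.asList("11", "22"));
        Set<String> q2 = new HashSet<>(Arrays.asList("11"));
        System.out.println("q1 -> " + best(fragments, q1)); // f3: both terms match
        System.out.println("q2 -> " + best(fragments, q2)); // f2: the repeat breaks the tie
    }
}
```

With these assumed weights f3 still wins for q1 (2.0 against 1.01 for f2), while for q2 the repeated "11" lifts f2 (1.01) just above f1 and f3 (1.0 each).
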
Another potentially useful ranking factor may be to boost fragments 
found at the beginning of a document - that's where people tend to write 
summaries or introductions.
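
One possible shape for such a position boost, purely as a sketch: multiply a fragment's base score by a factor that decays with the fragment's index in the document. The class name PositionBoostSketch, the MAX_BOOST value of 0.1 and the 1/(1 + index) decay are assumptions for illustration only, not anything the highlighter provides.

```java
/** Hypothetical sketch of a position-based boost; not part of Lucene. */
final class PositionBoostSketch {
    /** Assumed extra weight given to the very first fragment (10%). */
    static final double MAX_BOOST = 0.1;

    /**
     * Scales baseScore so that earlier fragments score slightly higher.
     * fragmentIndex is 0-based; the boost decays as 1 / (1 + fragmentIndex).
     */
    static double boosted(double baseScore, int fragmentIndex) {
        return baseScore * (1.0 + MAX_BOOST / (1.0 + fragmentIndex));
    }

    public static void main(String[] args) {
        // Same base score: the earlier fragment ranks higher.
        System.out.println(boosted(1.0, 0) > boosted(1.0, 5));
        // A clearly better match later in the document still wins.
        System.out.println(boosted(2.0, 10) > boosted(1.0, 0));
    }
}
```

Keeping the boost small means it only separates fragments whose term scores are close, so a late fragment that matches more query terms is not pushed below an early one that matches fewer.
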


Doron Cohen wrote:
> This question was raised on the user list -
> http://www.nabble.com/highlighting-tf2322109.html
>
> Assume three fragments and two queries:
>   f1 = aa  11  bb  33  cc
>   f2 = aa  11  bb  11  cc
>   f3 = aa  11  bb  22  cc
>   q1 = 11 22
>   q2 = 11
> Now we call highlighter.getBestFragment(q);
> For q1, f3 is returned, as expected.
> For q2, f1 is returned, although "11" appears twice in f2 but only once in
> f1.
>
> This is because QueryScorer.getTokenScore(Token) counts each query term
> only once per fragment, so repeated occurrences add nothing.
>
> Would it make sense to make this behavior controllable?
> (It is easily done but I am not sure about the consequences.)
>
> Or perhaps there is a way to achieve this behavior (preferring f2 over f1
> for q2 above) that I missed?
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
