lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rads2029 <radh...@gmail.com>
Subject Re: Modifying score based on tf and slop
Date Mon, 13 Jul 2009 12:12:17 GMT

Hi all,

I modified the setFreqCurrentDoc method of SpanScorer  as follows: 
( Frequency is updated only for the shortest span )

 int minMatchLenght=-1;
	    do {
	      int matchLength = spans.end() - spans.start();
	      if(minMatchLenght==-1)minMatchLenght=matchLength;
	      else if(minMatchLenght>matchLength)minMatchLenght=matchLength;
	      more = spans.next();
	    } while (more && (doc == spans.doc()));
	    freq = getSimilarity().sloppyFreq(minMatchLenght);

Now, if a term occurs more than once in my search field, my score is not
boosted up- This is what I want. 
However , just for overriding this one method, I had to create the following
new classes

1)CustomSpanScorer extending Scorer ( setFreqCurrentDoc  is overriden)
2)CustomSpanWeight extends SpanWeight ( scorer is overriden to use
CustomSpanScorer )
3)CustomSpanQuery extends SpanQuery( createWeight is overriden to use
createWeight)
4) CustomSpanNearQuery extends CustomSpanQuery ( All methods are repeated)
5)CustomNearSpansOrdered  - No change from  NearSpansOrdered 
6)CustomNearSpansUnOrdered  - No change from  NearSpansUnOrdered

Please let me know if this is the correct way to go about this


Rads2029 wrote:
> 
> Hi all,
> 
> All I have is a query running on a document with a single field which
> has some search value. This is all which will be present.
> No more documents / fields.
> 
> I have the following specific requirements
> 
> 1) Length of document should not affect score - Implemented as per
> lucene documentation using concept of Fair Similairty by making
> lengthnorm as 1
> 
> 2) The no of times a term in the query  occurs in the search field
> should not affect the score
> 
> 3) I am using the spannearquery. Hence the slop should affect the score.
> 
> 
> I implemented 2) by changing the tf to return 1 if freq >0 .
> 
> But this adversely affects  3) as the slop value is factored into the
> tf ( as per what I can see in the span scorer)
> 
> 
> How can I ensure the frequency of a certain term does not affect the
> score while at the same ensuring that the slop does affect it ?
> 
> 
> Thanks,
> Radha
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Modifying-score-based-on-tf-and-slop-tp23412168p24460573.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message