lucene-java-user mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Scoring exact matches higher in a stemmed field
Date Fri, 16 Jul 2010 15:28:10 GMT
Hi all,


Consider the following string: "the buffalo buffaloes" [1].


When passed through a stemming analyzer, the resulting tokens would be 
"buffalo buffalo" (assuming a good stemmer, and with the stop word "the" 
removed).


To enable exact searches, say I mark the original term (here with a 
trailing $) and index it at the same term position as its stem. So 
"the buffalo buffaloes" -> (buffalo buffalo$) (buffalo buffaloes$), where 
each pair of parentheses is one term position. Exact searches are now 
possible on the same field, without needing 2 different fields [2].
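
To make the token layout above concrete, here is a minimal, library-free 
sketch. It does not use the Lucene API; the class name MarkedTokenSketch, 
the Tok type, the toy stem() rule and the one-word stop list are all 
assumptions made up for illustration. A position increment of 0 stands 
for "same position as the previous token".

```java
import java.util.ArrayList;
import java.util.List;

final class MarkedTokenSketch {
    // A term plus its position increment; 0 means "stacked on the previous token".
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return term + "/" + posInc; }
    }

    // Toy stemmer for illustration only, NOT a real stemming algorithm:
    // strips a trailing "es" or "s" from sufficiently long words.
    static String stem(String w) {
        if (w.endsWith("es") && w.length() > 4) return w.substring(0, w.length() - 2);
        if (w.endsWith("s") && w.length() > 3) return w.substring(0, w.length() - 1);
        return w;
    }

    // Emits the stem at a new position, then the original term marked
    // with '$' at the same position (increment 0).
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty() || w.equals("the")) continue; // hypothetical one-word stop list
            out.add(new Tok(stem(w), 1));
            out.add(new Tok(w + "$", 0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("the buffalo buffaloes"));
    }
}
```

In a real analyzer the same idea would presumably live in a token filter 
that sets the position increment of the marked token to 0.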


However, with this approach default scoring isn't working well. What is 
my best option for boosting an exact match of this sort, while still 
using the same stemming analyzer and without using payloads on the 
marked token?


In other words - how do I make documents containing "the buffalo 
buffaloes" score as more relevant than docs containing the word 
"buffalo" only once?


The trick here is to boost the marked token if it is found at search 
time. While this sounds easy to do, I can't find the best approach for 
implementing this - esp. since float Similarity.Idf(Index.Term term, 
Searcher searcher) seems to have been deprecated for some reason.
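
To illustrate what I mean by boosting the marked token at search time, 
here is a rough, library-free sketch. It is a toy model, not Lucene's 
scoring formula: the score is just the sum of term frequency times boost, 
with no idf or norms. The class name ExactBoostSketch, the EXACT_BOOST 
value of 2.0, the toy stem() rule and the one-word stop list are all 
assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExactBoostSketch {
    static final float EXACT_BOOST = 2.0f; // hypothetical boost for marked (exact) terms

    // Toy stemmer for illustration only, NOT a real stemming algorithm.
    static String stem(String w) {
        if (w.endsWith("es") && w.length() > 4) return w.substring(0, w.length() - 2);
        if (w.endsWith("s") && w.length() > 3) return w.substring(0, w.length() - 1);
        return w;
    }

    // Index side: every word yields its stem plus the '$'-marked original.
    static List<String> indexTerms(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty() || w.equals("the")) continue; // hypothetical one-word stop list
            out.add(stem(w));
            out.add(w + "$");
        }
        return out;
    }

    // Query side: each word becomes two optional clauses, the stem with
    // boost 1.0 and the marked original with the higher EXACT_BOOST.
    static Map<String, Float> expandQuery(String query) {
        Map<String, Float> clauses = new LinkedHashMap<>();
        for (String w : query.toLowerCase().split("\\s+")) {
            if (w.isEmpty() || w.equals("the")) continue;
            clauses.merge(stem(w), 1.0f, Float::sum);
            clauses.merge(w + "$", EXACT_BOOST, Float::sum);
        }
        return clauses;
    }

    // Toy score: sum over clauses of (term frequency in doc) * (clause boost).
    static float score(String doc, String query) {
        List<String> terms = indexTerms(doc);
        float total = 0f;
        for (Map.Entry<String, Float> c : expandQuery(query).entrySet()) {
            int tf = 0;
            for (String t : terms) {
                if (t.equals(c.getKey())) tf++;
            }
            total += tf * c.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score("the buffalo buffaloes", "buffaloes"));
        System.out.println(score("buffalo", "buffaloes"));
    }
}
```

In Lucene terms this would roughly correspond to a boolean query with two 
optional term clauses per word, with a boost set on the marked one. 
Whether that interacts well with idf for the rarer marked terms is 
exactly the part I am unsure about.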


Itamar.


[1] 
http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo

:)

[2] Rationale: 
http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/

