lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Scoring exact matches higher in a stemmed field
Date Fri, 16 Jul 2010 17:01:07 GMT
Depends on which query, no? ;)

Sounds like you want to simulate the QP boosting behavior
(http://lucene.apache.org/java/2_4_0/queryparsersyntax.html).
Meaning, if for the query "b" you want to simulate the query
"b OR b$^2" and have matches of b$ count more than matches of b, then
I'd follow how QP does it and create the query programmatically (I'm
not near the code at the moment, so I cannot give a more concrete
approach).
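
A minimal sketch of that expansion, assuming the "$" marker from
Itamar's mail and assuming the analyzer used at query time leaves a
token like "b$" intact. The class and method names are hypothetical;
the only thing taken from the QP syntax page is the "term^boost"
notation. It only builds the query string, so the result still has to
be handed to the QP (or rebuilt as an equivalent boolean query in
code).

```java
// Hypothetical helper: expands a term into "(stem OR original$^boost)".
class ExactMatchBoost {
    // Marker appended to the unstemmed token at index time (per the thread).
    static final String EXACT_MARKER = "$";

    // stem:     the stemmed form, as produced by whatever stemmer is in use
    // original: the term exactly as the user typed it
    // boost:    how much more an exact (marked) match should count
    static String expand(String stem, String original, int boost) {
        return "(" + stem + " OR " + original + EXACT_MARKER + "^" + boost + ")";
    }

    public static void main(String[] args) {
        // The case from the mail: query "b" becomes "b OR b$^2".
        System.out.println(expand("b", "b", 2));
        // Stemmed and exact forms differ for an inflected word.
        System.out.println(expand("buffalo", "buffaloes", 2));
    }
}
```

For a multi-term query the same expansion would be applied per term
and the pieces joined with whatever operator the query already uses.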

If you want b and b$ to count the same, then that's already the
behavior - i.e., docs containing both will score higher.

If I misunderstood your question, then please correct me.

Shai

On Friday, July 16, 2010, Itamar Syn-Hershko <itamar@code972.com> wrote:
> Hi all,
>
>
> Consider the following string: "the buffalo buffaloes" [1].
>
>
> When passed through a stemming analyzer, the resulting token would be "buffalo buffalo"
> (assuming a good stemmer).
>
>
> To enable exact searches, say I mark the original term and index it at the same term
> position. So "the buffalo buffaloes" -> (buffalo buffalo$) (buffalo buffaloes$) - now exact
> searches are allowed on the same field without having 2 different fields [2].
>
>
> However, with this approach default scoring isn't working well. What is my best option
> at upgrading a match for an exact match of this sort, also when using the same stemming analyzer,
> without using payloads on the marked token?
>
>
> In other words - how do I make documents containing "the buffalo buffaloes" considered
> more relevant than docs containing the word "buffalo" only once?
>
>
> The trick here is to boost the marked token if found at search time. While this sounds
> easy to do, I can't find the best approach for implementing this - esp. since Similarity.float
> Idf(Index.Term term, Searcher searcher) seems to have been deprecated for some reason.
>
>
> Itamar.
>
>
> [1] http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo :)
>
> [2] Rationale: http://www.code972.com/blog/2010/07/more-flexible-hebrew-indexing-hebmorph/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
