lucene-dev mailing list archives

From "Mike Cawson (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2256) Fuzzy search result ranking
Date Mon, 08 Feb 2010 21:17:28 GMT
Fuzzy search result ranking
---------------------------

                 Key: LUCENE-2256
                 URL: https://issues.apache.org/jira/browse/LUCENE-2256
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 3.0
         Environment: all
            Reporter: Mike Cawson


When a search term is expanded into a set of alternatives (using Fuzzy, Range, Prefix or Wildcard
queries), the user wants documents that contain any one of the alternatives (ideally the
exact one typed). She is not asking for the document that contains the largest number of different
alternatives, yet that is what the current scoring rewards.

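The problem is that the expanded alternatives are combined as SHOULD clauses, which do not
behave as a plain OR (any one matching alternative is enough) but as AND/OR: every additional
matching alternative adds to the document's score.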

frederick~ alderwood~ expands to something like:
(frederick frederich^0.9 fredereck^0.9) (alderwood elderwood^0.9 underwood^0.8)

A document containing frederick, frederich and fredereck would score more highly than one
containing the exact search terms, frederick and alderwood, even though it satisfies only one
of the user's two query terms.
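To make the ranking problem concrete, here is a small Java model of summed scoring over the expansion above. It is a deliberate simplification, not Lucene's actual Similarity: it assumes each matching alternative contributes exactly its boost and ignores tf, idf, length norms and the coord factor. The class and method names (SummedExpansionScoring, summedScore) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SummedExpansionScoring {

    // Hypothetical expansion of "frederick~ alderwood~": one map per user term,
    // mapping each alternative to its boost (values from the example above).
    static final List<Map<String, Double>> EXPANSION = List.of(
        Map.of("frederick", 1.0, "frederich", 0.9, "fredereck", 0.9),
        Map.of("alderwood", 1.0, "elderwood", 0.9, "underwood", 0.8));

    // Simplified model of SHOULD-clause scoring: every matching alternative
    // adds its boost, and all contributions are summed.
    // Assumption: tf, idf, norms and coord are ignored.
    static double summedScore(Set<String> docTerms) {
        double score = 0.0;
        for (Map<String, Double> alternatives : EXPANSION) {
            for (Map.Entry<String, Double> alt : alternatives.entrySet()) {
                if (docTerms.contains(alt.getKey())) {
                    score += alt.getValue();
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // docA matches one user term via three variants; docB matches both exact terms.
        Set<String> docA = Set.of("frederick", "frederich", "fredereck");
        Set<String> docB = Set.of("frederick", "alderwood");
        System.out.println("docA = " + summedScore(docA)); // 1.0 + 0.9 + 0.9
        System.out.println("docB = " + summedScore(docB)); // 1.0 + 1.0
    }
}
```

Under these assumptions docA scores about 2.8 and docB 2.0, so the document that matches only one of the user's terms ranks first.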

This is not the same problem as issue 329; it arises because the scores of all the expanded
terms are summed. What is required instead is, for each query term, the maximum score among its
alternatives, with those maxima summed across all query terms.
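A minimal sketch of the requested "maximum per term, summed across terms" scoring, under the same simplifying assumptions as a plain model (each matching alternative contributes exactly its boost; tf, idf, norms and coord are ignored). MaxPerTermScoring and maxPerTermScore are hypothetical names used only for illustration, not part of any Lucene API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MaxPerTermScoring {

    // Hypothetical expansion of "frederick~ alderwood~": one map per user term,
    // mapping each alternative to its boost (values from the example above).
    static final List<Map<String, Double>> EXPANSION = List.of(
        Map.of("frederick", 1.0, "frederich", 0.9, "fredereck", 0.9),
        Map.of("alderwood", 1.0, "elderwood", 0.9, "underwood", 0.8));

    // Proposed scoring: for each user term keep only the best matching
    // alternative, then sum those per-term maxima.
    // Assumption: tf, idf, norms and coord are ignored.
    static double maxPerTermScore(Set<String> docTerms) {
        double total = 0.0;
        for (Map<String, Double> alternatives : EXPANSION) {
            double best = 0.0; // a user term with no matching alternative adds nothing
            for (Map.Entry<String, Double> alt : alternatives.entrySet()) {
                if (docTerms.contains(alt.getKey())) {
                    best = Math.max(best, alt.getValue());
                }
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> docA = Set.of("frederick", "frederich", "fredereck");
        Set<String> docB = Set.of("frederick", "alderwood");
        System.out.println("docA = " + maxPerTermScore(docA)); // docA = 1.0
        System.out.println("docB = " + maxPerTermScore(docB)); // docB = 2.0
    }
}
```

With the maximum taken per user term, extra variants of frederick no longer accumulate: docA scores 1.0 and docB scores 2.0, so the document containing both exact search terms ranks first.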

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

