lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3846) Fuzzy suggester
Date Sun, 04 Mar 2012 19:44:59 GMT


Michael McCandless commented on LUCENE-3846:

This is a nice benefit of the path-based best-first search (in the patch): it's easy to use
a custom cost matrix.  The cost can also be context-dependent (based on past matched characters,
though not [easily] future ones).
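
To make the idea concrete, here is a minimal sketch of a best-first search with a pluggable
cost hook. It is not the code in the patch: the trie, the `EditCost` interface, and all names
are hypothetical, and only substitutions are modelled (no insertions, deletions, or
transpositions) to keep it short. The hook sees the previously matched character, which is
what makes a context-dependent cost possible.

```java
import java.util.*;

final class BestFirstFuzzy {
    /** Hypothetical hook: cost of reading `actual` where the user typed `typed`,
     *  given the previously matched character (0 at the start of the input). */
    interface EditCost {
        double substitute(char prev, char typed, char actual);
    }

    /** Plain trie node; stands in for whatever structure the real suggester walks. */
    static final class Node {
        final Map<Character, Node> kids = new TreeMap<>();
    }

    static Node buildTrie(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.kids.computeIfAbsent(c, k -> new Node());
            }
        }
        return root;
    }

    /** One partial path: node reached, query chars consumed, accumulated cost, matched text. */
    static final class Path {
        final Node node; final int pos; final double cost; final String text;
        Path(Node node, int pos, double cost, String text) {
            this.node = node; this.pos = pos; this.cost = cost; this.text = text;
        }
    }

    /** Returns trie prefixes matching `query`, cheapest first, with total cost <= maxCost. */
    static List<String> search(Node root, String query, EditCost costs, double maxCost) {
        PriorityQueue<Path> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Path p) -> p.cost));
        queue.add(new Path(root, 0, 0.0, ""));
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Path p = queue.poll();              // always expand the cheapest path so far
            if (p.pos == query.length()) {      // whole query consumed: emit in cost order
                out.add(p.text);
                continue;
            }
            char typed = query.charAt(p.pos);
            char prev = p.pos == 0 ? 0 : p.text.charAt(p.pos - 1);
            for (Map.Entry<Character, Node> e : p.node.kids.entrySet()) {
                char actual = e.getKey();
                double step = actual == typed ? 0.0 : costs.substitute(prev, typed, actual);
                double total = p.cost + step;
                if (total <= maxCost) {         // prune paths that are already too expensive
                    queue.add(new Path(e.getValue(), p.pos + 1, total, p.text + actual));
                }
            }
        }
        return out;
    }
}
```

Because costs only grow along a path, completed paths come off the queue in non-decreasing
cost order, so swapping in a different `EditCost` changes the ranking without touching the
search loop.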

We don't need to explore that now, before committing this, but it's nice that we'll have the
freedom to do so later.

Re-ranking definitely adds cost since you'll have to pull a bigger N.  I think we'll likely
have to somehow combine the cost and "isFuzzy" into a single cost, during the search, not
after (reranking).  Not sure how to do that yet...
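
One possible way to fold the two into a single value, offered only as a sketch and not as
what the patch does: work in log space, so that a fixed additive penalty for a fuzzy match
means "needs `fuzzyFactor` times the freq to tie with an exact match". The `Candidate` class,
`fuzzyFactor`, and the method names are all hypothetical, and the sketch ranks complete
candidates only; how to apply the same cost to partial paths during the search is still the
open question.

```java
import java.util.*;

final class CombinedCost {
    /** Hypothetical candidate: suggestion text, its freq (must be > 0), and whether
     *  it was reached through an edit. */
    static final class Candidate {
        final String text; final long freq; final boolean isFuzzy;
        Candidate(String text, long freq, boolean isFuzzy) {
            this.text = text; this.freq = freq; this.isFuzzy = isFuzzy;
        }
    }

    /** Single cost, lower is better: -log(freq), plus log(fuzzyFactor) for fuzzy matches. */
    static double cost(Candidate c, double fuzzyFactor) {
        double base = -Math.log((double) c.freq);
        return c.isFuzzy ? base + Math.log(fuzzyFactor) : base;
    }

    /** Pulls the n cheapest candidates by the combined cost; no second re-ranking pass. */
    static List<String> topN(List<Candidate> cands, int n, double fuzzyFactor) {
        PriorityQueue<Candidate> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Candidate c) -> cost(c, fuzzyFactor)));
        queue.addAll(cands);
        List<String> out = new ArrayList<>();
        while (out.size() < n && !queue.isEmpty()) {
            out.add(queue.poll().text);
        }
        return out;
    }
}
```

With `fuzzyFactor = 10`, a fuzzy suggestion with freq 5000 outranks an exact match with freq
100, while a fuzzy suggestion with freq 500 does not.
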
> Fuzzy suggester
> ---------------
>                 Key: LUCENE-3846
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>         Attachments: LUCENE-3846.patch
> Would be nice to have a suggester that can handle some fuzziness (like spell correction)
> so that it's able to suggest completions that are "near" what you typed.
> As a first go at this, I implemented 1T (ie up to 1 edit, including a transposition),
> except the first letter must be correct.
> But there is a penalty, ie, the "corrected" suggestion needs to have a much higher freq
> than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with analyzing
> suggester too (LUCENE-3842).
