lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3846) Fuzzy suggester
Date Fri, 12 Oct 2012 15:47:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475090#comment-13475090 ]

Robert Muir commented on LUCENE-3846:
-------------------------------------

This looks really good (especially if it holds up for a larger FST!). Thanks for hacking
at intersectPrefixPaths to get us going.

One caveat: the whole thing is optimized for the case where the "query analyzer" 
is very simple, particularly where there are no positionIncrement=0 tokens.

So if you use a query analyzer with, say, a synonym filter, then you hit the
"nocommit: how slow can this be?" case. We should maybe benchmark that. If it's
really horrible, we could just throw an exception and document that you cannot
use synonyms etc. in your query analyzer (use them at index time instead) to
prevent performance traps...
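
To make the "throw an exception" option concrete, here is a minimal sketch in plain Java. It is not code from the patch: `QueryTokenGuard`, `Token`, and `requireNoStackedTokens` are hypothetical names, and the analyzer output is modeled as a simple list of (term, positionIncrement) pairs rather than a real Lucene TokenStream.

```java
import java.util.List;

// Hypothetical guard, not part of the LUCENE-3846 patch: rejects query-time
// analysis output that contains stacked tokens (positionIncrement == 0),
// which is what a synonym filter typically produces.
final class QueryTokenGuard {
    private QueryTokenGuard() {}

    // Simplified stand-in for one analyzed token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Throws if any token shares a position with the previous one.
    static void requireNoStackedTokens(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.positionIncrement == 0) {
                throw new IllegalArgumentException(
                    "query analyzer produced a positionIncrement=0 token ('"
                        + t.term + "'); apply synonyms at index time instead");
            }
        }
    }
}
```

Whether failing fast like this beats simply accepting the slower path would depend on the benchmark mentioned above.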

                
> Fuzzy suggester
> ---------------
>
>                 Key: LUCENE-3846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3846
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1
>
>         Attachments: LUCENE-3846_fuzzy_analyzing.patch, LUCENE-3846.patch, LUCENE-3846.patch,
> LUCENE-3846.patch, LUCENE-3846.patch, LUCENE-3846.patch
>
>
> Would be nice to have a suggester that can handle some fuzziness (like spell correction)
> so that it's able to suggest completions that are "near" what you typed.
> As a first go at this, I implemented 1T (ie up to 1 edit, including a transposition),
> except the first letter must be correct.
> But there is a penalty, ie, the "corrected" suggestion needs to have a much higher freq
> than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with analyzing
> suggester too (LUCENE-3842).
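
The "1T, first letter must be correct" rule quoted above can be illustrated with a small standalone check. This is only a sketch in plain Java, not the automaton/FST intersection the patch uses: `OneEditMatcher` and `withinOneEdit` are hypothetical names, and it compares two whole strings rather than matching a typed prefix against suggestions.

```java
// Hypothetical illustration of the "1T" rule: at most one insertion, deletion,
// substitution, or adjacent transposition, with the first character fixed.
final class OneEditMatcher {
    private OneEditMatcher() {}

    static boolean withinOneEdit(String typed, String candidate) {
        if (typed.isEmpty() || candidate.isEmpty()) {
            return typed.equals(candidate);
        }
        // The first letter is never allowed to differ.
        if (typed.charAt(0) != candidate.charAt(0)) {
            return false;
        }
        int n = typed.length();
        int m = candidate.length();
        if (Math.abs(n - m) > 1) {
            return false;
        }
        // Skip the common prefix; i is the first position that differs.
        int i = 0;
        while (i < n && i < m && typed.charAt(i) == candidate.charAt(i)) {
            i++;
        }
        if (i == n && i == m) {
            return true; // exact match, zero edits
        }
        if (n == m) {
            // One substitution at i: everything after i must be identical.
            if (typed.regionMatches(i + 1, candidate, i + 1, n - i - 1)) {
                return true;
            }
            // One transposition of positions i and i+1.
            return i + 1 < n
                && typed.charAt(i) == candidate.charAt(i + 1)
                && typed.charAt(i + 1) == candidate.charAt(i)
                && typed.regionMatches(i + 2, candidate, i + 2, n - i - 2);
        }
        if (n + 1 == m) {
            // Candidate has one extra character at position i.
            return typed.regionMatches(i, candidate, i + 1, n - i);
        }
        // Typed text has one extra character at position i.
        return typed.regionMatches(i + 1, candidate, i, m - i);
    }
}
```

For example, `withinOneEdit("lucene", "lucnee")` accepts the transposition, while `withinOneEdit("lucene", "mucene")` is rejected because the first letter differs. The frequency penalty for "corrected" suggestions is not modeled here, since the description gives no concrete factor.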

