lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2507) automaton spellchecker
Date Fri, 01 Oct 2010 05:25:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916787#action_12916787 ]

Robert Muir commented on LUCENE-2507:
-------------------------------------

By the way, out of curiosity I tested an alternative configuration: DirectSpellChecker with
.setMaxEdits(1).

With this "lighter" configuration:
||impl||Number correct (out of 547)||Number correct, inverted (out of 547)||Avg time in ms||
|DirectSpellChecker(n=1)|165|432|1.83ms|

So here you have the flexibility to get essentially the same performance as the existing
spellchecker, with a hugely reduced false-positive rate (in this contrived test). The only
tradeoff is that you catch just 77% of the suggestions relative to the old spellchecker, but
this might suit setups that feel the n=2 default is too aggressive.
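To make the n=1 vs n=2 tradeoff concrete, here is a minimal sketch of what a max-edits bound does to the candidate set. It is not DirectSpellChecker's implementation: it swaps in a brute-force plain Levenshtein scan over an in-memory term list (the real class intersects a Levenshtein automaton with the index's term dictionary). The class name MaxEditsSketch, the term list, and the ranking rule (distance, then alphabetical) are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not Lucene's DirectSpellChecker.
public class MaxEditsSketch {
    // Classic two-row dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last computed row is now in prev
        }
        return prev[b.length()];
    }

    // Keep only terms within maxEdits of the query (excluding an exact match),
    // ranked by edit distance, ties broken alphabetically.
    static List<String> suggest(String query, List<String> terms, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            int d = levenshtein(query, t);
            if (d > 0 && d <= maxEdits) out.add(t);
        }
        out.sort(Comparator.comparingInt((String t) -> levenshtein(query, t))
                .thenComparing(Comparator.naturalOrder()));
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("lucene", "lucent", "license", "luce", "select");
        System.out.println(suggest("lucne", terms, 1)); // tighter bound: fewer candidates
        System.out.println(suggest("lucne", terms, 2)); // looser bound: more candidates
    }
}
```

With maxEdits=1 only the one-edit neighbours ("luce", "lucene") survive; raising the bound to 2 also admits "lucent". That is the same recall-versus-false-positive tradeoff the table above measures, just on a toy dictionary.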

And again, like the original configuration, you have no index to rebuild at all.


> automaton spellchecker
> ----------------------
>
>                 Key: LUCENE-2507
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2507
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/spellchecker
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an algorithm such as Levenshtein.
> Alternatively, we could just do a Levenshtein query directly against the index; then we wouldn't need
> a separate index to rebuild.
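For contrast with the direct approach, the two-stage design described in the issue (n-gram candidate lookup, then Levenshtein re-ranking) can be sketched as below. This is a heavily simplified stand-in, not the contrib/spellchecker code: the separate "index" is assumed to be an in-memory map from fixed-size character n-grams to terms rather than a Lucene index, and the class name NGramSpellSketch and its parameters are hypothetical.

```java
import java.util.*;

// Hypothetical, simplified two-stage spellchecker: n-gram candidates, then Levenshtein re-rank.
public class NGramSpellSketch {
    private final int n;
    // The separate structure that must be rebuilt whenever the term dictionary changes.
    private final Map<String, Set<String>> gramIndex = new HashMap<>();

    NGramSpellSketch(Collection<String> terms, int n) {
        this.n = n;
        for (String t : terms) {
            for (String g : grams(t)) gramIndex.computeIfAbsent(g, k -> new HashSet<>()).add(t);
        }
    }

    // Fixed-size character n-grams; a word shorter than n is its own single gram.
    private List<String> grams(String w) {
        List<String> gs = new ArrayList<>();
        if (w.length() < n) { gs.add(w); return gs; }
        for (int i = 0; i + n <= w.length(); i++) gs.add(w.substring(i, i + n));
        return gs;
    }

    List<String> suggest(String query, int max) {
        // Stage 1: candidates are terms sharing at least one n-gram with the query.
        Set<String> candidates = new HashSet<>();
        for (String g : grams(query)) candidates.addAll(gramIndex.getOrDefault(g, Set.of()));
        candidates.remove(query);
        // Stage 2: re-rank candidates by Levenshtein distance, ties alphabetical.
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingInt((String t) -> levenshtein(query, t))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }

    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        NGramSpellSketch sc = new NGramSpellSketch(
                List.of("lucene", "lucent", "license", "luce", "select"), 2);
        System.out.println(sc.suggest("lucne", 2));
    }
}
```

The point of the comparison is the gramIndex field: it duplicates information already in the term dictionary and has to be rebuilt when terms change. A Levenshtein query run directly against the term dictionary, as proposed in the issue, has no such extra structure.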

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



