lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (LUCENE-1532) File based spellcheck with doc frequencies supplied
Date Sat, 16 Mar 2013 18:40:12 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved LUCENE-1532.
------------------------------------

    Resolution: Won't Fix

SPRING_CLEANING_2013: we can reopen if necessary.
                
> File based spellcheck with doc frequencies supplied
> ---------------------------------------------------
>
>                 Key: LUCENE-1532
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1532
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spellchecker
>            Reporter: David Bowen
>            Priority: Minor
>
> The file-based spellchecker treats all words in the dictionary as equally valid, so it
> can suggest a very obscure word rather than a more common word that is equally close to the
> misspelled word that was entered. It would be very useful to have the option of supplying
> an integer with each word that indicates how common it is. For example, the integer could be
> the document frequency in some index or set of indexes.
> I've implemented a modification to the spellcheck API to support this by defining a DocFrequencyInfo
> interface for obtaining the doc frequency of a word, and a class that implements the interface
> by looking up the frequency in an index. Lucene users can therefore provide alternative implementations
> of DocFrequencyInfo. I could submit this as a patch if there is interest. Alternatively,
> it might be better to just extend the spellcheck API to have a way to supply the frequencies
> when you create a PlainTextDictionary, but that would mean storing the frequencies somewhere
> when building the spellcheck index, and I'm not sure how best to do that.
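
The idea described in the issue can be illustrated with a minimal sketch. The patch itself was never attached, so everything below is an assumption made for illustration: the `docFreq` method signature, the map-backed `MapDocFrequencyInfo` class, and the `FrequencyAwareRanker` helper are hypothetical and are not part of Lucene's spellchecker API. Only the interface name `DocFrequencyInfo` comes from the description. The sketch orders candidate suggestions by edit distance and uses the supplied frequency to break ties, so that a common word wins over an obscure one that is equally close.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface: the name is from the issue, the signature is assumed. */
interface DocFrequencyInfo {
    /** Returns the document frequency of the word, or 0 if it is unknown. */
    int docFreq(String word);
}

/** Illustrative implementation backed by an in-memory word-to-frequency map. */
final class MapDocFrequencyInfo implements DocFrequencyInfo {
    private final Map<String, Integer> freqs;

    MapDocFrequencyInfo(Map<String, Integer> freqs) {
        this.freqs = new HashMap<>(freqs); // defensive copy
    }

    @Override
    public int docFreq(String word) {
        return freqs.getOrDefault(word, 0);
    }
}

/** Hypothetical helper that ranks suggestions using the supplied frequencies. */
final class FrequencyAwareRanker {
    private FrequencyAwareRanker() {}

    /** Levenshtein edit distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Sorts by ascending edit distance, then by descending document frequency. */
    static List<String> rank(String misspelled, List<String> candidates, DocFrequencyInfo info) {
        Comparator<String> byDistance =
            Comparator.comparingInt(w -> editDistance(misspelled, w));
        Comparator<String> byFreqDesc =
            Comparator.comparingInt((String w) -> info.docFreq(w)).reversed();
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(byDistance.thenComparing(byFreqDesc));
        return sorted;
    }
}
```

With frequencies such as cat=1000, cut=500, cot=3, the three candidates that are one edit away from the misspelling "cet" would be returned in that order. A candidate two edits away, such as "coat", would still rank last regardless of its frequency, because frequency is only consulted to break ties in distance.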

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

