lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4813) Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency
Date Fri, 08 Mar 2013 13:42:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597123#comment-13597123 ]

Simon Willnauer commented on LUCENE-4813:
-----------------------------------------

bq. Can we do without the FieldStatistics/DocFreqStatistics/etc and just change 'freq' to long?

I really appreciate that this is an object I can pass in, for several reasons. First, you can
plug in your own stats if you want to. Second, it pulls a Terms object only once, and I can
provide that object myself. In my use case I call the same DirectSpellChecker instance multiple
times within one request to generate candidates, so I can keep reusing my Terms / TermsEnum
instance. That is a small but important cost saving IMO, and it helps in my expert case.

For users who have used this class before, nothing really changes unless they want to switch to
totalTermFreq as their stats, and we can make that simple. We can also make these classes
package private; I am totally OK with that, to hide this small complexity from the average
user while still enabling the expert user. The API stays the same, and if sumTotalTermFreq is
available you also get it in the SuggestWord.

I would not want to fork this entire code just to be able to reuse these statistics. If hiding
this from the user is the problem, then let's move to package private. If it's just you
"feeling" that this is too big a change for the sake of it, then I am not moving, sorry.
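
To make the argument for a pluggable statistics object concrete, here is a minimal, self-contained Java sketch. It does not use Lucene and is not the API from the attached patch: the names TermStats, DocFreqStats, TotalTermFreqStats and SpellStatsDemo are hypothetical, and the "index" is just an in-memory array of tokenized documents. It only illustrates how a caller could build one statistics object, reuse it across several lookups, and swap docFreq/maxDoc style counts for totalTermFreq/sumTotalTermFreq style counts without changing the calling code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

/** Hypothetical stand-in for a pluggable statistics object; not the patch's actual API. */
interface TermStats {
    long freq(String term); // frequency of a single term
    long total();           // normalizer used together with freq
}

/** docFreq / maxDoc style: a term counts at most once per document. */
final class DocFreqStats implements TermStats {
    private final Map<String, Long> docFreq = new HashMap<>();
    private final long maxDoc;

    DocFreqStats(String[][] docs) {
        maxDoc = docs.length;
        for (String[] doc : docs) {
            // De-duplicate the terms of each document before counting.
            for (String term : new HashSet<>(Arrays.asList(doc))) {
                docFreq.merge(term, 1L, Long::sum);
            }
        }
    }

    @Override public long freq(String term) { return docFreq.getOrDefault(term, 0L); }
    @Override public long total() { return maxDoc; }
}

/** totalTermFreq / sumTotalTermFreq style: every occurrence counts. */
final class TotalTermFreqStats implements TermStats {
    private final Map<String, Long> totalTermFreq = new HashMap<>();
    private long sumTotalTermFreq;

    TotalTermFreqStats(String[][] docs) {
        for (String[] doc : docs) {
            for (String term : doc) {
                totalTermFreq.merge(term, 1L, Long::sum);
                sumTotalTermFreq++;
            }
        }
    }

    @Override public long freq(String term) { return totalTermFreq.getOrDefault(term, 0L); }
    @Override public long total() { return sumTotalTermFreq; }
}

final class SpellStatsDemo {
    /** Relative frequency of a candidate under whichever statistics were plugged in. */
    static double relativeFreq(TermStats stats, String term) {
        long total = stats.total();
        return total == 0 ? 0.0 : (double) stats.freq(term) / total;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"lucene", "lucene", "lucene", "search"},
            {"lucene", "index"},
            {"search", "index"},
            {"index"}
        };
        // Each statistics object is built once and reused for every candidate.
        TermStats[] choices = { new DocFreqStats(docs), new TotalTermFreqStats(docs) };
        for (TermStats stats : choices) {
            for (String candidate : new String[] {"lucene", "search", "index"}) {
                System.out.println(stats.getClass().getSimpleName() + " " + candidate
                        + " freq=" + stats.freq(candidate)
                        + " relative=" + relativeFreq(stats, candidate));
            }
        }
    }
}
```

In this toy data "lucene" appears in only two documents but four times overall, so the two implementations rank it differently against "index". That is the kind of difference the issue description has in mind for spell correction on top of a full-text index.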
                
> Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-4813
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4813
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/spellchecker
>    Affects Versions: 4.1
>            Reporter: Simon Willnauer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4813.patch, LUCENE-4813.patch
>
>
> We have a bunch of new statistics in our term dictionaries that we should make use of where
> it makes sense. For DirectSpellChecker, totalTermFreq and sumTotalTermFreq might be better
> suited for spell correction on top of a full-text index than docFreq and maxDoc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

