lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3846) Fuzzy suggester
Date Fri, 12 Oct 2012 15:29:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475072#comment-13475072 ]

Simon Willnauer commented on LUCENE-3846:
-----------------------------------------

Just for the record, here are my benchmark numbers for the latest branch code:

{noformat}
Test class requires enabled assertions, enable globally (-ea) or for Solr/Lucene subpackages only: org.apache.lucene.search.suggest.LookupBenchmarkTest
-- prefixes: 6-9, num: 7, onlyMorePopular: true
FuzzySuggester        queries: 50001, time[ms]: 4650 [+- 12.56], ~kQPS: 11
AnalyzingSuggester    queries: 50001, time[ms]: 444 [+- 1.89], ~kQPS: 113
JaspellLookup         queries: 50001, time[ms]: 181 [+- 0.96], ~kQPS: 275
TSTLookup             queries: 50001, time[ms]: 229 [+- 2.35], ~kQPS: 218
FSTCompletionLookup   queries: 50001, time[ms]: 245 [+- 3.54], ~kQPS: 204
WFSTCompletionLookup  queries: 50001, time[ms]: 121 [+- 1.72], ~kQPS: 413
-- prefixes: 100-200, num: 7, onlyMorePopular: true
FuzzySuggester        queries: 50001, time[ms]: 5432 [+- 20.86], ~kQPS: 9
AnalyzingSuggester    queries: 50001, time[ms]: 403 [+- 1.47], ~kQPS: 124
JaspellLookup         queries: 50001, time[ms]: 129 [+- 1.24], ~kQPS: 389
TSTLookup             queries: 50001, time[ms]: 68 [+- 4.03], ~kQPS: 739
FSTCompletionLookup   queries: 50001, time[ms]: 254 [+- 2.60], ~kQPS: 197
WFSTCompletionLookup  queries: 50001, time[ms]: 82 [+- 1.03], ~kQPS: 610
-- construction time
FuzzySuggester        input: 50001, time[ms]: 450 [+- 1.86]
AnalyzingSuggester    input: 50001, time[ms]: 449 [+- 1.82]
JaspellLookup         input: 50001, time[ms]: 40 [+- 3.80]
TSTLookup             input: 50001, time[ms]: 111 [+- 3.33]
FSTCompletionLookup   input: 50001, time[ms]: 213 [+- 4.36]
WFSTCompletionLookup  input: 50001, time[ms]: 156 [+- 2.08]
-- prefixes: 2-4, num: 7, onlyMorePopular: true
FuzzySuggester        queries: 50001, time[ms]: 3571 [+- 12.15], ~kQPS: 14
AnalyzingSuggester    queries: 50001, time[ms]: 997 [+- 5.73], ~kQPS: 50
JaspellLookup         queries: 50001, time[ms]: 494 [+- 2.25], ~kQPS: 101
TSTLookup             queries: 50001, time[ms]: 1846 [+- 9.67], ~kQPS: 27
FSTCompletionLookup   queries: 50001, time[ms]: 221 [+- 1.57], ~kQPS: 227
WFSTCompletionLookup  queries: 50001, time[ms]: 457 [+- 9.05], ~kQPS: 109
-- RAM consumption
FuzzySuggester        size[B]:      889,138
AnalyzingSuggester    size[B]:      889,138
JaspellLookup         size[B]:    9,815,128
TSTLookup             size[B]:    9,858,792
FSTCompletionLookup   size[B]:      466,520
WFSTCompletionLookup  size[B]:      507,640
{noformat}
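A note on reading the table: `~kQPS` is thousands of queries per second, and since one query per millisecond equals 1,000 queries per second, it works out to roughly `queries / time[ms]`. The reported column does not always match that quotient exactly (for TSTLookup in the 100-200 run, 50001 / 68 is about 735 against the reported 739), so the helper below is only a hypothetical reading aid; the class and method names are made up for illustration and are not part of LookupBenchmarkTest.

```java
// Hypothetical helper for reading the LookupBenchmarkTest rows; not part of Lucene.
class BenchmarkMath {
    // Queries per millisecond, which is the same number as thousands of queries per second (kQPS).
    static double kqps(long queries, double timeMs) {
        return queries / timeMs;
    }

    // How many times slower run A is than run B over the same query set.
    static double slowdown(double timeMsA, double timeMsB) {
        return timeMsA / timeMsB;
    }
}
```

For the 6-9 prefix run, `slowdown(4650, 444)` is about 10.5, i.e. FuzzySuggester is roughly ten times slower than AnalyzingSuggester there; the gap is about 3.6x in the 2-4 run (3571 / 997) and about 13.5x in the 100-200 run (5432 / 403). Construction time and RAM are essentially identical for the two.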
                
> Fuzzy suggester
> ---------------
>
>                 Key: LUCENE-3846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3846
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1
>
>         Attachments: LUCENE-3846_fuzzy_analyzing.patch, LUCENE-3846.patch, LUCENE-3846.patch, LUCENE-3846.patch, LUCENE-3846.patch, LUCENE-3846.patch
>
>
> It would be nice to have a suggester that can handle some fuzziness (like spell correction) so that it's able to suggest completions that are "near" what you typed.
> As a first go at this, I implemented 1T (i.e. up to 1 edit, including a transposition), except the first letter must be correct.
> But there is a penalty, i.e. the "corrected" suggestion needs to have a much higher freq than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with the analyzing suggester too (LUCENE-3842).
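The matching rule in the issue description (at most one edit, counting a swap of two adjacent letters as one edit, with the first letter fixed, plus a score penalty for corrected matches) can be made concrete with a brute-force Java sketch. This is not how FuzzySuggester is implemented; it simply scans a plain `Map` of weights. The class name, the EXACT/FUZZY codes and the `PENALTY` divisor of 10 are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Brute-force illustration of "1T, first letter fixed, penalized" matching.
// Not Lucene's FuzzySuggester; class name and PENALTY are hypothetical.
class FuzzyPrefixSketch {
    static final int NONE = 0, FUZZY = 1, EXACT = 2;

    // Hypothetical penalty: a fuzzy hit's weight is divided by this factor.
    static final long PENALTY = 10;

    // True if a and b are equal or one edit apart, where an edit is an insertion,
    // deletion, substitution, or a swap of two adjacent characters.
    static boolean withinOneEdit(String a, String b) {
        int la = a.length(), lb = b.length();
        if (Math.abs(la - lb) > 1) return false;
        if (la == lb) {
            int i = 0;
            while (i < la && a.charAt(i) == b.charAt(i)) i++;
            if (i == la) return true;                                        // identical
            if (a.substring(i + 1).equals(b.substring(i + 1))) return true;  // substitution
            return i + 1 < la                                                // transposition
                && a.charAt(i) == b.charAt(i + 1)
                && a.charAt(i + 1) == b.charAt(i)
                && a.substring(i + 2).equals(b.substring(i + 2));
        }
        // Lengths differ by one: dropping one char of the longer must yield the shorter.
        String longer = la > lb ? a : b, shorter = la > lb ? b : a;
        int i = 0;
        while (i < shorter.length() && longer.charAt(i) == shorter.charAt(i)) i++;
        return longer.substring(i + 1).equals(shorter.substring(i));
    }

    // EXACT if the suggestion starts with the typed text; FUZZY if the first letters
    // agree and some prefix of the suggestion is within one edit of the typed text.
    static int match(String typed, String suggestion) {
        if (suggestion.startsWith(typed)) return EXACT;
        if (suggestion.isEmpty() || suggestion.charAt(0) != typed.charAt(0)) return NONE;
        for (int len = typed.length() - 1; len <= typed.length() + 1; len++) {
            if (len >= 1 && len <= suggestion.length()
                    && withinOneEdit(typed, suggestion.substring(0, len))) {
                return FUZZY;
            }
        }
        return NONE;
    }

    // Exact hits keep their weight; fuzzy hits compete with weight / PENALTY.
    static List<String> suggest(String typed, Map<String, Long> weights, int num) {
        Map<String, Long> scored = new HashMap<>();
        for (Map.Entry<String, Long> e : weights.entrySet()) {
            int m = match(typed, e.getKey());
            if (m == EXACT) scored.put(e.getKey(), e.getValue());
            else if (m == FUZZY) scored.put(e.getKey(), e.getValue() / PENALTY);
        }
        List<String> keys = new ArrayList<>(scored.keySet());
        // Highest score first; ties broken alphabetically so the order is deterministic.
        keys.sort(Comparator.comparing((String k) -> scored.get(k)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(keys.subList(0, Math.min(num, keys.size())));
    }
}
```

With weights lucene=100, lucky=500, lunar=5000 and solr=9000, `suggest("luce", weights, 5)` returns `[lucene, lucky]`: "lucene" is an exact prefix hit scoring 100, "lucky" is a fuzzy hit scoring 500 / 10 = 50, "lunar" is more than one edit away, and "solr" fails the first-letter rule. Raising lucky's weight to 2000 lifts its penalized score to 200, which is enough to outrank the exact hit.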



