lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3846) Fuzzy suggester
Date Thu, 11 Oct 2012 16:23:03 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3846:
------------------------------------

    Attachment: LUCENE-3846.patch

Here is a patch that adds the missing intersect method and several tests derived from
AnalyzingSuggesterTest. The tests all pass at this point, but I get a weird failure when
I run the benchmarks: somehow the TopNSearcher runs into a bad state that I can't really
figure out.

This patch has several refactorings in AnalyzingSuggester, mainly to make testing easier in
the fuzzy case (encapsulated some stuff into package-private methods, etc.). There are still
tons of nocommits, but at least we have something working.

Regarding the failure, I see a NoSuchElementException from the "queue" in the TopNSearcher,
which somehow has removed its bottom element and then tries to pull the last element, which
doesn't exist (stack trace below). The funky thing is that this doesn't happen if I run with
exactFirst=false, yet the problem seems to be in the non-exactFirst part (see the stack trace).
In the fuzzy case I use a direct intersection for exactFirst, so that code is identical to
AnalyzingSuggester, since the intersection with the LD automaton doesn't return enough
information to tell what is an exact match.

Here is the stack trace:

{code}
java.util.NoSuchElementException
	at java.util.TreeMap.key(TreeMap.java:1206)
	at java.util.TreeMap.lastKey(TreeMap.java:274)
	at java.util.TreeSet.last(TreeSet.java:384)
	at org.apache.lucene.util.fst.Util$TopNSearcher.addIfCompetitive(Util.java:339)
	at org.apache.lucene.util.fst.Util$TopNSearcher.search(Util.java:453)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.lookup(AnalyzingSuggester.java:581)
	at org.apache.lucene.search.suggest.LookupBenchmarkTest$2.call(LookupBenchmarkTest.java:228)
	at org.apache.lucene.search.suggest.LookupBenchmarkTest$2.call(LookupBenchmarkTest.java:1)
	at org.apache.lucene.search.suggest.LookupBenchmarkTest.measure(LookupBenchmarkTest.java:253)
	at org.apache.lucene.search.suggest.LookupBenchmarkTest.runPerformanceTest(LookupBenchmarkTest.java:224)
	at org.apache.lucene.search.suggest.LookupBenchmarkTest.testPerformanceOnPrefixes6_9(LookupBenchmarkTest.java:192)
NOTE: reproduce with: ant test  -Dtestcase=LookupBenchmarkTest -Dtests.method=testPerformanceOnPrefixes6_9
-Dtests.seed=B5BAF2A9592263BC -Dtests.locale=fi_FI -Dtests.timezone=Africa/Lagos -Dtests.file.encoding=UTF-8
NOTE: test params are: codec=Lucene40: {}, sim=DefaultSimilarity, locale=fi_FI, timezone=Africa/Lagos
NOTE: Linux 2.6.38-16-generic amd64/Sun Microsystems Inc. 1.6.0_26 (64-bit)/cpus=12,threads=1,free=578809008,total=1539571712
NOTE: All tests run in this JVM: [LookupBenchmarkTest]

{code}
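
For reference, the top frames of the trace correspond to a behaviour of java.util.TreeSet that can be reproduced in isolation: last() throws NoSuchElementException when the set is empty. The sketch below is not the TopNSearcher code and says nothing about why the queue ends up empty there; the class and method names (EmptyQueueSketch, lastThrows, bottomOrNull) are hypothetical and only illustrate the failure mode and one possible guard.

```java
import java.util.NoSuchElementException;
import java.util.TreeSet;

// Hypothetical standalone sketch; not taken from Util.TopNSearcher.
public class EmptyQueueSketch {

    // Guarded variant: returns the largest element, or null if the set is empty.
    static Integer bottomOrNull(TreeSet<Integer> queue) {
        return queue.isEmpty() ? null : queue.last();
    }

    // Returns true if calling last() on the given set throws NoSuchElementException.
    static boolean lastThrows(TreeSet<Integer> queue) {
        try {
            queue.last();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeSet<Integer> queue = new TreeSet<Integer>();
        queue.add(42);
        queue.pollLast();                          // removes the only element, the set is now empty
        System.out.println(lastThrows(queue));     // prints "true"
        System.out.println(bottomOrNull(queue));   // prints "null"
    }
}
```

A guard like bottomOrNull would only hide the symptom; the open question in this issue is how the queue gets emptied before last() is called.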

Mike, if you get a chance, it would be great if you could look into that one.

                
> Fuzzy suggester
> ---------------
>
>                 Key: LUCENE-3846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3846
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1
>
>         Attachments: LUCENE-3846_fuzzy_analyzing.patch, LUCENE-3846.patch, LUCENE-3846.patch,
> LUCENE-3846.patch
>
>
> Would be nice to have a suggester that can handle some fuzziness (like spell correction)
> so that it's able to suggest completions that are "near" what you typed.
> As a first go at this, I implemented 1T (ie up to 1 edit, including a transposition),
> except the first letter must be correct.
> But there is a penalty, ie, the "corrected" suggestion needs to have a much higher freq
> than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with analyzing
> suggester too (LUCENE-3842).
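
To make the "1T, first letter must be correct" rule from the description concrete, here is a minimal sketch that compares two whole strings directly. It is an illustration only: the patch works by intersecting an LD automaton with the suggester's FST and matches prefixes, which this sketch does not attempt, and it ignores the frequency penalty. The class and method names (OneEditSketch, withinOneEdit) are hypothetical.

```java
// Hypothetical standalone sketch; not the automaton-based approach used in the patch.
public class OneEditSketch {

    // True if a and b share their first character and differ by at most one
    // insertion, deletion, substitution, or transposition of adjacent characters.
    static boolean withinOneEdit(String a, String b) {
        if (a.isEmpty() || b.isEmpty() || a.charAt(0) != b.charAt(0)) {
            return false;                       // first letter must be correct
        }
        int la = a.length();
        int lb = b.length();
        if (Math.abs(la - lb) > 1) {
            return false;                       // more than one insert/delete needed
        }
        int i = 0;                              // length of the common prefix
        while (i < la && i < lb && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        if (i == la && i == lb) {
            return true;                        // identical strings
        }
        if (la == lb) {
            // Substitution at position i: everything after i must match.
            if (a.regionMatches(i + 1, b, i + 1, la - i - 1)) {
                return true;
            }
            // Transposition of positions i and i+1: the rest must match.
            return i + 1 < la
                && a.charAt(i) == b.charAt(i + 1)
                && a.charAt(i + 1) == b.charAt(i)
                && a.regionMatches(i + 2, b, i + 2, la - i - 2);
        }
        // Lengths differ by one: skip one character of the longer string at i.
        String longer = la > lb ? a : b;
        String shorter = la > lb ? b : a;
        return longer.regionMatches(i + 1, shorter, i, shorter.length() - i);
    }

    public static void main(String[] args) {
        System.out.println(withinOneEdit("lucene", "lucnee")); // true: one transposition
        System.out.println(withinOneEdit("lucene", "ucene"));  // false: first letter differs
    }
}
```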

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

