lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: FuzzySuggester EXACT_FIRST criteria
Date Thu, 14 Nov 2013 16:05:41 GMT
On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling wrote:
> We started to implement named entity recognition on top of AnalyzingSuggester, which offers
> great support for synonyms, stopwords, etc.
> For this, we slightly modified AnalyzingSuggester.lookup() to return only the exactFirst hits
> (running the exactFirst code block only, and skipping the 'sameSurfaceForm' check and break,
> to get the synonym hits too).
> This works pretty well, and our next step would be to bring in some fuzziness to tolerate
> spelling mistakes. For this, the idea was to do exactly the same, but with FuzzySuggester instead.
> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester does not only rely on sharing
> the same prefix: different/misspelled terms within the edit distance are also considered
> 'not exact', which means we get the same results as with AnalyzingSuggester.
> query: "screen"
> misspelled query: "screan"
> dictionary: "screen", "screensaver"
> AnalyzingSuggester hits: screen, screensaver
> AnalyzingSuggester hits on misspelled query: <empty>
> AnalyzingSuggester EXACT_FIRST hits: screen
> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
> FuzzySuggester hits: screen, screensaver
> FuzzySuggester hits on misspelled query: screen, screensaver
> FuzzySuggester EXACT_FIRST hits: screen
> FuzzySuggester EXACT_FIRST hits on misspelled query: <empty> => TARGET: screen
> Is there a possibility to distinguish? I see that the 'exact' criterion relies on an FST
> 'END_BYTE arc leaving'. Maybe these can be set differently when building the Levenshtein
> automata? I have no clue.

It seems like the problem is that AnalyzingSuggester checks for
exactFirst before calling .getFullPrefixPaths (which, in
FuzzySuggester subclass, applies the fuzziness)?
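
To make the ordering issue concrete, here is a minimal, self-contained Java sketch. It does not use Lucene at all (no FST, no Levenshtein automata); it only models the two behaviors from the table above on plain strings, under the assumption that "exact" should mean "the whole entry is within the edit distance of the whole query". The class and method names (FuzzyExactFirstSketch, exactOnRawQuery, exactAfterFuzzyExpansion) are hypothetical and are not part of the suggester API, and plain Levenshtein distance stands in for whatever the fuzzy automaton would accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: illustrates *where* the exact check happens, not how
// AnalyzingSuggester or FuzzySuggester are actually implemented.
final class FuzzyExactFirstSketch {

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Models the reported behavior: the exact check sees only the raw,
    // unexpanded query, so a misspelled query can never be "exact".
    static List<String> exactOnRawQuery(String query, List<String> dictionary) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (entry.equals(query)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // Models the desired behavior (assumption): the exact check runs after
    // fuzzy expansion, so a complete entry within maxEdits counts as exact,
    // while longer completions such as "screensaver" do not.
    static List<String> exactAfterFuzzyExpansion(String query, List<String> dictionary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("screen", "screensaver");
        System.out.println(exactOnRawQuery("screan", dictionary));             // []
        System.out.println(exactAfterFuzzyExpansion("screan", dictionary, 1)); // [screen]
        System.out.println(exactAfterFuzzyExpansion("screen", dictionary, 1)); // [screen]
    }
}
```

With the dictionary from the example, the first call reproduces the empty result for "screan", and the second gives the TARGET row (screen only). Whether the real lookup code can be reordered this way is a separate question that this sketch does not answer.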

Mike McCandless
