lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Lucene DirectSpellChecker strange behavior
Date Tue, 07 Jun 2016 15:24:13 GMT
It's just a heuristic: it does not allow 2 edits
(insertion/deletion/substitution/transposition) to the word if the first
character differs (
https://github.com/apache/lucene-solr/blob/master/lucene/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java#L411).
So when it goes back for n=2, it requires the first character to match,
even if the configured prefix length is 0.
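
To make the effect concrete, here is a minimal, self-contained sketch of the
heuristic. It is not the Lucene code: the class and method names are made up
for illustration, it uses plain Levenshtein distance (no transpositions), and
it assumes the required prefix for a pass with n edits is max(minPrefix, n - 1),
which is my reading of the linked line.

```java
public class PrefixHeuristicSketch {

    // Plain Levenshtein distance (insert/delete/substitute only).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // True if the first `prefix` characters are identical and the remainders
    // are within `edits` edits of each other.
    static boolean matchesWithPrefix(String query, String candidate, int edits, int prefix) {
        if (query.length() < prefix || candidate.length() < prefix) {
            return false;
        }
        if (!query.regionMatches(0, candidate, 0, prefix)) {
            return false;
        }
        return editDistance(query.substring(prefix), candidate.substring(prefix)) <= edits;
    }

    // Hypothetical stand-in for the spellchecker's passes: try n=1 first,
    // then n=2 with the first character forced to match (assumed rule).
    static boolean suggests(String query, String candidate, int maxEdits, int minPrefix) {
        for (int n = 1; n <= maxEdits; n++) {
            int requiredPrefix = Math.max(minPrefix, n - 1);
            if (matchesWithPrefix(query, candidate, n, requiredPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(suggests("smsng", "samsung", 2, 0));  // true: 2 edits, first char matches
        System.out.println(suggests("amsung", "samsung", 2, 0)); // true: found in the n=1 pass
        System.out.println(suggests("amung", "samsung", 2, 0));  // false: 2 edits, first char differs
    }
}
```

Under these assumptions the three queries from the original message behave
exactly as reported: only "amung" needs 2 edits and has a different first
character, so it is the only one that gets no suggestion.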

At least at the time the thing was written, this had a very large impact on
performance, because otherwise too much of the term dictionary must be
inspected and it's much slower. The idea is that it won't hurt quality too
much, for the same reason that many string distance functions
incorporate a bias towards a matching prefix (e.g. Jaro-Winkler).
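
For reference, the prefix bias in Jaro-Winkler is just a boost applied on top
of the Jaro score: sim_w = sim_j + l * p * (1 - sim_j), where l is the length
of the common prefix (capped at 4) and p is a scaling factor, conventionally
0.1. A minimal sketch of that boost follows; the class and method names are
hypothetical, and the Jaro score is passed in rather than computed.

```java
public class WinklerBoostSketch {

    // Length of the shared prefix of a and b, capped at `cap` characters.
    static int commonPrefixLength(String a, String b, int cap) {
        int n = Math.min(cap, Math.min(a.length(), b.length()));
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Applies the Winkler prefix boost to an already computed Jaro score,
    // using the conventional cap of 4 and scaling factor of 0.1.
    static double winklerBoost(double jaro, String a, String b) {
        int prefix = commonPrefixLength(a, b, 4);
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // 0.8 is an arbitrary illustrative Jaro score, not a computed one.
        System.out.println(winklerBoost(0.8, "smsng", "samsung")); // boosted: shares the prefix "s"
        System.out.println(winklerBoost(0.8, "amung", "samsung")); // unchanged: no shared prefix
    }
}
```

With the same underlying score, the pair that shares its first character ranks
higher, which is the same intuition the first-character requirement relies on.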


On Tue, Jun 7, 2016 at 5:20 AM, Caroline Collet <caroline.collet@pertimm.com> wrote:

> Hello,
>
> I see very strange behavior when I use Lucene's DirectSpellChecker.
> I have set the prefixLength to 0. I have indexed only one item with
> one field: brand=samsung.
> I then ran queries containing spelling mistakes.
>
> When I search for "smsng" I obtain "samsung", which is logical since only
> 2 corrections are needed to obtain "samsung".
> When I search for "amsung" I obtain "samsung", since I have set the
> prefixLength to 0.
> But when I search for "amung", which also has only 2 errors, I do not
> obtain "samsung"; I obtain nothing.
>
> I don't understand this behavior: it is as if no other correction is
> permitted when the first letter is misspelled.
>
> Did I miss some parameters of the spellchecker that could explain this
> behavior?
>
> For reference, I am using:
> - Lucene 5.5.0
> - JRE 1.8
>
> Thank you in advance for taking the time to answer my question,
> Best regards,
> --
>
> Caroline Collet
> Development Engineer
>
> Tel : +33 (0)1 80 04 82 89
> caroline.collet@pertimm.com
> http://www.pertimm.com/fr/
>
> Pertimm
> 51, boulevard Voltaire
> 92600 Asnières-Sur-Seine, France
>
>
>
