lucene-dev mailing list archives

From "Thomas Morton (JIRA)" <>
Subject [jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker
Date Sat, 31 May 2008 15:45:44 GMT


Thomas Morton commented on LUCENE-1297:

I think the Dice coefficient would be nice to have.  I'm not sure the Jaccard index makes
sense in the context of spelling correction, since it doesn't capture character order.  I
implemented Jaro-Winkler because I'm suggesting proper names and it does a good job with those.
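
For reference, here is a rough sketch of a Dice coefficient computed over character bigram sets. It is not part of the attached patch; the class and method names (BigramDice, bigrams, similarity) are made up for illustration, and a real version would presumably be wrapped in the StringDistance interface.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the patch: Dice coefficient over character bigram sets.
final class BigramDice {
    private BigramDice() {}

    // Collects the set of adjacent character pairs in s.
    static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Returns 2 * |A intersect B| / (|A| + |B|), a value in [0, 1].
    // Strings too short to have bigrams are compared by plain equality.
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }
}
```

Because bigrams are adjacent pairs, some local ordering is retained even though the sets themselves are unordered: "night" and "nacht" share only the bigram "ht", giving 2 * 1 / (4 + 4) = 0.25.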

With the StringDistance interface defined, anyone can implement the distance measure however
they want.  What I think would be very useful is a weighted version of edit distance, with the
weights tuned to your target language/domain.  Also, with support in Solr for specifying this
parameter in the SpellCheckRequestHandler, changing the measure becomes just a config change.
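
To make the weighted edit distance idea concrete, here is a minimal sketch. The StringDistance interface below is a local stand-in: its method name and the convention that 1.0 means "identical" are assumptions, since the actual signature lives in the patch and is not shown in this message. WeightedEditDistance and SubstitutionCost are likewise hypothetical names. The sketch fixes insert/delete cost at 1 and assumes substitution costs lie in (0, 1].

```java
// Local stand-in for the interface named in the comment (signature is an assumption).
interface StringDistance {
    float getDistance(String s1, String s2);
}

// Hypothetical weighted Levenshtein: substitution cost comes from a pluggable table.
final class WeightedEditDistance implements StringDistance {
    // Cost of replacing one character with another; assumed to be in (0, 1].
    interface SubstitutionCost {
        float cost(char from, char to);
    }

    private final SubstitutionCost sub;

    WeightedEditDistance(SubstitutionCost sub) {
        this.sub = sub;
    }

    // Standard two-row dynamic program; insertions and deletions cost 1 each.
    float rawCost(String s, String t) {
        float[] prev = new float[t.length() + 1];
        float[] cur = new float[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                char a = s.charAt(i - 1);
                char b = t.charAt(j - 1);
                float substitute = prev[j - 1] + (a == b ? 0f : sub.cost(a, b));
                float delete = prev[j] + 1f;
                float insert = cur[j - 1] + 1f;
                cur[j] = Math.min(substitute, Math.min(delete, insert));
            }
            float[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[t.length()];
    }

    // Normalizes the raw cost by the longer length so the result is in [0, 1].
    @Override
    public float getDistance(String s1, String s2) {
        int maxLen = Math.max(s1.length(), s2.length());
        if (maxLen == 0) {
            return 1f;
        }
        return 1f - rawCost(s1, s2) / maxLen;
    }
}
```

A domain-tuned table might, for example, charge 0.5 for confusing 'a' and 'e' and 1 for every other substitution, so that "bet" scores closer to "bat" than to "bit". Where the weights come from (keyboard layout, phonetics, observed error logs) is left open here.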

> Allow other string distance measures in spellchecker
> ----------------------------------------------------
>                 Key: LUCENE-1297
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/spellchecker
>    Affects Versions: 2.4
>         Environment: n/a
>            Reporter: Thomas Morton
>            Assignee: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.4
>         Attachments: string_distance.patch
> Updated spelling code to allow for other string distance measures to be used.
> Created StringDistance interface.
> Modified existing Levenshtein distance measure to implement interface (and renamed class).
> Verified that change to Levenshtein distance didn't impact runtime performance.
> Implemented Jaro/Winkler distance metric
> Modified SpellChecker to take the distance measure as an argument in the constructor or in a
> set method, and to use the interface when calling.
