lucene-dev mailing list archives

From "Thomas Morton (JIRA)" <>
Subject [jira] Updated: (LUCENE-1550) Add N-Gram String Matching for Spell Checking
Date Sun, 22 Mar 2009 18:07:50 GMT


Thomas Morton updated LUCENE-1550:

    Attachment: LUCENE-1550.patch

2 seems a reasonable default.  Experiments in the paper show comparable results for bi-grams
and tri-grams.  Added a no-argument constructor which sets n=2.

Yes, that can be moved up without penalty.

That's a bug in the empty case.  It should return 0 unless both strings are empty.  I ported
this bug from the LevensteinDistance code.  It's now fixed in both, and both have unit tests.
New patch attached.
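
For illustration only, here is a minimal sketch of the empty-case behaviour described above.
It is not the code from the patch.  The class name EmptyCaseSketch and the method name
similarity are hypothetical, and it assumes the score is a similarity in [0, 1] where 1 means
identical, so two empty strings score 1 and an empty string against a non-empty one scores 0.

```java
// Hypothetical sketch, not the patch: normalized Levenshtein similarity
// with an explicit guard for empty input.
final class EmptyCaseSketch {
    private EmptyCaseSketch() {}

    /** Returns a similarity in [0, 1]; 1 means the strings are identical. */
    static float similarity(String a, String b) {
        int n = a.length();
        int m = b.length();
        // Empty-case guard (assumed scale): both empty -> 1, exactly one empty -> 0.
        // Without it, two empty strings would divide by zero below.
        if (n == 0 || m == 0) {
            return (n == m) ? 1.0f : 0.0f;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        // Normalize the edit distance by the longer length and invert it.
        return 1.0f - ((float) prev[m] / Math.max(n, m));
    }
}
```

A unit test for the fix would then check both empty-input branches, for example
similarity("", "") and similarity("", "abc").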

Technically NGramDistance(1) is the same thing as LevensteinDistance, but the LevensteinDistance
code is more straightforward and may be slightly faster.
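
To make the n=1 equivalence concrete, here is a simplified sketch of an n-gram similarity.
It is not the code from the patch and may differ from it in details.  The names NGramSketch,
similarity and pad are hypothetical.  It assumes each string is padded with n-1 leading '\0'
characters and that substituting one n-gram for another costs the fraction of positions that
differ.  With n=1 there is no padding and the substitution cost is 0 or 1, which is plain
Levenshtein.

```java
// Hypothetical sketch, not the patch: simplified positional n-gram similarity.
final class NGramSketch {
    private NGramSketch() {}

    /** Returns a similarity in [0, 1]; 1 means the strings are identical. */
    static float similarity(String source, String target, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int sl = source.length();
        int tl = target.length();
        // Same empty-case rule as above (assumed): both empty -> 1, one empty -> 0.
        if (sl == 0 || tl == 0) {
            return (sl == tl) ? 1.0f : 0.0f;
        }
        char[] s = pad(source, n);
        char[] t = pad(target, n);
        float[] prev = new float[sl + 1];
        float[] curr = new float[sl + 1];
        for (int i = 0; i <= sl; i++) {
            prev[i] = i;
        }
        for (int j = 1; j <= tl; j++) {
            curr[0] = j;
            for (int i = 1; i <= sl; i++) {
                // The i-th n-gram of s occupies s[i-1 .. i+n-2]; likewise for t.
                int mismatches = 0;
                for (int k = 0; k < n; k++) {
                    if (s[i - 1 + k] != t[j - 1 + k]) {
                        mismatches++;
                    }
                }
                float sub = (float) mismatches / n; // fractional substitution cost
                curr[i] = Math.min(Math.min(curr[i - 1] + 1, prev[i] + 1),
                                   prev[i - 1] + sub);
            }
            float[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return 1.0f - prev[sl] / Math.max(sl, tl);
    }

    /** Prepends n-1 '\0' characters so every original character ends an n-gram. */
    private static char[] pad(String str, int n) {
        char[] out = new char[str.length() + n - 1]; // zero-filled by default
        str.getChars(0, str.length(), out, n - 1);
        return out;
    }
}
```

With n=1, similarity("kitten", "sitting", 1) is 1 - 3/7, the same value a normalized
Levenshtein distance of 3 gives.  The extra inner loop over k is one reason a dedicated
Levenshtein implementation can be slightly faster.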

> Add N-Gram String Matching for Spell Checking
> ---------------------------------------------
>                 Key: LUCENE-1550
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/spellchecker
>    Affects Versions: 2.9
>            Reporter: Thomas Morton
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1550.patch, LUCENE-1550.patch
> N-Gram version of edit distance based on paper by Grzegorz Kondrak, "N-gram similarity
> and distance". Proceedings of the Twelfth International Conference on String Processing and
> Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005.

