lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)
Date Thu, 21 Feb 2008 13:39:19 GMT


Grant Ingersoll commented on LUCENE-1183:

It occurs to me that we apparently have two different implementations of Levenshtein, one
in spellchecker and one for FuzzyQuery.  I haven't analyzed them individually to know for
sure, but if this is a much better implementation, then we should think about using it for
FuzzyQuery, too.  

The FuzzyQuery (FuzzyTermEnum) version claims to have a fail-fast mechanism, too:
    "Embedded within this algorithm is a fail-fast Levenshtein distance
    algorithm. The fail-fast algorithm differs from the standard Levenshtein
    distance algorithm in that it is aborted if it is discovered that the
    minimum distance between the words is greater than some threshold."
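The fail-fast idea quoted above can be sketched as follows. This is a minimal illustration, not the actual FuzzyTermEnum code: it assumes "aborted" means giving up as soon as every cell in the current row of the dynamic-programming table exceeds the threshold (row minima never decrease, so the final distance can only be larger). The class name FailFastLevenshtein and the -1 "too far" return value are hypothetical.

```java
/**
 * Sketch of a threshold-bounded ("fail-fast") Levenshtein distance.
 * Hypothetical helper for illustration; not Lucene's FuzzyTermEnum code.
 */
public final class FailFastLevenshtein {

    private FailFastLevenshtein() {}

    /**
     * Returns the Levenshtein distance between s and t, or -1 as soon as
     * it is certain that the distance exceeds maxDistance.
     */
    public static int distance(String s, String t, int maxDistance) {
        int n = s.length();
        int m = t.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > maxDistance) {
            return -1;
        }
        int[] prev = new int[m + 1]; // previous row of the DP table
        int[] curr = new int[m + 1]; // row currently being computed
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            char sc = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (sc == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // No cell in a later row can be smaller than this row's minimum,
            // so once the minimum passes the threshold we can stop early.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last row of the table.
        return prev[m] <= maxDistance ? prev[m] : -1;
    }
}
```

For a fuzzy query, the threshold would typically be derived from the allowed similarity and term length, so most non-matching terms are rejected after only a few rows.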

Cedrik, since you seem to know about these things, would you have time to look at FuzzyTermEnum?
A 3x speedup there would be great for users, too.

> TRStringDistance uses way too much memory (with patch)
> ------------------------------------------------------
>                 Key: LUCENE-1183
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3
>            Reporter: Cédrik LIME
>         Attachments:, TRStringDistance.patch
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
> The implementation of TRStringDistance is based on version 2.1 of
> org.apache.commons.lang.StringUtils#getLevenshteinDistance(String, String), which uses an
> un-optimized implementation of the Levenshtein Distance algorithm (it uses way too much
> memory). Please see Bug 38911 (
> for more information.
> The commons-lang implementation has been heavily optimized as of version 2.2 (3x speed-up).
> I have ported the new implementation to TRStringDistance.
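The memory problem described in the issue comes from filling the full (n+1) x (m+1) distance matrix, even though each row depends only on the row before it. The sketch below shows the general two-row technique, assuming that is the kind of optimization the commons-lang 2.2 version applies; it is not a copy of the commons-lang code or of the attached TRStringDistance patch, and the class name TwoRowLevenshtein is hypothetical.

```java
/**
 * Sketch: Levenshtein distance that keeps only two rows of the DP table,
 * so memory is O(min(n, m)) instead of O(n * m).
 * Hypothetical class; not the patched TRStringDistance code.
 */
public final class TwoRowLevenshtein {

    private TwoRowLevenshtein() {}

    public static int distance(String s, String t) {
        // Make t the shorter string so the row arrays are as small as possible.
        if (s.length() < t.length()) {
            String tmp = s;
            s = t;
            t = tmp;
        }
        int n = s.length();
        int m = t.length();
        int[] prev = new int[m + 1]; // row i-1 of the full matrix
        int[] curr = new int[m + 1]; // row i, being filled in
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // cost of building t's first j chars from an empty prefix
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // cost of deleting s's first i chars
            char sc = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (sc == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // Swap the arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last computed row.
        return prev[m];
    }
}
```

For two 1,000-character strings this allocates two int arrays of 1,001 entries each, rather than a 1,001 x 1,001 matrix.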
