lucene-dev mailing list archives

From Cédrik LIME (JIRA) <>
Subject [jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)
Date Mon, 26 May 2008 09:41:56 GMT


Cédrik LIME commented on LUCENE-1183:

All of Bob's FuzzyTermEnum patch is in my patch. I only left out some minor optimizations that
brought little benefit but hurt code readability. In other words, if you commit my patch,
you will get nearly all (99.9%) of LUCENE-691.
I think this is an important patch for Lucene 2.4, as it brings large performance improvements
to fuzzy search (no hard numbers, sorry).

> TRStringDistance uses way too much memory (with patch)
> ------------------------------------------------------
>                 Key: LUCENE-1183
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3
>            Reporter: Cédrik LIME
>            Assignee: Otis Gospodnetic
>            Priority: Minor
>         Attachments: FuzzyTermEnum.patch,, TRStringDistance.patch
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
> The implementation of TRStringDistance is based on version 2.1 of
> org.apache.commons.lang.StringUtils#getLevenshteinDistance(String, String), which uses an
> unoptimized implementation of the Levenshtein distance algorithm (it uses far too much
> memory). Please see Bug 38911 for more information.
> The commons-lang implementation has been heavily optimized as of version 2.2 (3x speed-up).
> I have ported the new implementation to TRStringDistance.
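
For readers unfamiliar with the memory issue described above, here is a minimal sketch of the general technique. It assumes the problem is a full (n+1) x (m+1) cost matrix, and shows the common fix of keeping only two rows of that matrix at a time. The class and method names (LevenshteinSketch, distance) are hypothetical and chosen for illustration; this is not the code from the attached patches or from commons-lang.

```java
/**
 * Illustrative only: a two-row Levenshtein distance.
 * Names are hypothetical; this is not the attached patch.
 */
public final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /** Returns the edit distance between s and t using O(s.length()) memory. */
    public static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        if (n == 0) return m;
        if (m == 0) return n;

        int[] prev = new int[n + 1]; // costs computed for the previous character of t
        int[] curr = new int[n + 1]; // costs being computed for the current character of t

        // Distance from each prefix of s to the empty string.
        for (int i = 0; i <= n; i++) {
            prev[i] = i;
        }

        for (int j = 1; j <= m; j++) {
            char tj = t.charAt(j - 1);
            curr[0] = j; // distance from the empty prefix of s to t[0..j)
            for (int i = 1; i <= n; i++) {
                int cost = (s.charAt(i - 1) == tj) ? 0 : 1;
                int insertOrDelete = Math.min(curr[i - 1] + 1, prev[i] + 1);
                curr[i] = Math.min(insertOrDelete, prev[i - 1] + cost);
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last computed row.
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```

With this layout the working memory grows with the length of one string rather than with the product of both lengths, which matters when many terms are compared during a fuzzy search.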

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
