lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Christian Ziech (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-8219) LevenshteinAutomata should estimate the number of states and transitions
Date Thu, 22 Mar 2018 14:44:00 GMT
Christian Ziech created LUCENE-8219:
---------------------------------------

             Summary: LevenshteinAutomata should estimate the number of states and transitions
                 Key: LUCENE-8219
                 URL: https://issues.apache.org/jira/browse/LUCENE-8219
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Christian Ziech


Currently the toAutomaton() method of the LevenshteinAutomata class uses the default constructor
of the Automaton although it exactly knows how many states the automaton will have and can
do a reasonable estimation on how many transitions it will need as well.

I suggest changing the lines {code:language=java|firstline=154|linenumbers=true}
// the number of states is based on the length of the word and n
int numStates = description.size();

Automaton a = new Automaton();
int lastState;
{code}
to
{code:language=java|firstline=154|linenumbers=true}
// the number of states is based on the length of the word and n
final int numStates = description.size();
final int numTransitions = numStates * Math.min(1 + 2 * n, alphabet.length);
final int prefixStates = prefix != null ? prefix.codePointCount(0, prefix.length()) : 0;

final Automaton a = new Automaton(numStates + prefixStates, numTransitions);
int lastState;
{code}
For my test data this cut down on the total amount of memory needed for int arrays by factor
4. The estimation of "1 + 2 * editDistance" should maybe rather be replaced by a value coming
from the ParametricDescription used.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message