lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin Lawson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein & associated lexicon automata
Date Thu, 02 May 2013 14:10:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647554#comment-13647554
] 

Kevin Lawson commented on LUCENE-4947:
--------------------------------------

I had some free time yesterday and updated LevenshteinAutomaton to include transpositions
in its edit distance determination methods. The included tests have also been updated to accommodate
the modifications and ensure correct functionality.

How do I go about submitting the updated code? Push an update to my github? Simply attach
the new archive here? I'm not sure where the code donation/acceptance process is at now, so
I'm unsure of how to do this.
                
> Java implementation (and improvement) of Levenshtein & associated lexicon automata
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-4947
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4947
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
>            Reporter: Kevin Lawson
>         Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip
>
>
> I was encouraged by Mike McCandless to open an issue concerning this after I contacted
him privately about it. Thanks Mike!
> I'd like to submit my Java implementation of the Levenshtein Automaton as a homogenous
replacement for the current heterogenous, multi-component implementation in Lucene.
> Benefits of upgrading include 
> - Reduced code complexity
> - Better performance from components that were previously implemented in Python
> - Support for on-the-fly dictionary-automaton manipulation (if you wish to use my dictionary-automaton
implementation)
> The code for all the components is well structured, easy to follow, and extensively commented.
It has also been fully tested for correct functionality and performance.
> The levenshtein automaton implementation (along with the required MDAG reference) can
be found in my LevenshteinAutomaton Java library here: https://github.com/klawson88/LevenshteinAutomaton.
> The minimalistic directed acyclic graph (MDAG) which the automaton code uses to store
and step through word sets can be found here: https://github.com/klawson88/MDAG
> *Transpositions aren't currently implemented. I hope the comment filled, editing-friendly
code combined with the fact that the section in the Mihov paper detailing transpositions is
only 2 pages makes adding the functionality trivial.
> *As a result of support for on-the-fly manipulation, the MDAG (dictionary-automaton)
creation process incurs a slight speed penalty. In order to have the best of both worlds,
i'd recommend the addition of a constructor which only takes sorted input. The complete, easy
to follow pseudo-code for the simple procedure can be found in the first article I linked
under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message