commons-issues mailing list archives

From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-174) Improve performance of Beider Morse encoder
Date Tue, 12 Nov 2013 20:56:17 GMT

    [ https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820452#comment-13820452 ]

Gary Gregory commented on CODEC-174:
------------------------------------

Applied patch {{CODEC-174-delete-subsequence-cache-and-use-String.patch}} for a small (2-3%)
but consistent performance gain.

{noformat}
commit -m "[CODEC-174] Small (2.3%) but consistent performance gain with this patch from https://issues.apache.org/jira/secure/attachment/12612838/CODEC-174-delete-subsequence-cache-and-use-String.patch.
The nicer aspect of the patch is that it simplifies the code." C:/vcs/svn/apache/commons/trunks-proper/codec/src/main/java/org/apache/commons/codec/language/bm/PhoneticEngine.java
    Sending        C:/vcs/svn/apache/commons/trunks-proper/codec/src/main/java/org/apache/commons/codec/language/bm/PhoneticEngine.java
    Transmitting file data ...
    Committed revision 1541231.
{noformat}


> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, CODEC-174-delete-subsequence-cache-and-use-String.patch,
CODEC-174-delete-subsequence-cache.patch, CODEC-174-refactor-join-method-in-Phoneme.patch,
CODEC-174-refactor-restrictTo-method-in-SomeLanguages.patch, CODEC-174-reuse-set-in-PhonemeBuilder.patch,
CODEC_174_cleanup.patch, TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When it indexes a lot of documents using this encoder,
the import time is multiplied by 30. So I decided to optimize the current implementation
in commons-codec.
> So far I have created two patches. The first patch deletes a "performance hack" involving
a subsequence cache. This cache doesn't improve performance, and deleting it saves
a few milliseconds.
> The second patch changes the in-memory storage of the rules to use a Map instead of a List.
With it, you can look up a rule directly by the beginning of its pattern. This patch divides
the encoding time by 2.
> I will try to find more improvements. If you have any ideas, please tell me.
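
The Map-based rule storage described in the second patch can be illustrated with a minimal sketch. This is not the code from the attached patch: the `Rule` class, the method names, and the choice of the first character of the pattern as the map key are all assumptions made for illustration. The idea is that rules are grouped by how their pattern begins, so the encoder only scans the rules that could match at the current input position instead of the whole list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and structure are hypothetical,
// not taken from commons-codec or from the CODEC-174 patches.
final class RuleIndexSketch {

    /** Hypothetical stand-in for a phonetic rule: a pattern and its output. */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /** Groups rules by the first character of their pattern (assumed key choice). */
    static Map<String, List<Rule>> indexByFirstChar(List<Rule> rules) {
        Map<String, List<Rule>> index = new HashMap<>();
        for (Rule rule : rules) {
            if (rule.pattern.isEmpty()) {
                throw new IllegalArgumentException("pattern must not be empty");
            }
            String key = rule.pattern.substring(0, 1);
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    /** Returns only the rules whose pattern starts with the character at pos. */
    static List<Rule> candidates(Map<String, List<Rule>> index, String input, int pos) {
        if (pos < 0 || pos >= input.length()) {
            return Collections.emptyList();
        }
        String key = input.substring(pos, pos + 1);
        return index.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("a", "A"),
                new Rule("ab", "AB"),
                new Rule("b", "B"));
        Map<String, List<Rule>> index = indexByFirstChar(rules);
        // Only rules starting with 'a' are considered at position 0 of "abc".
        System.out.println(candidates(index, "abc", 0).size());
    }
}
```

With a plain List, every rule is tested at every input position. With this kind of index, the per-position work is limited to the rules sharing the same leading character, which is one plausible way to get a speedup of the kind reported above.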



--
This message was sent by Atlassian JIRA
(v6.1#6144)
