commons-issues mailing list archives

From "Gary Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-174) Improve performance of Beider Morse encoder
Date Sun, 10 Nov 2013 02:25:17 GMT

    [ https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818330#comment-13818330 ]

Gary Gregory commented on CODEC-174:
------------------------------------

For the other Clirr reports, does this breakage warrant a 2.0 label, or can we justify the
changes on the grounds that these classes are internal to the codec? Should these classes be
Javadoc'd as package private? The changes seem inherent in the implementation of the
performance improvements.

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch,
>                      CODEC-174-delete-subsequence-cache-and-use-String.patch,
>                      CODEC-174-delete-subsequence-cache.patch,
>                      CODEC-174-reuse-set-in-PhonemeBuilder.patch, CODEC_174_cleanup.patch,
>                      TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When Solr indexes a large number of documents with
> this encoder, the import time is multiplied by 30, so I decided to optimize the current
> implementation in commons-codec.
> So far I have created two patches. The first patch removes a "performance hack" involving
> a subsequence cache. This cache does not improve performance, and removing it saves a few
> milliseconds.
> The second patch changes the in-memory storage of the rules from a List to a Map. With it,
> a rule can be looked up directly by the beginning of its pattern. This patch halves the
> encoding time.
> I will try to find more improvements. If you have any ideas, please let me know.
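
The List-to-Map change described in the second patch can be sketched roughly as follows. This
is only an illustration, not the code from the attached patches: the class RuleIndexSketch, the
simplified Rule type, and the methods indexByFirstChar and findRule are hypothetical names, and
the sketch assumes literal, non-empty patterns keyed by their first character.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RuleIndexSketch {

    /** Hypothetical, simplified rule: a literal pattern and the phoneme it yields. */
    public static final class Rule {
        public final String pattern;
        public final String phoneme;

        public Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /**
     * Groups rules by the first character of their pattern, preserving the
     * original rule order within each group. Assumes non-empty patterns.
     */
    public static Map<Character, List<Rule>> indexByFirstChar(List<Rule> rules) {
        Map<Character, List<Rule>> index = new HashMap<>();
        for (Rule rule : rules) {
            index.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    /**
     * Returns the first rule whose pattern matches the input at the given
     * position, or null if none does. Only the rules that share the input's
     * current character are examined, instead of the whole rule list.
     */
    public static Rule findRule(Map<Character, List<Rule>> index, String input, int pos) {
        List<Rule> candidates = index.getOrDefault(input.charAt(pos), Collections.emptyList());
        for (Rule rule : candidates) {
            if (input.startsWith(rule.pattern, pos)) {
                return rule;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("sch", "S"));
        rules.add(new Rule("s", "s"));
        rules.add(new Rule("a", "a"));

        Map<Character, List<Rule>> index = indexByFirstChar(rules);
        Rule match = findRule(index, "schatz", 0);
        System.out.println(match == null ? "no match" : match.pattern + " -> " + match.phoneme);
    }
}
```

With a flat List, every rule is tested at every input position. With the Map, only the rules
whose pattern starts with the current character are tested, which is one plausible reason a
change of this kind could reduce encoding time.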



--
This message was sent by Atlassian JIRA
(v6.1#6144)
