commons-issues mailing list archives

From "Thomas Champagne (JIRA)" <>
Subject [jira] [Created] (CODEC-174) Improve performance of Beider Morse encoder
Date Mon, 04 Nov 2013 15:11:17 GMT
Thomas Champagne created CODEC-174:

             Summary: Improve performance of Beider Morse encoder
                 Key: CODEC-174
             Project: Commons Codec
          Issue Type: Improvement
    Affects Versions: 1.7, 1.6
            Reporter: Thomas Champagne

I use the Beider Morse encoder with Solr. When Solr indexes a large number of documents with this
encoder, the import takes 30 times as long. So I decided to optimize the current implementation
in commons-codec.

So far, I have created two patches. The first patch removes a "performance hack" involving a subsequence
cache. This cache does not actually improve performance, and removing it saves a few milliseconds.
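A subsequence cache memoizes the results of subSequence(start, end), so repeated calls with the same bounds return a stored object instead of creating a new one. The sketch below is a hypothetical illustration of that general technique, not the code removed by the patch, and the class name CachedSubSequences is made up. Note that it allocates an n x n table for every wrapped input, which is overhead a plain String.subSequence call does not have.

```java
// Hypothetical sketch of a subsequence cache (not the Commons Codec source).
// It memoizes subSequence(start, end) results in an n x n table so that
// repeated calls with the same bounds return the same object.
final class CachedSubSequences implements CharSequence {
    private final CharSequence source;
    // cache[start][end - 1] holds the result of subSequence(start, end);
    // empty subsequences (start == end) are not cached.
    private final CharSequence[][] cache;

    CachedSubSequences(CharSequence source) {
        this.source = source;
        int n = source.length();
        this.cache = new CharSequence[n][n]; // n * n slots allocated up front
    }

    @Override
    public int length() {
        return source.length();
    }

    @Override
    public char charAt(int index) {
        return source.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start == end) {
            return "";
        }
        CharSequence cached = cache[start][end - 1];
        if (cached == null) {
            // First request for these bounds: compute and store the result.
            cached = source.subSequence(start, end);
            cache[start][end - 1] = cached;
        }
        return cached;
    }

    @Override
    public String toString() {
        return source.toString();
    }
}
```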

The second patch changes how the rules are stored in memory, using a Map instead of a List. With
it, a rule can be looked up directly by the beginning of its pattern. This patch halves the
encoding time.
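A minimal sketch of the idea behind the second patch, assuming rules are plain literal patterns (real Beider Morse rules also carry left and right contexts, which are ignored here). SimpleRule and RuleIndex are hypothetical names, not the Commons Codec API, and keying the Map by the first character of each pattern is one assumed way to index rules by "the beginning of the pattern".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rule: a literal pattern and its replacement (contexts omitted).
final class SimpleRule {
    final String pattern;
    final String replacement;

    SimpleRule(String pattern, String replacement) {
        this.pattern = pattern;
        this.replacement = replacement;
    }
}

// Hypothetical index: rules grouped by the first character of their pattern.
final class RuleIndex {
    // Each bucket keeps the rules in their original list order.
    private final Map<Character, List<SimpleRule>> byFirstChar = new HashMap<>();

    RuleIndex(List<SimpleRule> rules) {
        for (SimpleRule r : rules) {
            // Assumes every pattern is non-empty.
            byFirstChar.computeIfAbsent(r.pattern.charAt(0), k -> new ArrayList<>()).add(r);
        }
    }

    // Returns the first rule whose pattern matches input at position i, or null.
    // Only rules starting with input.charAt(i) are examined, not the whole list.
    SimpleRule firstMatch(String input, int i) {
        List<SimpleRule> candidates =
                byFirstChar.getOrDefault(input.charAt(i), Collections.emptyList());
        for (SimpleRule r : candidates) {
            if (input.startsWith(r.pattern, i)) {
                return r;
            }
        }
        return null;
    }
}
```

Because each bucket preserves the original rule order, firstMatch returns the same rule that a linear scan of the full list would: a rule whose pattern starts with a different character can never match at position i. The gain is that each lookup examines only one bucket instead of every rule.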

I will keep looking for further improvements. If you have any ideas, please let me know.

This message was sent by Atlassian JIRA
