commons-issues mailing list archives

From "Thomas Champagne (JIRA)" <>
Subject [jira] [Commented] (CODEC-174) Improve performance of Beider Morse encoder
Date Mon, 04 Nov 2013 16:11:19 GMT


Thomas Champagne commented on CODEC-174:

The patches are based on top of trunk :
All tests pass after applying patches.
With the test program, I obtain these results:
Without any patch:
{quote}Time for encoding 20000 times the input 'Angelo' : 10537 ms{quote}
With the delete-subsequence-cache patch:
{quote}Time for encoding 20000 times the input 'Angelo' : 8997 ms{quote}
With the change-rules-storage-to-Map patch:
{quote}Time for encoding 20000 times the input 'Angelo' : 4979 ms{quote}
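
The test program itself is not included in this message. As a rough illustration only, a timing loop that produces output in this shape might look like the sketch below. The class name `EncodeTimingSketch`, the method `timeEncoding`, and the placeholder encoder are assumptions made for the example; the real measurement would call the Beider Morse encoder instead.

```java
import java.util.function.UnaryOperator;

final class EncodeTimingSketch {

    // Applies the encoder to the same input `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeEncoding(UnaryOperator<String> encoder, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            encoder.apply(input);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder encoder (assumption): replace with a call to the real encoder.
        UnaryOperator<String> placeholder = s -> s.toUpperCase();
        long ms = timeEncoding(placeholder, "Angelo", 20000);
        System.out.println("Time for encoding 20000 times the input 'Angelo' : " + ms + " ms");
    }
}
```

A loop like this includes JIT warm-up in the measurement, so it is only suitable for coarse before/after comparisons such as the ones quoted above.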

> Improve performance of Beider Morse encoder
> -------------------------------------------
>                 Key: CODEC-174
>                 URL:
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, CODEC-174-delete-subsequence-cache.patch,
> I use the Beider Morse encoder with Solr. When Solr indexes a lot of documents using this encoder,
the import time is multiplied by 30. So I have decided to optimize the current implementation
in commons-codec.
> So far, I have created two patches. The first patch deletes a "performance hack" based on
a subsequence cache. This cache does not improve performance, and removing it
saves some milliseconds.
> The second patch changes the in-memory storage of the rules to a Map instead of a List.
With it, a rule can be accessed directly by the beginning of its pattern. This patch divides
the encoding time by 2.
> I will try to find more improvements. If you have any ideas, please let me know.
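
The idea behind the second patch, looking rules up by the beginning of their pattern instead of scanning a whole list, can be sketched roughly as follows. This is not the code from the attached patch: `SketchRule` and `RuleIndexSketch` are hypothetical, simplified stand-ins, and the sketch assumes the map key is the first character of a non-empty pattern.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified rule; the real rule class carries more information.
final class SketchRule {
    final String pattern;
    final String phoneme;

    SketchRule(String pattern, String phoneme) {
        this.pattern = pattern;
        this.phoneme = phoneme;
    }
}

final class RuleIndexSketch {

    // List storage: every rule is tested at every input position.
    static List<SketchRule> candidatesFromList(List<SketchRule> rules, String input, int pos) {
        List<SketchRule> hits = new ArrayList<>();
        for (SketchRule rule : rules) {
            if (input.startsWith(rule.pattern, pos)) {
                hits.add(rule);
            }
        }
        return hits;
    }

    // Map storage: group rules by the first character of their pattern
    // (assumes patterns are non-empty).
    static Map<Character, List<SketchRule>> index(List<SketchRule> rules) {
        Map<Character, List<SketchRule>> byFirstChar = new HashMap<>();
        for (SketchRule rule : rules) {
            byFirstChar.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return byFirstChar;
    }

    // Only rules whose pattern starts with the current input character are tested.
    static List<SketchRule> candidatesFromMap(Map<Character, List<SketchRule>> byFirstChar,
                                              String input, int pos) {
        List<SketchRule> hits = new ArrayList<>();
        List<SketchRule> bucket =
                byFirstChar.getOrDefault(input.charAt(pos), Collections.<SketchRule>emptyList());
        for (SketchRule rule : bucket) {
            if (input.startsWith(rule.pattern, pos)) {
                hits.add(rule);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<SketchRule> rules = new ArrayList<>();
        rules.add(new SketchRule("a", "A"));
        rules.add(new SketchRule("ge", "J"));
        rules.add(new SketchRule("g", "G"));
        Map<Character, List<SketchRule>> byFirstChar = index(rules);
        for (SketchRule rule : candidatesFromMap(byFirstChar, "angelo", 2)) {
            System.out.println(rule.pattern + " -> " + rule.phoneme);
        }
    }
}
```

Both lookups return the same rules in the same order; the map version simply skips every rule that cannot match at the current position, which is where the saving would come from when the rule set is large.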

This message was sent by Atlassian JIRA
