commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LANG-935) Possible performance improvement on string escape functions
Date Sat, 14 Mar 2015 15:36:38 GMT

     [ https://issues.apache.org/jira/browse/LANG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated LANG-935:
---------------------------------
    Attachment: LANG-935.patch

Attached is a small change to the PR from Fabian.

It simplifies the proposed change by keeping only its most important part: skip the greedy
longest-match search when no translation rule starts with the character at the current input position.
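A minimal Java sketch of that idea, assuming a greedy longest-first lookup over a table of rules: the first characters of all rule keys are collected into a set, and the search at a position is skipped entirely when the current character is not in that set. The class `PrefixLookupTranslator`, its `{key, replacement}` constructor format, and the sample rules are hypothetical and for illustration only; this is not the attached patch.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: greedy longest-first lookup with a first-character pre-check.
public class PrefixLookupTranslator {
    private final Map<String, String> lookup = new HashMap<>();
    // First character of every key; positions starting with any other character are skipped.
    private final Set<Character> prefixSet = new HashSet<>();
    private final int shortest;
    private final int longest;

    // Each rule is {key, replacement}; keys are assumed to be non-empty.
    public PrefixLookupTranslator(String[][] rules) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (String[] rule : rules) {
            lookup.put(rule[0], rule[1]);
            prefixSet.add(rule[0].charAt(0));
            min = Math.min(min, rule[0].length());
            max = Math.max(max, rule[0].length());
        }
        shortest = min;
        longest = max;
    }

    // Writes at most one replacement for the text at index; returns chars consumed (0 = no match).
    public int translate(CharSequence input, int index, Writer out) throws IOException {
        // The cheap check: no rule starts with this character, so skip the greedy search.
        if (!prefixSet.contains(input.charAt(index))) {
            return 0;
        }
        int max = Math.min(longest, input.length() - index);
        // Try the longest candidate first so the match is greedy.
        for (int len = max; len >= shortest; len--) {
            String replacement = lookup.get(input.subSequence(index, index + len).toString());
            if (replacement != null) {
                out.write(replacement);
                return len;
            }
        }
        return 0;
    }

    // Runs the per-position translate over the whole input, copying unmatched chars through.
    public String translate(CharSequence input) {
        StringWriter out = new StringWriter(input.length());
        try {
            int pos = 0;
            while (pos < input.length()) {
                int consumed = translate(input, pos, out);
                if (consumed == 0) {
                    out.write(input.charAt(pos));
                    pos++;
                } else {
                    pos += consumed;
                }
            }
        } catch (IOException e) {
            // StringWriter never throws; this only satisfies the Writer signature.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        PrefixLookupTranslator unescape = new PrefixLookupTranslator(new String[][] {
            {"&lt;", "<"}, {"&gt;", ">"}, {"&amp;", "&"}, {"&quot;", "\""}
        });
        System.out.println(unescape.translate("a &lt;b&gt; &amp; c")); // prints: a <b> & c
    }
}
```

For input with no `&` at all, every position fails the set check, so the substring allocations and map lookups of the greedy loop never happen.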

Benchmark results with the examples:

PR version:
{noformat}
Benchmark                    Mode  Cnt       Score      Error  Units
LangBenchmark.testEscape    thrpt   10  176862.112 ± 4214.145  ops/s
LangBenchmark.testUnEscape  thrpt   10  122259.267 ± 3876.409  ops/s
{noformat}

Modified patch:
{noformat}
Benchmark                    Mode  Cnt       Score      Error  Units
LangBenchmark.testEscape    thrpt   10  177597.328 ± 3294.652  ops/s
LangBenchmark.testUnEscape  thrpt   10  135307.880 ± 4854.221  ops/s
{noformat}

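It should perform slightly better than the PR, especially when many translation rules share the
same starting character; this mainly benefits the unescape methods. In the numbers above, unescape
throughput rises from about 122,000 to 135,000 ops/s (roughly 11%), while the escape difference is
within the error margin.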

The proposed patch should not have any negative impact on other uses of the class, but I would
appreciate a review from others.

> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>
>                 Key: LANG-935
>                 URL: https://issues.apache.org/jira/browse/LANG-935
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>
>         Attachments: LANG-935.patch, tempproject1.zip
>
>
> The escape functions for HTML etc. use the same code and the same initialisation tables
> for the escape and unescape functions, and while this is an elegant approach it leads to a
> number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific representation (for
> example HTML 0x101 should become &#257;)
> The proposal is to use a new mapping technique to address these issues
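Deficiencies 2 and 3 can be illustrated with a small Java sketch. It assumes a named-entity table (here a deliberately tiny, hypothetical one) and assumes that every code point above 0x7E without a named entity should become a decimal numeric character reference. The class `FallbackHtmlEscaper` is hypothetical and not part of Commons Lang. It scans first and returns the original `String` instance when nothing needs escaping, so no new output string is allocated in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the Commons Lang API): an HTML escaper that
// (a) returns the input unchanged when nothing needs escaping, and
// (b) emits a numeric character reference for code points above 0x7E with no named entity.
public final class FallbackHtmlEscaper {
    // Illustrative subset of named entities; a real table would be much larger.
    private static final Map<Integer, String> NAMED = new HashMap<>();
    static {
        NAMED.put((int) '<', "&lt;");
        NAMED.put((int) '>', "&gt;");
        NAMED.put((int) '&', "&amp;");
        NAMED.put((int) '"', "&quot;");
        NAMED.put(0xE9, "&eacute;");
    }

    private FallbackHtmlEscaper() {}

    // Assumption: anything with a named entity or above 0x7E must be escaped.
    private static boolean needsEscape(int cp) {
        return NAMED.containsKey(cp) || cp > 0x7E;
    }

    public static String escape(String input) {
        // First pass: locate the first code point that needs escaping.
        int first = -1;
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (needsEscape(cp)) {
                first = i;
                break;
            }
            i += Character.charCount(cp);
        }
        if (first < 0) {
            return input; // no conversion required: the same instance is returned
        }
        StringBuilder sb = new StringBuilder(input.length() + 16);
        sb.append(input, 0, first); // copy the untouched prefix in one call
        for (int i = first; i < input.length(); ) {
            int cp = input.codePointAt(i);
            String named = NAMED.get(cp);
            if (named != null) {
                sb.append(named);
            } else if (cp > 0x7E) {
                // No named entity: fall back to a decimal numeric character reference.
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String plain = "no markup here";
        System.out.println(escape(plain) == plain); // true: reference comparison on purpose
        System.out.println(escape("\u0101 <b>"));   // &#257; &lt;b&gt;
        System.out.println(escape("caf\u00e9"));    // caf&eacute;
    }
}
```

Iterating by code point rather than by `char` means supplementary characters get a single reference for the whole surrogate pair. Control characters below 0x20 pass through unchanged in this sketch.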



