commons-issues mailing list archives

From "Fabian Lange (JIRA)" <>
Subject [jira] [Commented] (LANG-935) Possible performance improvement on string escape functions
Date Sat, 14 Mar 2015 10:03:38 GMT


Fabian Lange commented on LANG-935:

For example:

    private static final String ESCAPE_HTML4 = StringEscapeUtils.escapeHtml4(
            "a string with some amounts of html special chars äüö'&>< ");

    public String testMethod() {
        return StringEscapeUtils.unescapeHtml4(ESCAPE_HTML4);
    }

This performs a backwards translation of HTML, in which every entity starts with &, so it
matches your edge case. It still results in this:

Result: 53780.401 ±(99.9%) 1764.483 ops/s [Average]
  Statistics: (min, avg, max) = (53417.388, 53780.401, 54581.485), stdev = 458.231
  Confidence interval (99.9%): [52015.918, 55544.885]

Result: 220500.387 ±(99.9%) 22247.764 ops/s [Average]
  Statistics: (min, avg, max) = (213460.824, 220500.387, 228476.282), stdev = 5777.674
  Confidence interval (99.9%): [198252.624, 242748.151]

That's roughly a fourfold improvement (about 53,780 ops/s versus about 220,500 ops/s).
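
As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the first result block is the current code and the second is the patched code (the output above does not label them), and the class and method names are made up for illustration.

```java
import java.util.Locale;

public class SpeedupCheck {

    /** Ratio of two throughput figures in ops/s. */
    static double speedup(double baselineOps, double candidateOps) {
        return candidateOps / baselineOps;
    }

    /** True if the closed intervals [lo1, hi1] and [lo2, hi2] share any point. */
    static boolean overlaps(double lo1, double hi1, double lo2, double hi2) {
        return lo1 <= hi2 && lo2 <= hi1;
    }

    public static void main(String[] args) {
        // Averages copied from the two result blocks above.
        // Assumption: first = current code, second = patched code.
        double first = 53780.401;
        double second = 220500.387;

        System.out.println(String.format(Locale.ROOT,
                "speedup: %.2fx", speedup(first, second)));

        // 99.9% confidence intervals copied from the same output.
        System.out.println("intervals overlap: "
                + overlaps(52015.918, 55544.885, 198252.624, 242748.151));
    }
}
```

The ratio comes out a little above 4, and the two confidence intervals are far apart, so the difference is not measurement noise.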

Only a very limited number of single-character replacement edge cases fail to show a clear
winner (in some of them the new code is indeed slightly slower).

So I guess it's time to make a decision, right?

Imagine it were the other way around: my code is master and the current code is the patch.
Would you be willing to massively slow down most real-world use cases of the translators
for the sake of a few single-character edge cases?

> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>                 Key: LANG-935
>                 URL:
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>         Attachments:
> The escape functions for HTML etc. use the same code and the same initialisation tables
> for the escape and unescape functions, and while this is an elegant approach it leads to a
> number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific representation (for
> example HTML 0x101 should become &#257; )
> The proposal is to use a new mapping technique to address these issues
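
To make points 2 and 3 of the quoted description concrete, here is a minimal sketch in plain Java. It is not the attached patch and not Commons Lang code: `EscapeSketch`, `escapeHtmlSketch` and `replacementFor` are hypothetical names, and for brevity it writes every non-ASCII character as a numeric reference instead of using named entities such as `&auml;`.

```java
public class EscapeSketch {

    /**
     * Escapes a few HTML special characters and all non-ASCII code points.
     * Returns the very same String instance when nothing needs escaping,
     * so no new output string is created in that case.
     */
    static String escapeHtmlSketch(String input) {
        StringBuilder out = null; // allocated lazily, only once a change is needed
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            String replacement = replacementFor(cp);
            if (replacement != null) {
                if (out == null) {
                    // First change: copy the untouched prefix exactly once.
                    out = new StringBuilder(input.length() + 16);
                    out.append(input, 0, i);
                }
                out.append(replacement);
            } else if (out != null) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out == null ? input : out.toString();
    }

    /** Returns the replacement for a code point, or null to keep it as-is. */
    static String replacementFor(int cp) {
        switch (cp) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default:
                // Assumption for this sketch: anything outside ASCII gets a
                // decimal numeric character reference, e.g. 0x101 -> &#257;
                return cp > 0x7F ? "&#" + cp + ";" : null;
        }
    }
}
```

With this sketch, `escapeHtmlSketch("plain text")` hands back its argument unchanged, and a string containing only the character 0x101 becomes `&#257;`, which is the example from point 3.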

This message was sent by Atlassian JIRA
