commons-issues mailing list archives

From "Fabian Lange (JIRA)" <>
Subject [jira] [Commented] (LANG-935) Possible performance improvement on string escape functions
Date Sat, 14 Mar 2015 09:54:38 GMT


Fabian Lange commented on LANG-935:

I can guarantee you that it outperforms the current implementation in all the benchmarks from
this test, in all unit tests (of which there are far too few), and in all internal usages I have found.

If you say it performs slower in a very specific use case, then I am happy to address this,
but I have not found any so far. If you look at it from an algorithmic complexity point of view,
you will find that my patch is significantly better.
Consider this: the old "let's take a hash code" approach also has to iterate over all chars,
because none of the substring hash codes are pre-populated. In fact, it turns out the current
implementation performs worse on character iteration in ALL cases, just because the current algorithm requires
My patch skips a lot of iteration completely in many frequent use cases.
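
To make the argument concrete, here is a minimal sketch of the general idea of checking the first character before doing any substring lookup. It is not the attached patch: the class name PrefixLookupSketch and its fields are hypothetical, and it assumes every lookup key is non-empty.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the code attached to LANG-935. */
final class PrefixLookupSketch {
    private final Map<String, String> lookup = new HashMap<>();
    private final Set<Character> firstChars = new HashSet<>();
    private int shortest = Integer.MAX_VALUE;
    private int longest = 0;

    /** Each pair is {key, replacement}; keys are assumed to be non-empty. */
    PrefixLookupSketch(String[][] pairs) {
        for (String[] pair : pairs) {
            String key = pair[0];
            lookup.put(key, pair[1]);
            firstChars.add(key.charAt(0));
            shortest = Math.min(shortest, key.length());
            longest = Math.max(longest, key.length());
        }
    }

    String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            String replacement = null;
            int consumed = 0;
            // Cheap test first: substrings are only created (and hashed)
            // when at least one key starts with the current character.
            if (firstChars.contains(input.charAt(i))) {
                int max = Math.min(longest, input.length() - i);
                // Greedy match: try the longest candidate first.
                for (int len = max; len >= shortest; len--) {
                    String hit = lookup.get(input.substring(i, i + len));
                    if (hit != null) {
                        replacement = hit;
                        consumed = len;
                        break;
                    }
                }
            }
            if (replacement != null) {
                out.append(replacement);
                i += consumed;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

For input that contains no character starting any key, the loop in this sketch never creates a substring, so no substring hash code is ever computed. Keys that share a first character still fall through to the substring lookups, which is the case that benefits least.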

So please back up your objection with a concrete example we can benchmark. Yes, many equally
sized strings with the same first character do not benefit as much as the other use cases,
but they do in fact benefit.

> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>                 Key: LANG-935
>                 URL:
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>         Attachments:
> The escape functions for HTML etc. use the same code and the same initialisation tables
> for the escape and unescape functions, and while this is an elegant approach it leads to a
> number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific representation (for
> example HTML 0x101 should become &#257;)
> The proposal is to use a new mapping technique to address these issues
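
The issue description does not spell out the proposed mapping technique, so the following is only a rough sketch of how points 2 and 3 could be addressed. The class EscapeSketch and its methods are hypothetical, and the set of escaped characters (the four basic HTML characters plus everything above 0x7F) is an assumption chosen for illustration.

```java
/** Hypothetical illustration only; not the mapping technique proposed in LANG-935. */
final class EscapeSketch {

    // Assumption: only these four ASCII characters and all non-ASCII chars are escaped.
    private static boolean needsEscape(char c) {
        return c == '&' || c == '<' || c == '>' || c == '"' || c > 0x7F;
    }

    static String escapeHtml(String input) {
        // Point 2: scan first, and return the same instance if nothing needs escaping.
        int first = -1;
        for (int i = 0; i < input.length(); i++) {
            if (needsEscape(input.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return input;
        }

        StringBuilder out = new StringBuilder(input.length() + 16);
        out.append(input, 0, first); // copy the untouched prefix in one call
        int i = first;
        while (i < input.length()) {
            int cp = input.codePointAt(i); // handles surrogate pairs as one code point
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7F) {
                        // Point 3: numeric fallback for characters without a named entity.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }
}
```

With this sketch, escapeHtml("hello") returns the very same String object it was given, and the character 0x101 is written as a decimal numeric reference.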

This message was sent by Atlassian JIRA
