tomcat-dev mailing list archives

Subject DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck
Date Thu, 23 Jun 2011 14:47:42 GMT

--- Comment #10 from Christopher Schultz 2011-06-23 14:47:42 UTC ---
> would caching charset misses be a good idea, if the Encoding strings can also
> be received from external sources?

+1 to Konstantin Preißer's DOS concerns.

> However, there is static method Charset.availableCharsets() which returns a
> SortedMap<String, Charset> of all charsets available by the current JVM. Maybe
> this list could be used to build a Map of all available charsets (the aliases
> returned by Charset.aliases() would also have to be added)? Then missing
> charsets could also be found fast.

If you read some of the online posts linked from this BZ issue, you'll see
claims that pre-populating such a cache does not have a noticeable impact on
performance. Honestly, I'm okay not pre-populating things because there are
probably a dozen encodings that get any significant amount of real use on the
web, while Charset.availableCharsets() returns 163 different obscure character
sets.

I suppose it's a fairly small set of encodings, but with little benefit,
there's no reason IMO to pre-populate.
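
For reference, the pre-population idea quoted above could be sketched roughly
as follows. This is only an illustration, not the B2CConverter code: the class
name PrepopulatedCharsetCache and its lookup() method are hypothetical, and the
keys are lower-cased here simply so that a lookup can match both canonical
names and aliases.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tomcat.
final class PrepopulatedCharsetCache {

    // Filled once at class-load time and only read afterwards.
    private static final Map<String, Charset> CACHE = new HashMap<>();

    static {
        for (Charset charset : Charset.availableCharsets().values()) {
            // Register the canonical name...
            CACHE.put(charset.name().toLowerCase(Locale.ENGLISH), charset);
            // ...and every alias, as suggested in the quoted comment.
            for (String alias : charset.aliases()) {
                CACHE.put(alias.toLowerCase(Locale.ENGLISH), charset);
            }
        }
    }

    private PrepopulatedCharsetCache() {
    }

    // Returns null for unknown names; a miss never reaches Charset.forName().
    static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        return CACHE.get(name.toLowerCase(Locale.ENGLISH));
    }

    // Number of names and aliases registered; varies by JVM.
    static int size() {
        return CACHE.size();
    }
}
```

With this approach unknown names are rejected by a plain map lookup, at the
cost of holding every name and alias the JVM knows about.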

> However, I think, in B2CConverter.getCharset() the encoding string should be
> converted to lower-case/upper-case before a lookup in the Map, to avoid
> multiple entries ("uTF-8", "UtF-8" etc.).

Actually, I might leave the case intact for performance reasons. Yes, it's
true that utf-8, UTF-8, uTf-8, UTf-8, UtF-8, etc. would all be the same, but I
suspect that only "utf-8" and "UTF-8" will be used in the wild with any
reasonable frequency. Normalizing case for every lookup is probably a waste of
time, unless there are significant concerns of DOS using long, non-normalized
permutations of valid encodings (longest is x-MacCentralEurope with 17
characters to play with). 17 characters is a lot of permutations (~2MiB),
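
A minimal sketch of the alternative discussed here, assuming a lazily filled
cache that keys on the name exactly as received and does not cache misses. The
class LazyCharsetCache and its methods are hypothetical names used only for
illustration. The casePermutations() helper shows where the growth concern
comes from: x-MacCentralEurope contains 17 letters, so there are 2^17 = 131072
case variants, each of which would become its own cache entry.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper for illustration; not part of Tomcat.
final class LazyCharsetCache {

    // Only successful lookups are stored, keyed by the name as received.
    private static final ConcurrentMap<String, Charset> HITS =
            new ConcurrentHashMap<>();

    private LazyCharsetCache() {
    }

    static Charset getCharset(String name) {
        if (name == null) {
            return null;
        }
        Charset cached = HITS.get(name);
        if (cached != null) {
            return cached;
        }
        try {
            // Charset.forName() matches names case-insensitively.
            Charset charset = Charset.forName(name);
            HITS.putIfAbsent(name, charset);
            return charset;
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Misses are not cached, so bogus names cannot grow the map.
            return null;
        }
    }

    static int cachedEntries() {
        return HITS.size();
    }

    // Number of distinct upper/lower-case spellings of a name: 2^(letters).
    static long casePermutations(String name) {
        int letters = 0;
        for (int i = 0; i < name.length(); i++) {
            if (Character.isLetter(name.charAt(i))) {
                letters++;
            }
        }
        return 1L << letters;
    }
}
```

Counting only the 18 characters of each name at one byte apiece, 131072
variants come to a little over 2 MiB, which is roughly in line with the
estimate above; real String and map-entry overhead would be higher.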
