tomcat-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 51400] Use of "new String(byte[] b, String enc)" hits Sun JVM bottleneck
Date Thu, 23 Jun 2011 14:47:42 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=51400

--- Comment #10 from Christopher Schultz <chris@christopherschultz.net> 2011-06-23 14:47:42
UTC ---
> would caching charset misses be a good idea, if the Encoding strings can also
> be received from external sources?

+1 to Konstantin Preißer's DOS concerns.

> However, there is static method Charset.availableCharsets() which returns a
> SortedMap<String, Charset> of all charsets available by the current JVM. Maybe
> this list could be used to build a Map of all available charsets (the aliases
> returned by Charset.aliases() would also have to be added)? Then missing
> charsets could also be found fast.

If you read some of the online posts linked from this BZ issue, you'll see
claims that pre-populating such a cache does not have a noticeable impact on
performance. Honestly, I'm okay not pre-populating things because there are
probably a dozen encodings that get any significant amount of real use on the
web, while Charset.availableCharsets returns 163 different obscure character
sets.

I suppose it's a fairly small set of encodings, but with little benefit,
there's no reason IMO to pre-populate.
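
For reference, the pre-populated map suggested in the quoted text could look roughly like the sketch below. It is only an illustration under stated assumptions, not the B2CConverter code: the class name CharsetCacheSketch and the method buildCache() are hypothetical, and lower-casing the keys with Locale.ENGLISH is my own choice. The only JDK calls used are Charset.availableCharsets(), Charset.name() and Charset.aliases().

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: pre-populate a lookup map with every charset the
// running JVM reports, keyed by canonical name and by each alias.
public class CharsetCacheSketch {

    static Map<String, Charset> buildCache() {
        Map<String, Charset> cache = new HashMap<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            // Canonical name, lower-cased so lookups can be case-insensitive.
            cache.put(cs.name().toLowerCase(Locale.ENGLISH), cs);
            // Aliases are not keys of availableCharsets(), so add them too.
            for (String alias : cs.aliases()) {
                cache.put(alias.toLowerCase(Locale.ENGLISH), cs);
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<String, Charset> cache = buildCache();
        // Counts vary by JVM vendor and version.
        System.out.println("charsets: " + Charset.availableCharsets().size());
        System.out.println("map keys: " + cache.size());
        // A miss is a plain map lookup that returns null.
        System.out.println("bogus:    " + cache.get("no-such-charset"));
    }
}
```

With a map like this, an unknown encoding name is answered by a single lookup and nothing is ever added to the map on a miss, which is the property that matters for the DOS concern above.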

> However, I think, in B2CConverter.getCharset() the encoding string should be
> converted to lower-case/upper-case before a lookup in the Map, to avoid
> multiple entries ("uTF-8", "UtF-8" etc.).

Actually, I might leave the case intact for performance considerations. Yes,
it's true that utf-8, UTF-8, uTf-8, UTf-8, UtF-8, etc. would all be the same, but
I suspect that only "utf-8" and "UTF-8" will be used in the wild with any
reasonable frequency. Normalizing case for every lookup is probably a waste of
time, unless there are significant concerns of DOS using long, non-normalized
permutations of valid encodings (longest is x-MacCentralEurope with 17
characters to play with). 17 characters is a lot of permutations (~2MiB),
though.
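
To put a number on the case-permutation concern, here is a small sketch that counts the upper/lower-case spellings of an encoding name and shows the normalization being discussed. The class EncodingCaseSketch and both method names are hypothetical, and Locale.ENGLISH is an assumption chosen to avoid locale-dependent lower-casing. Each letter can be written in two cases, so a name with n letters has 2^n spellings.

```java
import java.util.Locale;

// Hypothetical sketch: count case-only variants of an encoding name and
// show the normalization that would collapse them to one map key.
public class EncodingCaseSketch {

    // Normalize the name so that "uTF-8", "UtF-8", etc. share one key.
    static String normalize(String encoding) {
        return encoding.toLowerCase(Locale.ENGLISH);
    }

    // Each letter has two possible cases; digits and '-' have only one form.
    static long caseVariants(String name) {
        long letters = name.chars().filter(Character::isLetter).count();
        return 1L << letters;
    }

    public static void main(String[] args) {
        String longest = "x-MacCentralEurope";
        long variants = caseVariants(longest);
        System.out.println(longest + " has " + variants + " case variants");
        // Raw character count across all variants, ignoring object overhead.
        System.out.println("characters in all variants: "
                + variants * longest.length());
        System.out.println(normalize("UtF-8"));
    }
}
```

For x-MacCentralEurope this gives 2^17 = 131072 variants. At one byte per character the names alone come to a little over 2 MiB, which may be where the estimate above comes from; real String keys in a map would cost more than that.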

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org