commons-dev mailing list archives

From Gary Gregory <ggreg...@seagullsw.com>
Subject [lang] and [collections] Primitives
Date Tue, 20 May 2003 16:54:19 GMT
Speaking of primitives and Collections: I was going to port our code to use
.lang.StringEscapeUtils.escapeXml(String), but I thought I'd first create a
benchmark to compare our impl to .lang. Our impl, being array based and not
"entities-pluggable", is 4x faster, so I cannot replace it. :-(
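
The in-house implementation is not shown in this message. As a rough
illustration only, a fixed, array-based escaper might look like the sketch
below. The class name ArrayEscapeSketch and the ESCAPES table are
hypothetical, and the sketch handles only the five predefined XML entities,
which may differ from what either implementation actually escapes.

```java
// Hypothetical sketch: not the in-house code and not the .lang code.
class ArrayEscapeSketch {
    // Lookup table indexed directly by char value; null means "no escape".
    private static final String[] ESCAPES = new String[128];
    static {
        ESCAPES['&'] = "&amp;";
        ESCAPES['<'] = "&lt;";
        ESCAPES['>'] = "&gt;";
        ESCAPES['"'] = "&quot;";
        ESCAPES['\''] = "&apos;";
    }

    static String escapeXml(String str) {
        StringBuilder buf = new StringBuilder(str.length() * 2);
        for (int i = 0; i < str.length(); ++i) {
            char ch = str.charAt(i);
            // Plain array index: no Map lookup and no wrapper object per char.
            String esc = ch < ESCAPES.length ? ESCAPES[ch] : null;
            if (esc != null) {
                buf.append(esc);
            } else {
                buf.append(ch);
            }
        }
        return buf.toString();
    }
}
```

The trade-off is the one named above: the table is fixed at class load time,
so the entity set cannot be plugged in per call.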

After running HP's HPjmeter on some JVM -Xrunhprof output I confirmed what I
was suspecting. I see the following issues in .lang.Entities.java:

private static String escapeEntities(String str, Entities entities) {
    StringBuffer buf = new StringBuffer(str.length() * 2);
    int i;
    for (i = 0; i < str.length(); ++i) {
        char ch = str.charAt(i);
        String entity = entities.entityName(ch);

(1) Java Strings are not great; ah, if only they were Collections or iterable...

For every single character in a string, the following happens:

        char ch = str.charAt(i);

charAt() checks the bounds every time. Just for fun, I changed the code to:
 
    char[] chars = str.toCharArray();
    for (i = 0; i < chars.length; ++i) {
        char ch = chars[i];

That yields a 10% speed improvement but gobbles up memory, since the char
array returned by toCharArray() is a _copy_ of the array held by the String.
Not great, but a possible solution (and trade-off).
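
To make the two loop shapes concrete, here is a minimal, self-contained
sketch. The class and method names are made up for illustration, and the
counting task merely stands in for the per-character work done in
escapeEntities. No timing claim is attached to it; the 10% figure above comes
from the benchmark described in this message, not from this sketch.

```java
// Hypothetical sketch comparing the two ways of walking a String.
class CharLoopSketch {
    // Variant 1: one charAt() call (with its bounds check) per character.
    static int countWithCharAt(String str, char target) {
        int count = 0;
        for (int i = 0; i < str.length(); ++i) {
            if (str.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Variant 2: copy the characters once, then index the array directly.
    // toCharArray() allocates a new array as long as the String.
    static int countWithArray(String str, char target) {
        char[] chars = str.toCharArray();
        int count = 0;
        for (int i = 0; i < chars.length; ++i) {
            if (chars[i] == target) {
                count++;
            }
        }
        return count;
    }
}
```

Both variants return the same result; they differ only in the per-character
access cost versus the one-off copy.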

So, the first question is: is it worth creating a StringIterator-type
class that gets read-only access to a String's internal char array?
Unfortunately, this kind of code would have to use reflection to get a
reference to that array.

(2) entities.entityName(ch) creates Integers.

For every character in the String, entities.entityName(ch) is called, which
in turn creates an Integer object for its char argument, to be used as the
key in a Map lookup. That's a _lot_ of time spent in Integer.<init>.

This is where a primitive Map keyed on ints would come in handy. Is there any
thought of Collections providing such a class?
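
For illustration, a primitive int-keyed map could be as small as the sketch
below. It is not an existing [collections] class: the name IntKeyMapSketch,
the String value type, and the open-addressing layout are all assumptions
chosen to keep the example short.

```java
// Hypothetical sketch of an int -> String map that never creates Integers.
class IntKeyMapSketch {
    private int[] keys;
    private String[] values; // a null value marks an empty slot
    private int size;

    IntKeyMapSketch() {
        keys = new int[16];          // capacity is always a power of two
        values = new String[16];
    }

    void put(int key, String value) {
        if (value == null) {
            throw new IllegalArgumentException("null values are not supported");
        }
        if ((size + 1) * 2 > keys.length) {
            resize();                // keep the table at most half full
        }
        int i = indexFor(key, keys.length);
        while (values[i] != null && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing
        }
        if (values[i] == null) {
            size++;
        }
        keys[i] = key;
        values[i] = value;
    }

    // Returns the mapped value, or null if the key is absent.
    String get(int key) {
        int i = indexFor(key, keys.length);
        while (values[i] != null) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return null;
    }

    private static int indexFor(int key, int capacity) {
        int h = key * 0x9E3779B9;    // spread the bits of small keys
        return (h ^ (h >>> 16)) & (capacity - 1);
    }

    private void resize() {
        int[] oldKeys = keys;
        String[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new String[oldValues.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] != null) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

With something like this, a lookup by character is get(ch) on a plain int,
so no wrapper object is allocated per character.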

If it did, it would seem a bit odd to have .lang depend on .collections.
OTOH, Collections are now a basic part of the JRE.

All comments welcome, thanks for reading,
Gary

-----Original Message-----
From: Stephen Colebourne [mailto:scolebourne@btopenworld.com] 
Sent: Tuesday, May 20, 2003 01:15
To: Jakarta Commons Developers List
Cc: Rodney Waldhoff
Subject: [collections] Primitives

Rodney,
I'm noting that you are adding more implementations to the primitives area
of [collections]. I had a few questions.

1) Are you code generating the classes, using Velocity or some other tool? I
would have thought that it would be an ideal way to generate the classes,
avoiding the messy search and replace and requiring less testing. I also
think that this technique will be increasingly important when we come to do
primitive Sets and Maps.

1b) Talking of Maps...given the large number of classes, should primitive
collections be split into packages for List, Set, Map? Now would be the time
to do it. (primitive collections _could_ be a separate project in commons at
the current rate...)

2) Why is it important to have separate Serializable and Non-Serializable
implementations? Why not just make them all Serializable?

3) Can we agree on the naming strategy for the decorator package? I used
AbstractCollectionDecorator, but you used BaseProxyIntList.

Stephen

