harmony-dev mailing list archives

From Salikh Zakirov <Salikh.Zaki...@Intel.com>
Subject [drlvm] string interning in java
Date Fri, 28 Jul 2006 15:38:46 GMT

I have been looking at the string interning code in DRLVM recently.
The reason I started doing this is that the following stress
test quickly causes an OutOfMemoryError (OOM) on DRLVM:

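public class StressIntern {
    public static void main(String[] args) {
        String s = "abc";
        for (int i = 0; i < 100000; i++) {
            // Each iteration interns a new, roughly doubled string.
            s = (s + i + s).intern();
            if (s.length() > 65536) s = "abc" + i;
            if (i % 1000 == 0) trace(".");
        }
    }

    public static void trace(Object o) {
        // Body missing from the archived message; printing progress is assumed.
        System.out.print(o);
        System.out.flush();
    }
}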

It turned out that DRLVM interns all of its strings permanently,
keeping a strong reference to every interned string instance forever.
Interned strings are never reclaimed, and thus the OOM occurs.

As far as I remember, the intern() contract has two requirements:
1. s1.equals(s2) must imply s1.intern() == s2.intern()
2. for any string literal "abc" == "abc".intern()
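
The two requirements can be checked directly with a small program. This is
only an illustrative sketch: the class name InternContract and the helper
sameAfterIntern are hypothetical and are not part of DRLVM or the patches.

```java
// Hypothetical demo class; not part of DRLVM or the attached patches.
class InternContract {
    // Requirement 1: strings that are equal must intern to the same instance.
    static boolean sameAfterIntern(String s1, String s2) {
        return s1.equals(s2) && s1.intern() == s2.intern();
    }

    public static void main(String[] args) {
        // Built at run time, so this is a different object from the literal "abc".
        String runtime = new StringBuilder("ab").append('c').toString();

        System.out.println(runtime != "abc");                // distinct instances
        System.out.println(sameAfterIntern(runtime, "abc")); // requirement 1
        System.out.println("abc" == "abc".intern());         // requirement 2
    }
}
```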

In practical terms, the first requirement is satisfied by keeping
an authoritative pool of interned strings. The second requirement
is implemented by looking the string up in the authoritative pool when
resolving a constant pool entry.

Keeping interned strings forever is not a spec requirement, and it looks
like a flaw in the current design, because it allows any user application
to attack the containing VM by interning a large number of strings.

The attached patches implement a different approach to interning strings.
To make the interned string pool garbage collectable, I used WeakHashMap.
The native string pool is no longer used as the authoritative pool, but is used to
cache object pointers obtained by interning strings in the WeakHashMap.
In this way, any strings originating from class files and loaded through the LDC
bytecode are still unreclaimable (at least until class unloading
is implemented properly in DRLVM). But strings interned at runtime by explicitly calling
the String.intern() method no longer have a native counterpart, and are fully garbage collectable.
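
A minimal sketch of a WeakHashMap-based pool is shown below. It is not the
code from the attached patches: the class name WeakInternPool and its methods
are hypothetical, and the native cache for LDC strings is left out. It does
show why a plain WeakHashMap costs two weak references and two hash lookups
per newly interned string.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration; not the class from the attached patches.
class WeakInternPool {
    // WeakHashMap holds its keys weakly. The value must be weak as well:
    // a strong value would reference the key and keep the entry alive forever.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);   // first hash lookup
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;                      // already interned
        }
        pool.put(s, new WeakReference<>(s));       // second hash lookup
        return s;                                  // s becomes the canonical copy
    }

    synchronized int size() {
        return pool.size();
    }
}
```

Once no strong reference to a canonical string remains, the collector may
clear both the key and the value, and WeakHashMap drops the stale entry on a
later access.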

Making class constant pool entries reclaimable does not look like an important
feature to me at this moment, because a fair amount of native memory is still
used for each class pool entry, and a complete solution must implement full class
unloading.
Performance-wise, using the Java string pool slows down execution a bit.
I used a HelloWorld-like application to measure startup time on my laptop.

   unmodified DRLVM: 0.270s +/- 0.005 average, 0.257s fastest
   java string pool: 0.277s +/- 0.005 average, 0.265s fastest

which is about a 3% slowdown in the average startup time.
In exchange, it buys us improved robustness.

Next, I am going to try to optimize the Java code by using a customized weak hash map implementation.
With it, I will be able to create just one weak reference to the interned string instead
of the current two, and to perform just one hash lookup instead of two when interning a new string.
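
One possible shape for such a customized structure is sketched below. This is
an assumption about how it could look, not the planned implementation: the
class name WeakStringSet, the fixed bucket count and the coarse locking are
all hypothetical choices made for brevity. Each interned string gets a single
WeakReference that doubles as the hash chain node, and interning probes one
bucket once.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical sketch; names, sizing and locking are illustrative assumptions.
class WeakStringSet {
    // The only weak reference per string; it also serves as the chain node.
    private static final class Node extends WeakReference<String> {
        final int hash;
        Node next;

        Node(String s, int hash, ReferenceQueue<String> queue, Node next) {
            super(s, queue);
            this.hash = hash;
            this.next = next;
        }
    }

    private final ReferenceQueue<String> queue = new ReferenceQueue<>();
    private final Node[] buckets = new Node[1024]; // fixed size, no resizing
    private int size;

    synchronized String intern(String s) {
        expungeStale();
        int h = s.hashCode();
        int i = (h & 0x7fffffff) % buckets.length;
        // Single probe: either find the canonical copy or insert into this bucket.
        for (Node n = buckets[i]; n != null; n = n.next) {
            String t = n.get();
            if (n.hash == h && t != null && t.equals(s)) {
                return t;
            }
        }
        buckets[i] = new Node(s, h, queue, buckets[i]);
        size++;
        return s;
    }

    synchronized int size() {
        expungeStale();
        return size;
    }

    // Unlink nodes whose strings have been garbage collected.
    private void expungeStale() {
        Node dead;
        while ((dead = (Node) queue.poll()) != null) {
            int i = (dead.hash & 0x7fffffff) % buckets.length;
            Node prev = null;
            for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
                if (n == dead) {
                    if (prev == null) buckets[i] = n.next; else prev.next = n.next;
                    size--;
                    break;
                }
            }
        }
    }
}
```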

Salikh Zakirov, Intel Middleware Products Division
