To: harmony-dev@incubator.apache.org
From: Salikh Zakirov
Subject: [drlvm] string interning in java
Date: Fri, 28 Jul 2006 19:38:46 +0400

Hi,

I have been looking at the string interning code in DRLVM recently.
I started doing this because the following stress test quickly causes
an OOM (OutOfMemoryError) on DRLVM:

    public class StressIntern {
        public static void main(String[] args) {
            String s = "abc";
            for (int i = 0; i < 100000; i++) {
                // Intern an ever-growing string on every iteration.
                s = (s + i + s).intern();
                // Reset once the string gets too long.
                if (s.length() > 65536) s = "abc" + i;
                if (i % 1000 == 0) trace(".");
            }
        }

        public static void trace(Object o) {
            System.out.print(o);
            System.out.flush();
        }
    }

It turned out that DRLVM interns all of its strings permanently,
keeping a strong reference to each string instance forever. Interned
strings are never reclaimed, and thus the OOM occurs.

As far as I remember, the intern() contract has two requirements:

1. s1.equals(s2) must imply s1.intern() == s2.intern()
2. for any string literal, "abc" == "abc".intern()

In practical terms, the first requirement is satisfied by keeping an
authoritative pool of interned strings. The second requirement is
implemented by looking the string up in the authoritative pool when
resolving a constant pool entry.

Keeping interned strings forever is not a spec requirement, and it
looks like a flaw in the current design, because it allows any user
application to attack the containing VM by interning a large number
of strings.

The attached patches implement a different approach to interning
strings. To make the interned string pool garbage collectable, I used
WeakHashMap. The native string pool is no longer authoritative; it is
only used to cache the object pointers obtained by interning strings
in the WeakHashMap.
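
To illustrate the idea of a weak intern pool, here is a minimal sketch.
It is not the code from the attached patches: the class name
WeakInternPool and its layout are hypothetical, and it only covers the
first requirement (equal strings yield the same instance). It assumes
the value has to be wrapped in a WeakReference too, because a strong
value reference to the key would keep the WeakHashMap entry alive
forever.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Hypothetical illustration only; not the class used in the patches.
final class WeakInternPool {
    // The key is held weakly by WeakHashMap. The value is also weak:
    // it refers to the same string as the key, and a strong reference
    // here would prevent the entry from ever being collected.
    private final WeakHashMap<String, WeakReference<String>> pool =
            new WeakHashMap<>();

    // WeakHashMap is not thread-safe, so access is serialized.
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);   // first hash lookup
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;                      // already interned
        }
        pool.put(s, new WeakReference<>(s));       // second hash lookup
        return s;                                  // s becomes canonical
    }
}
```

With this sketch, two distinct but equal String instances passed to
intern() come back as the same object, and an entry can be reclaimed
once nothing else references the canonical string.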

With this approach, any strings that originate from class files and
are loaded through the LDC bytecode are still unreclaimable (at least
until class unloading is implemented properly in DRLVM). But strings
interned at runtime by explicitly calling String.intern() no longer
have a native counterpart and are fully garbage collectable.

Making class constant pool entries reclaimable does not look like an
important feature to me at this moment, because a fair amount of
native memory is still used for each class pool entry, and a complete
solution must implement full class unloading.

Performance-wise, using the java string pool slows down execution a
bit. I used a HelloWorld-like application to measure startup time on
my laptop:

    unmodified DRLVM:  0.270s +/- 0.005 average, 0.257s fastest
    java string pool:  0.277s +/- 0.005 average, 0.265s fastest

That is about a 3% slowdown. In exchange, it buys us improved
robustness.

Next, I am going to try to optimize the java code by using a
customized weak hashmap implementation. With it, I will be able to
create just one weak reference to each interned string instead of the
current two, and to perform just one hash lookup instead of two when
interning a new string.

--
Salikh Zakirov, Intel Middleware Products Division