groovy-dev mailing list archives

From Alain Stalder <astal...@span.ch>
Subject Re: Improve Groovy class loading performance and memory management
Date Sat, 28 May 2016 14:49:14 GMT
This is going to be a *very* long mail, but I think it is probably worth 
it! :)

First of all, although I am not 100% sure, I think I agree with Jochen 
regarding ClassValue - in any case, I find ClassValue is not a viable 
option to count on in the immediately foreseeable future.

Instead I wrote a PoC based on the Groovy (2.5.0) master with the 
following highlights:

- In most use cases, classes that are no longer used become immediately 
available for garbage collection.
- In all cases, garbage collection is possible once the limit on 
Metaspace/PermGen or Heap is reached, i.e. no more OutOfMemoryErrors.
- Appears in some quick initial tests to be generally even a bit 
*faster*(!) than the current implementation.
- (Not using ClassValue at all.)
- (The two merge requests by John Wagenleitner for GROOVY-7683 (weak 
reference to Class in ClassInfo) and GROOVY-7646 (explicit cleanup after 
running scripts in GroovyShell) would become obsolete.)

Let me first define two things:

I will call a class "weakly-collectable" if it can be collected while 
the VM is running normally, i.e. before any limit on PermGen/Metaspace 
or Heap is reached, and "softly-collectable" if that only happens when 
such a limit is reached, but is still possible then, i.e. no 
OutOfMemoryError.
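
The two terms presumably echo the JDK's own weak and soft references: a weakly referenced object may be cleared at any collection once no strong reference remains, while soft references are guaranteed to be cleared before an OutOfMemoryError is thrown. A minimal Java sketch of that analogy follows. ReachabilityDemo and isAlive are illustrative names, not part of the PoC, and the sketch uses plain objects rather than classes. What it prints after the GC hint depends on the VM, so no output is claimed.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReachabilityDemo {

    /** True as long as the collector has not cleared the reference. */
    static boolean isAlive(Reference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        SoftReference<Object> soft = new SoftReference<>(new Object());

        // While a strong reference exists, the weak reference cannot be cleared.
        System.out.println("weak alive while strongly held: " + isAlive(weak));

        strong = null;
        System.gc(); // only a hint; the VM is free to ignore it

        // Typically the weak reference is cleared here and the soft one is not,
        // but neither outcome is guaranteed by the specification.
        System.out.println("weak alive after gc hint: " + isAlive(weak));
        System.out.println("soft alive after gc hint: " + isAlive(soft));
    }
}
```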

I will call the (maybe most typical?) use case where a Java VM 
dynamically compiles and runs some Groovy scripts the 
"script-running-use-case". This generally also includes the case where 
scripts were precompiled and are loaded by a dedicated class loader 
(i.e. not the same class loader as Groovy itself). I will call the 
use case where both Groovy and compiled scripts are loaded by the same 
class loader the "gradle-use-case", as when the Gradle daemon keeps 
running and reloads Groovy and build scripts (this is how I understand 
it to work; correct me if I got that wrong).

The status quo with Groovy 2.4.6 is as follows:

- script-running-use-case, use ClassValue: softly-collectable
- script-running-use-case, don't use ClassValue: not collectable 
(OutOfMemoryError)
- gradle-use-case, use ClassValue: not collectable (OutOfMemoryError)
- gradle-use-case, don't use ClassValue: softly-collectable

Now for the PoC...

Here are the PoC branch and diff to master:
- https://github.com/jexler/groovy/tree/weak-gc-poc
- https://github.com/jexler/groovy/compare/master...jexler:weak-gc-poc

The core new thing is the class 
org.codehaus.groovy.reflection.ClassInfoMap, which is based on 
ConcurrentReferenceHashMap from the Spring Framework (which in turn 
appears to have originated from JBoss). Basically, it implements a 
WeakHashMap with thread-safe read/write access.
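
To illustrate the idea of a "WeakHashMap with thread-safe read/write access", here is a minimal Java sketch built only from JDK classes. It is not the PoC's ClassInfoMap: WeakClassCache and getOrCreate are hypothetical names, and a single lock around a WeakHashMap is a much coarser design than ConcurrentReferenceHashMap.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-class cache whose keys are held only weakly. */
final class WeakClassCache<V> {

    // WeakHashMap holds its keys weakly; the synchronized wrapper serializes
    // all access, including computeIfAbsent, on one lock.
    private final Map<Class<?>, V> map =
            Collections.synchronizedMap(new WeakHashMap<Class<?>, V>());

    /** Returns the cached value for the class, creating it on first access. */
    V getOrCreate(Class<?> type, Function<Class<?>, V> factory) {
        return map.computeIfAbsent(type, factory);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        WeakClassCache<String> cache = new WeakClassCache<>();
        System.out.println(cache.getOrCreate(String.class, Class::getSimpleName));
        System.out.println(cache.size());
    }
}
```

One known WeakHashMap caveat applies to any such design: if a value strongly references its own key, the entry can never be collected.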

In ClassInfo, that new ClassInfoMap is used within GlobalClassSet. 
(Detail: I have left the ManagedLinkedList<ClassInfo> items in the 
GlobalClassSet class because at least some Gradle versions seem to 
access it directly via reflection.)

GroovyClassValue (both the real one based on ClassValue and the pre Java 
7 emulation based on ManagedConcurrentMap) is not used at all any more.

The other "half" of the PoC concerns the java.beans.Introspector, 
because its caches are now the last thing that prevents 
weakly-collecting unused classes (as I will show a bit later on).

The basic approach here is to cache BeanInfo as a new private member 
"beanInfo" of ClassInfo and to remove it immediately after creation from 
Introspector caches. There is also a new public getter 
classInfo.getBeanInfo() that lazily initializes BeanInfo and returns it.
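
The approach described above can be sketched roughly as follows. This is a simplified stand-in, not the PoC's ClassInfo: BeanInfoHolder is a hypothetical class, and only Introspector.getBeanInfo(Class) and Introspector.flushFromCaches(Class) are real JDK calls. The sketch keeps a strong reference to the class for brevity, which a real weakly keyed cache would want to avoid.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;

/** Hypothetical holder that caches BeanInfo itself instead of relying on Introspector. */
final class BeanInfoHolder {

    private final Class<?> type;
    private BeanInfo beanInfo; // guarded by "this"

    BeanInfoHolder(Class<?> type) {
        this.type = type;
    }

    /** Lazily creates the BeanInfo, then drops the class from Introspector's caches. */
    synchronized BeanInfo getBeanInfo() {
        if (beanInfo == null) {
            try {
                beanInfo = Introspector.getBeanInfo(type);
            } catch (IntrospectionException e) {
                throw new IllegalStateException("Cannot introspect " + type.getName(), e);
            }
            // Corresponds to the "class" cleanup option: flush only this class.
            Introspector.flushFromCaches(type);
        }
        return beanInfo;
    }

    public static void main(String[] args) {
        BeanInfoHolder holder = new BeanInfoHolder(java.util.Date.class);
        System.out.println(holder.getBeanInfo().getPropertyDescriptors().length + " properties");
        System.out.println(holder.getBeanInfo() == holder.getBeanInfo());
    }
}
```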

For this PoC I provide four options for cleaning up the Introspector 
caches, selected via the system property "weak-gc-poc.cleanup":

- "none": No cleanup, as today
- "class": The default, call Introspector.flushFromCaches(theClass) 
after getting beanInfo and storing it in ClassInfo
- "super": Same as class, but do the cleanup for the class and all of 
its superclasses (except java.* and javax.*)
- "all": Clean Introspector caches for all classes, i.e. call 
Introspector.flushCaches()
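
As a rough Java sketch of how the four settings listed above could be dispatched: IntrospectorCleanup and its methods are hypothetical names, and treating an unrecognized value like "none" is an assumption made here, not something taken from the PoC branch.

```java
import java.beans.Introspector;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dispatcher for the "weak-gc-poc.cleanup" system property. */
final class IntrospectorCleanup {

    static final String PROPERTY = "weak-gc-poc.cleanup";

    /** Classes that the given mode flushes one by one ("none" and "all" flush none this way). */
    static List<Class<?>> classesToFlush(String mode, Class<?> type) {
        List<Class<?>> result = new ArrayList<>();
        switch (mode) {
            case "class":
                result.add(type);
                break;
            case "super":
                // The class and all its superclasses, except java.* and javax.*
                for (Class<?> c = type; c != null; c = c.getSuperclass()) {
                    String name = c.getName();
                    if (name.startsWith("java.") || name.startsWith("javax.")) {
                        continue;
                    }
                    result.add(c);
                }
                break;
            default:
                // "none", "all" and (by assumption) unknown values: nothing per class
                break;
        }
        return result;
    }

    /** Applies the configured cleanup after BeanInfo for the class has been obtained. */
    static void cleanup(Class<?> type) {
        String mode = System.getProperty(PROPERTY, "class"); // "class" is the default
        if ("all".equals(mode)) {
            Introspector.flushCaches();
            return;
        }
        for (Class<?> c : classesToFlush(mode, type)) {
            Introspector.flushFromCaches(c);
        }
    }
}
```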

In the end I suspect only "none" and "class" would be viable options 
because the others probably have too much impact on performance (more 
creations of BeanInfo for same classes), potentially also influencing 
performance of outside code that is also using Introspector.

First some results based on classgctest ( 
https://github.com/jexler/classgc ).

script-running-use-case, with the default "weak-gc-poc.cleanup" setting 
of "class":

$ java -XX:MaxMetaspaceSize=256m -Xmx512m -cp .:groovy-2.5.0-weak-gc-poc.jar ClassGCTester -cp filling/ -parent tester -classes GroovyFilling

Secs         Test classes   Metaspace/PermGen                Heap  Load time  Create time
      #loaded  #remaining     used  committed     used  committed    average      average
   0        1           1     6.4m       6.5m    14.1m     245.5m    1.226ms     11.831ms
   1      482         482     9.1m      10.5m    25.9m     245.5m    0.343ms      1.650ms
   2     1356        1356    12.5m      15.8m    63.1m     245.5m    0.265ms      1.167ms
   3     2398         137     7.9m      16.8m    19.7m     224.0m    0.243ms      0.977ms
   4     3475        1214    12.0m      16.8m    20.5m     239.5m    0.223ms      0.902ms

So, weakly-collectable, what we want.

gradle-use-case, first with "class":

$ java -XX:MaxMetaspaceSize=256m -Xmx512m -cp . ClassGCTester -cp groovy-2.5.0-weak-gc-poc.jar:filling/ -parent null -classes GroovyFilling

Secs         Test classes   Metaspace/PermGen                Heap  Load time  Create time
      #loaded  #remaining     used  committed     used  committed    average      average
   0        1           1     8.1m       8.5m    17.9m     245.5m    2.249ms    131.702ms
   1        9           9    22.9m      23.9m    23.7m     240.0m    1.728ms    115.582ms
   2       18          18    39.0m      40.6m    47.8m     300.5m    1.450ms    112.826ms
   3       26          26    53.3m      55.1m   108.7m     300.5m    1.456ms    113.726ms
   4       36          36    71.1m      73.6m   103.8m     396.0m    1.372ms    110.934ms
   5       46          46    88.9m      92.1m   180.5m     396.0m    1.335ms    107.233ms
   6       56          56   106.7m     110.3m    99.0m     414.0m    1.308ms    107.037ms
   7       66          66   124.5m     128.8m   109.3m     443.5m    1.267ms    104.878ms
   8       77          77   144.1m     148.9m   111.6m     437.0m    1.229ms    103.268ms
   9       86          86   160.1m     165.4m   102.0m     467.0m    1.206ms    103.848ms
  10       96          96   177.9m     183.9m   115.8m     465.0m    1.188ms    102.931ms
  11      107         107   197.5m     204.0m   128.3m     450.5m    1.166ms    102.170ms
  12      117         117   215.3m     222.4m   132.6m     459.5m    1.149ms    101.614ms
  13      127         127   233.1m     240.9m   142.7m     458.5m    1.142ms    101.311ms
  14      136           3    10.9m      60.0m    17.5m     450.0m    1.135ms    103.695ms

So, softly-collectable, which is because the Introspector keeps BeanInfo 
for superclasses, as becomes evident when explicitly running the test 
again with "super":

$ java -XX:MaxMetaspaceSize=256m -Xmx512m -Dweak-gc-poc.cleanup=super -cp . ClassGCTester -cp groovy-2.5.0-weak-gc-poc.jar:filling/ -parent null -classes GroovyFilling

Secs         Test classes   Metaspace/PermGen                Heap  Load time  Create time
      #loaded  #remaining     used  committed     used  committed    average      average
   0        1           1     8.1m       8.5m    17.9m     245.5m    2.307ms    125.460ms
   1        9           3    10.6m      16.8m    18.3m     233.0m    1.668ms    114.096ms
   2       19          12    26.6m      27.9m    29.4m     295.5m    1.661ms    111.405ms
   3       27          20    40.9m      42.3m    90.3m     295.5m    1.729ms    111.737ms
   4       37          19    39.2m      41.3m    81.7m     358.5m    1.658ms    107.926ms
   5       47          29    57.0m      59.0m   156.8m     358.5m    1.632ms    104.915ms
   6       57           8    19.7m      29.5m    47.4m     344.0m    2.088ms    103.593ms
   7       68          19    39.2m      43.5m    62.7m     372.5m    1.986ms    101.548ms

So, weakly-collectable in this case.

Let me first present some quick results regarding performance before 
discussing where to take this...

First test script, script0.groovy:
--
def shell = new GroovyShell()
for (int i=0; i<1000; i++) {
    long t0 = System.nanoTime()
    for (int j=0; j<1000; j++) {
       shell.run("return $i+$j", "script", [])
    }
    long t1 = System.nanoTime()
    printf("%3d: %3.1fs%n", i, ((double)(t1-t0))/1000000000)
}
--

$ groovyc script0.groovy

Then running it first with 2.5.0-SNAPSHOT (current master):

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-SNAPSHOT/lib/groovy-2.5.0-SNAPSHOT.jar script0
   0: 3.6s
   1: 2.7s
   2: 2.1s
   3: 2.4s
   4: 1.7s
   5: 1.8s
   6: 1.6s
   7: 2.7s
   8: 1.6s
   9: 2.0s

And then with the PoC (default "class"):

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-weak-gc-poc/lib/groovy-2.5.0-weak-gc-poc.jar script0
   0: 3.5s
   1: 2.3s
   2: 1.8s
   3: 1.7s
   4: 1.6s
   5: 1.4s
   6: 1.2s
   7: 1.2s
   8: 1.3s
   9: 1.3s

So, performance is very similar; the PoC even appears slightly faster. 
With the PoC, classes were weakly-collectable, as expected.

Second test script, script1.groovy (rather ugly, but works):
--

def scriptText = """
class Script1 extends Script {
    static class Inner {
        int x = 1;
    }

    Object run() {
        int x = new Inner().x
        int y = new Parallel().y
        return x+y
    }
}

class Parallel {
     int y = 2;
}
"""

def shell = new GroovyShell()
for (int i=0; i<1000; i++) {
    long t0 = System.nanoTime()
    for (int j=0; j<1000; j++) {
       shell.run(scriptText, "script", [])
    }
    long t1 = System.nanoTime()
    printf("%3d: %3.1fs%n", i, ((double)(t1-t0))/1000000000)
}
--

Output with 2.5.0 master:

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-SNAPSHOT/lib/groovy-2.5.0-SNAPSHOT.jar script1
   0: 8.6s
   1: 6.5s
   2: 5.8s
   3: 4.7s
   4: 5.9s
   5: 5.0s
   6: 5.2s
   7: 5.3s
   8: 6.8s
   9: 5.4s

Output with PoC (default "class"):

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-weak-gc-poc/lib/groovy-2.5.0-weak-gc-poc.jar script1
   0: 8.2s
   1: 5.9s
   2: 4.5s
   3: 4.0s
   4: 4.0s
   5: 3.7s
   6: 3.8s
   7: 3.6s
   8: 3.6s
   9: 3.5s

This time the PoC appears to be really faster and YES, classes were also 
weakly-collectable in this case, even though several Groovy classes were 
generated by compiling the script text Script1.

The third test script introduces a little bit of concurrency, although 
very likely not enough to really stress things, script2.groovy:
--
def scriptText = """
class Script1 extends Script {
    static class Inner {
        int x = 1;
    }

    Object run() {
        int x = new Inner().x
        int y = new Parallel().y
        return x+y
    }
}

class Parallel {
     int y = 2;
}
"""

def shells = new GroovyShell[10]
for (int t=0; t<10; t++) {
   shells[t] = new GroovyShell()
}
for (int i=0; i<1000; i++) {
    long t0 = System.nanoTime()
    def threads = new Thread[10]
    for (int t=0; t<10; t++) {
       final int n = t
       threads[n] = Thread.start {
          for (int j=0; j<100; j++) {
             shells[n].run(scriptText, "script", [])
          }
       }
    }
    for (int t=0; t<10; t++) {
       threads[t].join()
    }
    long t1 = System.nanoTime()
    printf("%3d: %3.1fs%n", i, ((double)(t1-t0))/1000000000)
}
--

Output with 2.5.0 master:

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-SNAPSHOT/lib/groovy-2.5.0-SNAPSHOT.jar script2
   0: 3.4s
   1: 2.7s
   2: 3.0s
   3: 2.1s
   4: 3.4s
   5: 2.5s
   6: 2.7s
   7: 2.8s
   8: 4.4s
   9: 2.3s

Output with PoC (default "class"):

$ java -cp .:/Users/alain/tech/unix/groovy-2.5.0-weak-gc-poc/lib/groovy-2.5.0-weak-gc-poc.jar script2
   0: 2.8s
   1: 2.0s
   2: 1.7s
   3: 1.7s
   4: 1.6s
   5: 1.6s
   6: 1.8s
   7: 1.3s
   8: 1.7s
   9: 1.7s

Once more the PoC appears to be faster and again classes were 
weakly-collectable.

I have also run the Gradle build of grengine with 2.5.0 master and the 
PoC. The build contains 6 unit tests that load lots of Groovy classes in 
separate threads, but compile little. There the PoC appeared to be 
slightly slower (0-5%) than 2.5.0 master.

What next?

Maybe you first want to take a look yourself?

A distribution based on the PoC is available at 
https://www.jexler.net/apache-groovy-binary-2.5.0-weak-gc-poc.zip

If there was a consensus to continue to evaluate this approach:

Regarding the map taken from Spring Framework:
- Is it really thread-safe? (Naively one would assume so because it is 
part of a widely-used framework, but assumption is the mother of all ...)
- Does it ever perform severely worse than the current implementation?
- Do classes also remain weakly-collectable if more complex things are 
done (and stored in ClassInfo)?
- Is it OK to use the code? I presume yes, since it is also Apache 2.0 
licensed; is it maybe necessary to mention it in some other places 
outside of the code?

Regarding Introspector:
- Go with "weak-gc-poc.cleanup" always "class", i.e. remove the system 
property completely?
- For Gradle, recommend to call Introspector.flushCaches() after each 
build to make classes immediately available for garbage collection?
- Similar recommendation for similar use cases like e.g. Groovy in a 
webapp container?
- Later, migrate away from Introspector entirely to solve this for 
good?

But is this the way to go, and for which version?

You have to help me out here:
- Could this be a candidate for a 2.4.7, even though ClassInfo would get 
a new public method (getBeanInfo)?
- Does this sound interesting enough to do more at the moment?
- If yes, who could maybe test a few more things, maybe with Gradle or 
Grails etc.? (Or does that usually only happen in a beta?)

Please tell me if there is anything more I can do here to help out...

Alain

