groovy-dev mailing list archives

From Alain Stalder <astal...@span.ch>
Subject Re: Improve Groovy class loading performance and memory management
Date Wed, 18 May 2016 07:04:23 GMT
jwagenleitner wrote:
 > I think performance in general, and not just under concurrent use, is
 > extremely important for ClassInfo. My understanding is that the static
 > cache of ClassInfos it holds is queried on every method call (at least
 > in dynamic Groovy). That is probably why the current hash-based caches
 > are used: to avoid the O(n) retrieval from a globalClassSet, which is
 > implemented as a linked list.

Too bad. At the moment I see no way to get both "on-the-fly" garbage
collection of ClassInfo and fast concurrent access to it.
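To make the lookup trade-off in the quote concrete, here is a minimal Java sketch of the two strategies: a linear scan over a global linked list of records versus java.lang.ClassValue (Java 7+), which computes one value per class on first access and returns that same value afterwards. Info is a hypothetical stand-in for ClassInfo; this is an illustration under those assumptions, not Groovy's actual ClassInfo code.

```java
import java.util.LinkedList;
import java.util.List;

public class ClassInfoLookup {

    // Hypothetical stand-in for Groovy's ClassInfo: one metadata record per class.
    static final class Info {
        final Class<?> klass;

        Info(Class<?> klass) {
            this.klass = klass;
        }
    }

    // Strategy 1: a global linked list searched linearly, O(n) per lookup.
    private static final List<Info> GLOBAL_LIST = new LinkedList<>();

    static synchronized Info lookupLinear(Class<?> c) {
        for (Info info : GLOBAL_LIST) {
            if (info.klass == c) {
                return info;
            }
        }
        Info created = new Info(c);
        GLOBAL_LIST.add(created);
        return created;
    }

    // Strategy 2: ClassValue computes one value per class on first access and
    // returns the same value afterwards; lookups do not scan all known classes.
    private static final ClassValue<Info> CLASS_VALUE = new ClassValue<Info>() {
        @Override
        protected Info computeValue(Class<?> type) {
            return new Info(type);
        }
    };

    static Info lookupClassValue(Class<?> c) {
        return CLASS_VALUE.get(c);
    }

    public static void main(String[] args) {
        // Both strategies hand out one stable Info per class.
        System.out.println(lookupLinear(String.class) == lookupLinear(String.class));
        System.out.println(lookupClassValue(String.class) == lookupClassValue(String.class));
        System.out.println(lookupClassValue(String.class) != lookupClassValue(Integer.class));
    }
}
```

Both strategies give the same answers; the difference is only in how the cost of a lookup grows with the number of classes already seen.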

What I quickly tried was to change the type of object stored with
globalClassValue from ClassInfo to other types, but all of them failed in
some way. (I did not test the full matrix of (Java 7 ClassValue or not)
and (compiled Groovy script loaded by the same class loader as Groovy or
not); I gave up as soon as something failed.)
- WeakReference<ClassInfo>: the ClassInfo was garbage collected even
  though the class was still referenced.
- SoftReference<ClassInfo>: OutOfMemoryError, though it took longer than
  usual.
- WeakHashMap<Class,ClassInfo>, i.e. a WeakHashMap with just a single
  entry (so no synchronization needed): that was actually my biggest
  hope, but it also ended in an OutOfMemoryError.
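Why the WeakReference variant loses information can be shown with plain JDK classes. In this sketch (Info and Holder are hypothetical stand-ins), the ClassValue keeps only a WeakReference to the Info, so nothing holds the Info strongly once callers let go of it. When the referent is cleared, the next lookup silently builds a fresh Info, and any per-class state on the old one is gone even though the class itself is still loaded. To keep the demonstration deterministic, Reference.clear() stands in for a GC cycle; that is an assumption about what the collector may eventually do, not a measurement.

```java
import java.lang.ref.WeakReference;

public class WeakInfoCache {

    // Hypothetical stand-in for ClassInfo with some mutable per-class state
    // (in Groovy this might be, for example, a customized metaclass).
    static final class Info {
        final Class<?> klass;
        String state = "fresh";

        Info(Class<?> klass) {
            this.klass = klass;
        }
    }

    // Strongly reachable holder per class; the Info inside is only weakly reachable.
    static final class Holder {
        WeakReference<Info> ref = new WeakReference<>(null);
    }

    private static final ClassValue<Holder> CACHE = new ClassValue<Holder>() {
        @Override
        protected Holder computeValue(Class<?> type) {
            return new Holder();
        }
    };

    static synchronized Info lookup(Class<?> c) {
        Holder holder = CACHE.get(c);
        Info info = holder.ref.get();
        if (info == null) {
            // First access, or the previous Info was reclaimed: build a new one.
            info = new Info(c);
            holder.ref = new WeakReference<>(info);
        }
        return info;
    }

    // Stand-in for a GC cycle: clear the weak reference, as the collector may
    // do once no caller holds the Info strongly.
    static synchronized void simulateCollection(Class<?> c) {
        CACHE.get(c).ref.clear();
    }

    public static void main(String[] args) {
        Info first = lookup(String.class);
        first.state = "customized";
        simulateCollection(String.class);     // String.class is obviously still loaded
        Info second = lookup(String.class);
        System.out.println(first == second);  // false: a new Info was created
        System.out.println(second.state);     // fresh: the customization is lost
    }
}
```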

jwagenleitner wrote:
 >>> So why not have the GroovyClassLoader keep a set of all classes that
 >>> it compiled itself and that were loaded, and offer a new ~
 >>> GroovyClassLoader#finalCleanup() method that removes meta information
 >>> for all these classes so that they would become immediately eligible
 >>> for garbage collection? (I guess InvokerHelper.removeClass(clazz) and
 >>> Introspector.flushFromCaches(clazz), but whatever is needed...)
 >>
 >> GroovyClassLoader (GCL) actually represents a tree of class loaders.
 >> For each compilation GCL will spawn an instance of InnerLoader. Since
 >> two different compilations are supposed to know each other's classes,
 >> a list of classes is kept in GCL itself (see classCache). The inner
 >> loader itself is not referenced by GCL. Because of that list, GCL has
 >> the clearCache method to remove classes from previous compilations.
 >>
 >> Why did we use this structure? GCL is supposed to offer you the
 >> possibility to compile the same class multiple times. That means you
 >> will get the same class multiple times. At the same time, a class must
 >> be defined under a given name only once in a given defining class
 >> loader, so trying to define a class that already exists under that
 >> name results in an error. A class loading constraint is actually to
 >> return the same class instance each time you request a class with a
 >> certain name. This implies the error above... it also means GCL is
 >> breaking those constraints knowingly.
 >>
 >> Anyway... I think such a cleanup method is misplaced in GCL, since
 >> it spans beyond the class loader... how about GroovySystem?
 >>
 > I agree that if a method were added, GCL is not the right place for
 > it, and that something like GroovySystem#removeClass(Class) or
 > GroovySystem#flushFromCaches(Class) would be good.

I guess this would help a little, but, again, as soon as you use e.g. a
closure in a script, a single compilation produces more than one class,
and I guess that happens often.

In cases where a script was compiled at runtime, the compilation would
have been done by a GroovyClassLoader$InnerLoader (which extends
GroovyClassLoader). That $InnerLoader can be obtained with
script.getClass().getClassLoader() (you can tell by checking whether it
is a GroovyClassLoader$InnerLoader), and it could be told to clean up
all classes it loaded, which would all belong to the single script it
compiled. If you want to offer that, which I think would probably make
sense, you would have to add a new method to GroovyClassLoader, or at
least to $InnerLoader, like #cleanupCompiledScripts() or whatever, and
then also offer its functionality from GroovySystem for convenience,
probably in two variants: one that really removes only the indicated
class, and one that extends to all classes compiled from the same
script, in case it was compiled at runtime from a Groovy script.
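The two variants proposed above could look roughly like the following Java sketch. All names in it (REGISTRY, register, removeClass, removeClassesOf, FakeScript, FakeScriptClosure1) are hypothetical; none of this is existing Groovy API, and the map merely stands in for Groovy's internal per-class caches, where a real implementation would presumably call something like InvokerHelper.removeClass instead. Only Introspector.flushFromCaches is a real JDK method. Filtering by defining class loader relies on the assumption, described above, that each runtime compilation gets its own InnerLoader.

```java
import java.beans.Introspector;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetaCleanup {

    // Hypothetical registry standing in for Groovy's per-class caches.
    static final Map<Class<?>, String> REGISTRY = new ConcurrentHashMap<>();

    static void register(Class<?> c) {
        REGISTRY.put(c, "meta information for " + c.getName());
    }

    // Variant 1 (hypothetical): remove meta information for exactly one class.
    static void removeClass(Class<?> c) {
        REGISTRY.remove(c);
        // Real JDK call mentioned in the thread: flushes the Introspector's
        // cached information for this class.
        Introspector.flushFromCaches(c);
    }

    // Variant 2 (hypothetical): remove meta information for every registered
    // class defined by the given loader, e.g. a script plus its closure classes.
    static void removeClassesOf(ClassLoader loader) {
        // ConcurrentHashMap iterators are weakly consistent, so removing while
        // iterating is safe here.
        for (Class<?> c : REGISTRY.keySet()) {
            if (c.getClassLoader() == loader) {
                removeClass(c);
            }
        }
    }

    // Hypothetical classes playing the role of a compiled script and its closure.
    static final class FakeScript {}
    static final class FakeScriptClosure1 {}

    public static void main(String[] args) {
        register(String.class);
        register(FakeScript.class);
        register(FakeScriptClosure1.class);
        removeClassesOf(FakeScript.class.getClassLoader());
        System.out.println(REGISTRY.keySet());  // only java.lang.String is left
    }
}
```

In this demo both fake classes share the application class loader, so it only mimics the one-loader-per-compilation situation; with a real InnerLoader the loader would define nothing but the classes of that one script.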

This will certainly not cover all use cases either (for Groovy classes
loaded from the file system by a URLClassLoader, for example, I see no
way to track which classes belong to which main script, etc.), but I
think the use case of scripts compiled at runtime would still justify
it. It would offer a relatively clean way to make all classes from a
script compilation available for GC more quickly.

I might also offer a similar method for ConfigSlurper, for convenience,
because it also always implicitly compiles a Groovy script (the config),
thus filling Metaspace/PermGen, which almost certainly nobody expects
offhand. Users would then not have to explicitly get the class from the
parsed object and call the removal function of GroovySystem. (There a
different type of loader seems to be used, RootLoader, but I would
estimate that compilation there usually yields only a single class.)

Finally, Groovy class loading is so dynamic and flexible that many things
become impossible to untangle (I am now just waiting for Jochen
Theodorou to point out that some of what I wrote above is not always the
case). So if my hopes of "on-the-fly" GC in the relatively near future
(before Groovy 3) really fade, I might consider adding similar cleanup
functionality to Grengine, where I estimate it could be done in a much
more structured and controlled way.

Script:

def script = new GroovyShell().parse("99")
println script.getClass().getClassLoader()

def a = new ConfigSlurper().parse("b{c=5}")
assert a.b.c == 5
println a.getClass().getClassLoader()

Output:

groovy.lang.GroovyClassLoader$InnerLoader@7f010382
org.codehaus.groovy.tools.RootLoader@5451c3a8

Alain




