groovy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alain Stalder <astal...@span.ch>
Subject Re: Improve Groovy class loading performance and memory management
Date Fri, 13 May 2016 20:23:58 GMT
First about the bug that classes were loaded only once: This did not 
happen in my case for the GroovyFilling because I used different 
directories for ClassGCTester and GroovyFilling, so the parent of the 
URLClassLoader could not load the test class, but it happened 
(unintentionally) in my tests for JavaFilling which is why I though that 
loading Java classes was so incredibly fast... :(

With that fixed, I got much more reasonable output, about 
0.1ms+0.02ms=0.12ms for loading JavaFilling and about 0.4ms + 0.9ms = 
1.3ms for the original GroovyFilling. So, there is not all that much of 
room for improvement, at least a lot less than I feared, considering 
that Groovy classes do have more capabilities.

$ java -cp . ClassGCTester filling/ JavaFilling

Secs Test classes              Metaspace/PermGen Heap   Load time Create 
time
        #loaded  #remaining        used committed       used 
committed     average     average
    0         1           1       2.9m       4.8m       2.6m 245.5m     
0.812ms     0.098ms
    1      5577        2803       7.4m      21.3m      37.9m 177.5m     
0.136ms     0.028ms
    2     13610        3437       8.2m      25.9m      45.6m 361.0m     
0.114ms     0.024ms
    3     21579        2836       7.4m      24.0m      38.3m 513.5m     
0.109ms     0.023ms
    4     29582        2269       6.7m      21.8m      31.6m 733.5m     
0.106ms     0.022ms
    5     37792        1909       6.3m      20.5m      28.7m 990.5m     
0.104ms     0.022ms
    6     45894        1441       5.7m      18.6m      22.3m 1257.0m     
0.103ms     0.022ms
    7     54049        1026       5.2m      17.6m      19.1m 1575.5m     
0.102ms     0.021ms

$ java -XX:MaxMetaspaceSize=64m -Xmx512m -Dgroovy.use.classvalue=true 
-cp .:groovy-2.4.6.jar ClassGCTester filling/ GroovyFilling

Secs Test classes              Metaspace/PermGen Heap   Load time Create 
time
        #loaded  #remaining        used committed       used 
committed     average     average
    0         1           1       6.3m       6.5m      13.4m 245.5m     
0.875ms    11.525ms
    1       474         474       9.0m      10.3m      36.5m 245.5m     
0.350ms     1.694ms
    2      1312        1312      12.2m      15.1m     103.7m 309.5m     
0.265ms     1.224ms
    3      2291        2291      16.0m      21.0m      87.4m 389.0m     
0.417ms     1.034ms
    4      2977        2977      18.6m      25.1m     178.4m 389.0m     
0.360ms     0.962ms
    5      4065        4065      22.8m      31.5m     216.0m 389.0m     
0.307ms     0.905ms
    6      4641        4641      25.0m      34.9m     164.0m 455.5m     
0.444ms     0.892ms
    7      5314        5314      27.6m      38.8m     213.2m 455.5m     
0.412ms     0.888ms

What I still observed was that for loading JavaFilling, Metaspace does 
not grow indefinitely even without a limit (see above), but it does for 
GroovyFilling, which I can also understand. Would be nice if it was 
possible that Groovy classes were also collected so quickly, but in 
practice I guess once you know that, you just have to set a reasonable 
maximum for Metaspace, and when you operate a "server environment" you 
have to take care of these kinds of things anyway, I would say.

So, essentially I am quite happy with the results and with how Groovy 
fares :)

I have made a "release" 1.1.0 of ClassGCTester with an added check that 
the test class cannot be loaded from the classpath of ClassGCTester 
alone and with a fix for the display of Metaspace/PermGen (this matches 
now roughly the output of "jstat -gc ..." for MC and MU, resp. the 
equivalents for PermGen), plus updated the readme and examples.

So far for the "broad picture", some feedback regarding your detailed 
analysis hopefully a bit later, I think I will first read it again a few 
more times - very interesting... :)

Alain


On 13.05.16 13:03, Jochen Theodorou wrote:
> On 13.05.2016 02:22, Alain Stalder wrote:
> [...]
>> Qualitatively this often has the following result in the Java VM:
>> Metaspace resp. PermGen, and Heap in parallel, just grow until a
>> configured limit is reached (and note that there is none by default for
>> Metaspace in Java 8 and later), often only then is it garbage collected.
>> With Java classes, at least with simple ones, this looks often
>> different, those appear to be garbage collected much more quickly.
>>
>> Another qualitative difference is that loading a Groovy class and
>> instantiating it seems typically to be considerably slower than
>> instantiating a Java class with similar functionality, even quite
>> drastically so, more than one would expect even considering the need to
>> create metadata for dynamic function calls etc.
>>
>> At least that has been my experience over the past few years.
>
> this is going to be a long mail ;)
>
> so let us make three things to discuss here:
>
> 1) Object initialization performance
> 2) class verification/complexity
> 3) garbage collection of unused classes
>
> And we have to distinguish here between usages of ClassValue, 
> invokedynamic and the traditional version of those... that makes 4 
> aspects.
>
> So I will write several things you probably already know, but other 
> reading here might not. And even though I simplify a bit, please bear 
> with me ;) And first of all, let us talk about the meta class 
> system... that mostly targets ClassValue then.
>
> The old version of the meta class system uses a big global table for 
> all meta classes, with a class key and ClassInfo as value. In 
> ClassInfo we have the meta class, which might be either recreate-able 
> (in which case the meta class is soft referenced) or not (in which 
> case it is a strong reference). The idea being, that as soon as the 
> class can be garbage collected, the ClassInfo can as well, and with it 
> the meta class.
>
> Problem 1 here is, the meta class holds a strong reference to the 
> class, so if the ClassInfo holds a strong reference to the meta class, 
> this entry in the table will never be collected. I mention this only 
> for completeness, since you did not set a permanent meta class in your 
> test
>
> Problem 2 here is, the code is concurrent, which rules out WeakHashMap 
> and forced us to implement our own map.
>
> In the ClassValue version we do not have our own table anymore and let 
> the JVM manage this.
>
> To avoid the lookup cost of the meta class, every Groovy class has a 
> reference to its ClassInfo. The meta class and ClassInfo are lazy 
> initialized.... well, "populated with actual data" in case of ClassInfo.
>
> Some classes extend GroovyObjectSupport, which does the initialization 
> in the constructor already, groovy.lang.Script is one of them. That 
> means every time you create an instance of a new script class, you get 
> the meta class already, even if the meta class is not used. let us 
> have a small look at the bytecode of the constructors (x.groovy which 
> only returns 42) of such a script:
>
>>   // access flags 0x1
>>   public <init>()V
>>     ALOAD 0
>>     INVOKESPECIAL groovy/lang/Script.<init> ()V
>>    L0
>>     INVOKESTATIC x.$getCallSiteArray 
>> ()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
>>     ASTORE 1
>>    L1
>>     RETURN
>>
>>   // access flags 0x1
>>   public <init>(Lgroovy/lang/Binding;)V
>>    L0
>>     INVOKESTATIC x.$getCallSiteArray 
>> ()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
>>     ASTORE 2
>>     ALOAD 0
>>     ALOAD 1
>>     INVOKESPECIAL groovy/lang/Script.<init> (Lgroovy/lang/Binding;)V
>>    L1
>>     RETURN
>
> as you can see here, there is a getCallSiteArray call in here and 
> direct methods calls for initialization. The getCallSiteArray call in 
> here is actually surplus, but it is difficult to decide in the 
> compiler early on if we will need it or not, because the callsite 
> array here is basically an array wrapper, which supplies method names 
> for method calls as well as specialized code for doing dynamic method 
> calls. Would the constructor for example contain a method call to a 
> method foo(), you would see some thing like getting the CallSite and 
> executing "call" on it.
>
> Why are we doing static method calls in the constructor here? Because 
> in several cases the compiler optimizes the dynamic call away here. 
> Basically you cannot provide a super constructor in Groovy, which 
> means only the one statically defined does count. And as long as the 
> given types are matching enough for the compiler to decide, we can 
> create direct method calls.
>
> So what does this mean for the object initialization performance in 
> cases of Scripts so far? We eagerly initialize ClassInfo and MetaClass 
> for each script. That means a lookup by reflection of the complete 
> structure of the class and its super classes... which is cached, so 
> the super class lookup will be much lower the next time. initializing 
> the first meta class will also initialize the extension method lookup, 
> which can also take a bit time... again, a one-time cost here.
>
> The invokedynamic version will again use the jdk internals for the 
> callsites, thus $getCallSiteArray is never called nor are the methods 
> around this created. This saves bytecode size (faster verification), 
> and in fact the two mentioned constructors above would look just like 
> they would in Java. This of course still implies the meta class system 
> init.
>
> Now... the actual code of the script is to be found in the run method, 
> which looks like this in old groovy:
>
>>   // access flags 0x1
>>   public run()Ljava/lang/Object;
>>    L0
>>     INVOKESTATIC x.$getCallSiteArray 
>> ()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
>>     ASTORE 1
>>    L1
>>     LINENUMBER 1 L1
>>     BIPUSH 42
>>     INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
>>     ARETURN
>>    L2
>>     ACONST_NULL
>>     ARETURN
>
> again we see the call to get the callsite array. But otherwise the 
> method should like this in Java too. Since we are not calling any 
> actual methods and just return 42, we have again needless overhead in 
> this. Accordingly the invokedynamic version looks like this:
>
>>   // access flags 0x1
>>   public run()Ljava/lang/Object;
>>    L0
>>     LINENUMBER 1 L0
>>     BIPUSH 42
>>     INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
>>     ARETURN
>>    L1
>>    FRAME FULL [] [java/lang/Throwable]
>>     NOP
>>     ATHROW
>
> Frames appear in this one, because the old groovy bytecode is java5 
> compatible, while invokedynamic requires java7 and thus the new faster 
> verifier with frames. The frames basically support the verifier to 
> make it faster.... I never tested if that is actually the case.
>
> To sum it up a little so far.... the invokedynamic version uses less 
> bytecode and the new verifier 1685 vs 2140, both could execute that 
> script without ever using the meta class system, if run is called 
> directly and if not Script would initialize the meta class eagerly. Of 
> course the later is for nil as soon as you have some method calls in 
> there, which are not static.
>
> Since we now have some basics, we should start talking about your test 
> case.
>
> First thing... I don´t think you load the class new every time. (Taken 
> from my machine, with java8u25 and Groovy 2.4.6)
>>    0         1           1      10.7m 11.4m        14.4m    
>> 240.0m     0.105ms     0.061ms
>>    1    760081           1      11.4m      11.8m        28.3m 
>> 246.5m     0.000ms     0.000ms
>
> I see here the first time class loading with 105 and a creation time 
> with 61. This class loading time has to appear every time and must 
> never be 0. But that is the case. So even if you create a new 
> URLClassLoader without defining a root, it still has the system class 
> loader as parent. Providing null as parent class loader fixes that 
> problem. And then I got a more interesting:
>
>>    0         1           1      10.6m 11.4m        14.4m    
>> 240.0m     0.213ms     0.063ms
>>    1     11519        2227      16.2m      35.9m        24.2m 
>> 246.0m     0.060ms     0.022ms
>>    2     27103        4055      19.4m      46.8m        40.7m 
>> 229.0m     0.051ms     0.019ms
>>    3     42465           1      12.2m      34.4m         4.5m 
>> 235.0m     0.049ms     0.019ms
>>    4     57606         580      13.3m      34.4m         9.0m 
>> 236.0m     0.048ms     0.018ms
>>    5     73506        1918      15.8m      39.1m        21.3m 
>> 236.0m     0.047ms     0.018ms
>>    6     89212        2482      16.8m      43.0m        25.8m 
>> 212.5m     0.047ms     0.018ms
>>    7    104578           1      12.4m      41.7m         4.0m 
>> 232.0m     0.046ms     0.018ms
>
> actually this was with java5 compatible code... with java8 standard I 
> get something like this:
>
>>    0         1           1      10.6m 11.4m        14.4m    
>> 240.0m     0.214ms     0.029ms
>>    1     12117        2861      17.3m      38.4m        30.4m 
>> 246.0m     0.062ms     0.016ms
>>    2     29143        1303      14.6m      36.6m        15.8m 
>> 236.0m     0.052ms     0.014ms
>>    3     46633        4267      19.8m      48.3m        43.2m 
>> 235.0m     0.048ms     0.013ms
>>    4     62634         900      14.0m      35.2m        11.4m 
>> 236.0m     0.048ms     0.013ms
>>    5     80618        4358      20.1m      48.7m        43.5m 
>> 235.5m     0.046ms     0.013ms
>>    6     97964        2336      16.6m      41.0m        25.3m 
>> 236.5m     0.046ms     0.012ms
>
> This makes me actually wondering a bit as of why the creation time 
> goes down, but not the load time. And after some thinking things 
> become clear to me...when you load a class using loadClass, the class 
> is not verified immediately. This happens lazy on actual static 
> initialization or once you ask the class for its structure... like 
> when using getMethods. So what load time tests here is purely the 
> class loader performance, nothing else. A difference in load time 
> would thus indicate solely that more or less classes are used. This 
> also means the creation time here includes the verification time of 
> the class, which I guess amounts to the .006 in difference between the 
> two.
>
> Now lets get further and change the simple java class with a run 
> method to extend Groovy´s script class.. well this requires further 
> changes, since now of course giving the groovy jar on the classpath 
> will no longer enable the spawned URLClassloader to load groovy 
> classes. Anyway... after fixing that:
>
>>    0         1           1      13.4m 13.8m        24.0m    
>> 240.0m     1.290ms    90.612ms
>>    1        12          12      39.3m      40.1m        64.3m 
>> 190.0m     0.985ms    82.673ms
>>    2        23          23      65.0m      66.5m        56.0m 
>> 246.0m     1.024ms    88.072ms
>>    3        34           5      29.3m      46.8m        27.6m 
>> 230.5m     0.976ms    88.245ms
>>    4        48          19      54.8m      64.0m        39.2m 
>> 228.0m     0.929ms    83.359ms
>>    5        60           1      19.6m      51.3m        13.2m 
>> 228.0m     0.892ms    84.574ms
>>    6        73          14      42.6m      55.8m        36.8m 
>> 228.0m     0.859ms    82.042ms
> ...
>>   46       593           1      21.8m 62.2m         7.9m    
>> 220.5m     0.783ms    76.878ms
>>   47       607          15      47.0m      62.3m        41.8m 
>> 219.0m     0.787ms    76.695ms
>>   48       621          29      78.3m      82.5m        44.5m 
>> 220.5m     0.782ms    76.563ms
>
> nothing much more will happen. I guess having a fast SSD is paying off 
> here, creation times would have been 1s+ otherwise. Anyway... as you 
> can see by the load time, we load a ton more classes now and creation 
> is much more expensive, since we now load part of the meta class 
> system. Without any further changes, this would be the best Groovy can 
> achieve without actually changing the way scripts are done. But as you 
> can see as well, the memory behaviour is actually quite well. I would 
> have liked to test the ClassValue version, but it crashes almost 
> immediately after my changes.
>
> Next step is then to use the simple "42" script:
> ...
>>   46       596          17      52.0m 63.7m        36.9m    
>> 219.5m     0.840ms    76.407ms
>>   47       609           1      21.7m      60.7m        11.5m 
>> 223.0m     0.835ms    76.536ms
>>   48       621          13      43.0m      60.7m        27.0m 
>> 223.0m     0.830ms    76.434ms
>
> we now load even more classes like CallSite and such, but creation 
> times are about the same meaning the overhead here is almost zero in 
> the end. Unsurprisingly the invokedynamic version behaves about the 
> same. Maybe the load times are a bit less good... which indicates more 
> classes being loaded internally in the JDK code to support invokedynamic.
>
> Now a further step... I can change the script from being "42" to:
>
> public class GroovyFilling {
>   public Object run() {
>     return 42;
>   }
> }
>
>
> so basically the same as the java version... This means Script will no 
> longer be used here!
>
>>   46       638          28      74.9m 79.6m        50.7m    
>> 227.5m     0.626ms    71.473ms
>>   47       650          10      35.4m      60.3m        41.1m 
>> 227.5m     0.631ms    71.630ms
>>   48       665          25      68.4m      74.6m        47.4m 
>> 227.5m     0.626ms    71.512ms
>
> well... we seem to load less classes, but we still get quite high 
> creation times. looking at the bytecode I find this (for the indy 
> version, but that does not really matter):
>
>>   // access flags 0x1
>>   public <init>()V
>>     ALOAD 0
>>     INVOKESPECIAL java/lang/Object.<init> ()V
>>    L0
>>     ALOAD 0
>>     INVOKEVIRTUAL GroovyFilling.$getStaticMetaClass 
>> ()Lgroovy/lang/MetaClass;
>>     ASTORE 1
>>     ALOAD 1
>>     ALOAD 0
>>     SWAP
>>     PUTFIELD GroovyFilling.metaClass : Lgroovy/lang/MetaClass;
>>     ALOAD 1
>>     POP
>>    L1
>>     RETURN
>
> So we actually do an eager meta class creation here as well. The worst 
> part here is... we get this even with static compilation... which 
> absolutely is a bug.
>
> So to improve this test here I guess trying to get rid of 
> $getStaticMetaClass and making Script (maybe even GroovyObjectSupport) 
> init meta classes lazy would be a gain. In the times of pre Groovy 1.8 
> this would not have amassed to much gain, since we did dynamic calls 
> all other the place. But since then we managed to reduce those calls 
> quite a lot, so I see an actual potential to reduce creation times for 
> simple cases. Of course in the more complex ones or in case in which 
> the super class constructor cannot be chosen statically by the 
> compiler, all bets are off again.
>
> bye Jochen
>
> .
>


Mime
View raw message