groovy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <uschind...@apache.org>
Subject RE: new MOP under Java9 module system findings
Date Mon, 02 Apr 2018 14:52:42 GMT
Hi Jochen,

> [...]
> > Yes, that script language was designed from scratch with only
> invokedynamic and MethodHandles in mind.
> 
> which I guess means no mop at all, or nothing beyond a missing method

Exactly. As said between the lines, painless scripts cannot define own classes (only indirectly
via lambdas).

> > It also uses a pure whitelist-approach (explicit whitelisting what you can
> call, so no blacklists needed), but still allows most groovy syntax. As
> Elasticsearch is very performance critical (e.g. you may call a script for every
> search result which may be up to billion of times per search execution), we
> tried to avoid boxing and dynamic lookups by using extensive caching in the
> indy call sites.
> 
> yes, I can understand that part
> 
> > On the other hand it still allows the "def" type. The compiler tries to guess
> most types during compilation and only if it cannot guess the type it writes an
> invokedynamic.
> 
> which means semantically your are in between dynamic Groovy and Java,
> where def is more a "do this dynamic if in doubt", otherwise it is more
> like Java. So I strongly assume in

Yes, that's exactly the idea. I think this is mostly the case for scripting languages inside
servers like Elasticsearch. You just expose some public API to those script that can then
call them. Painless only allows functional stuff (lambdas), but no own classes.

> class A {}
> class B extends A {
>     void foo(String s) {}
> }
> 
> B b = new B()
> b.foo("a")
> A a = b
> a.foo("a")
> 
> you are going to call foo(String)in B in both cases while in
> 
> class A {
>     void foo(Object o) {}
> }
> class B extends A {
>     void foo(String s) {}
> }
> 
> B b = new B()
> b.foo("a")
> A a = b
> a.foo("a")
> 
> you are going to call B.foo(String) in the first case and A.foo(Object)
> in the second case (static/dynamic Groovy: B.foo(String) in both cases),
> and I assume when using def everywhere in this last example you are
> going to call B.foo(String) in both cases, because you guess B in both
> cases. And in case of
> 
> void bar (A a) {
>    a.foo("")
> }
> bar(new B())
> bar(new A())
> 
> you will in both cases call A.foo(Object) (static Groovy the same,
> dynamic Groovy both B.foo(String)), while in
> 
> void bar (def a) {
>    a.foo("")
> }
> bar(new B())
> bar(new A())
> 
> you will in both cases call first B.foo(String) and then A.foo(Object).
> (static Groovy will not compile this, dynamic Groovy will behave the same)

As said before, you cannot define own classes, so there is no need for metaobjects. So yes.
You have it harder! I just posted this mail to explain, why painless was created and what
the main decisions were. In fact, some parts like the DefBootstrap and its mono/poly/megamorphic
call sites and caches may be reused.

> > But nevertheless, by using invokedynamic with good caches (we go up to
> megamorphic if needed!) the slowdown by using invokedynamic for most
> scripts is neglectible!!! (less than 10% when full hotspot compilation is done).
> The main reason for creating a new script language was not only security
> (this was just good for selling it to users at that time), but also speed. Groovy
> behaves very bad when types are variying and especially in the "def" case it
> uses way too much reflection (the invokedynamic part has no good caches
> and still uses reflection all the time on the dynamic lookup in the call site type
> handler).
> 
> I think the Reflection part is going to go away mostly. We still require
> it to query the class for class members though. I assume it is the same
> for you. The usage of reflection in the dynamic lookup of the call site
> is for once due to the mop and due to using unreflect(Method) to produce
> handles and the resulting handle not being cached. That also contributes
> to bad invokedynamic performance, because creating handles and
> especially MethodType objects is surprisingly expensive.

Yes, that the reason why you need those caches.

> > The script language also supports lambdas and method references, but in
> contrast to Groovy, those are lambdas, not closures! So it uses
> LambdaMetaFactory (actually it uses it's own implementation, because
> LambdaMetaFactory is too strict with types and has some bugs in Java 9 that
> make it behave differently than the spec; java code does not see this, but
> dynamic languages break heavily with Java 9 when they use
> LambdaMetaFactory).
> 
> oh, any pointers on the bugs you mentioned? Since we are going to use it
> in the future it would be very nice to know of these

This commit broke it: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/4be3ef759ead

Here is our issue: https://github.com/elastic/elasticsearch/issues/23473

And that's the JDK bug about it: https://bugs.openjdk.java.net/browse/JDK-8174983

> > As said before, the compiler tries to guess all types at compile time and
> passes the native types to the invokedynamic, unless it can directly generate
> a real method call (everything is known). For dynamic ("def") stuff it
> delegates everything to DefBootstrap (https://goo.gl/DdNJJG), which has
> several modes:
> >
> > - simple method calls: here it uses a polymorphic cache with up to 5
> receiver items (the usual guardWithTest chain using MethodHandles). Once it
> sees more receivers, it reverts the whole cache and goes megamorphic (it
> builds a methodhandle that calls ClassValue.get on every call).
> 
> which means you guard only on the receiver type, not the parameter
> types? So in case of
> 
> class P {}
> class Q extends P {}
> class A {
>    void foo(P p){}
> }
> class B extends A {
>    void foo(Q q){}
> }
> 
> void bar(A a, def p) {
>    a.foo(p)
> }
> 
> bar(new B(), new P())
> bar(new B(), new Q())
> 
> you are going to call in both cases A.foo(P)? While in
> 
> bar(new B(), new Q())
> bar(new B(), new P())
> 
> you are going to call...no.. even though you have here Q first and will
> naturally select B.foo(Q), you have to have to create a new variant for
> P or else your call to B.foo(Q) with a P would fail. I cannot say I
> looked extensively at DefBootstrap, but I could find only a CHECK_CLASS
> for the receiver, and nothing for the parameters.... Where is that check
> then?

Very simple: We do not support method overloads. I mentioned this a bit later in my mail,
so the receiver check is enough. Arity is implicitely done by the JVM through the invokedynamic
call and types of "def" parameters are not interesting (as there is only one possible method).
For all APIs in Elasticsearch/JDK (e.g. Matcher#group(int/String)) we added method aliases,
if they overloaded methods are needed:

https://www.elastic.co/guide/en/elasticsearch/painless/current/modules-scripting-painless-dispatch.html

Because of the "whitelist" approach, we select the methods through the configuration of the
whitelist. In case of regex.Matcher, the whitelist only contains Matcher#group(int). The String
variant has a different name (aliased).

This is a simplification. I understand that in Groovy this is the limitation behind your indy
bootstraper, so it is "monomorphic" (otherwise it would blow up). But nevertheless, it might
be a good optimization also in groovy to somehow cache the fact that method name and arity
is unique, so you may be able to cache on receiver only for some "hot methods".

> > This is slower, but better than resolving every time. Keep in mind: This one
> has no bugs with ClassValue, as all seen types are local and can be cleaned
> up by GC correctly. The problems of Groovy with ClassValue are homemade,
> sorry. ClassValue works perfectly, if you do not misuse it!!! Unfortunately,
> Groovy has a monomorphic cache only, which was one reason why groovy
> was behaving bad in Elasticsearch!
> 
> Afaik we have no longer problems with ClassValue.

At that time it was Groovy 2.4.x which had issues.

> > - fields, array loads/stores: same as method calls, it just translates to a
> methodhandle like for simple method calls (including all caches), as
> mentioned above.
> > - (binary-)operators: As you have mostly 2 variable types here, it's hard to
> build good caches. So it goes for a monomorphic cache here. In most cases
> this is not an issue, as operator constructs don't have varying types (unless
> both sides are DEF). Still, once seen at runtime it stays constant for most
> cases.
> 
> interesting find.

All operators are static methods. Similar to Groovy when using like using "+" operator on
two java.util.List instances. Whenever a DEF type is part of a construct with operators, it's
transformed to an invokedynamic that is resolved  to one of those static methods in Def/DefMath.

> > - references (these are method references): Like in Java, lambdas
> (with/without captures) are also compiled to static/instance methods in the
> script class.
> 
> yes, that is where I want to go to. Not only for lambdas, also for Closure.
> 
> > Lambdas are also tried to be resolved at compile time, but you can also call
> a method taking a functional interface on the "def" type! In that case the
> above "references" variant is used (the DefBootstrap then delegates after
> type resolving to the LambdaBootstrap). At the end of an invokedynamic call,
> field access, array lookup, map lookup, list lookup, lambda lookup is just a
> compiled method handle including all MIC, PIC, Megamorphic caching logic
> (using those many methods in Java's "MethodHandles" class for combining
> MHs)! Except for Bootstrapping method references, no byte code is
> generated at runtime (of course java runtime does it, but we do not). We just
> combine methodhandles!
> 
> Yeah... there was recently a post to the mlvm list in this direction
> showing where LambdaFactory is actually faster (and runtime bytecode
> generation outside the MethodHandles). But I guess the penalty is not so
> important for this
> 
> > Most static definitions that are used in created method handles are part of
> the "Def" (https://goo.gl/g7GD5t) and "DefMath" (https://goo.gl/AN4SJ1)
> classes. If you look at Def.java, you also see many cases where it backports
> some MethodHandles stuff only available in Java 9+, but falls back to Java 9
> native code when Java 9 is used. Actually some methods missing in Java 8's
> MethodHandles implementation were added to Java 9, because of our
> feedback (e.g. array length lookup for implementing array[].length using
> invokedynamic).
> 
> yepp, thanks for that
> 
> > One thing: Because of the strict types and complexity/slowness with
> caching, Painless does not allow method overloading. The above whitelist
> uses alias names for methods that use overloading. One famous example are
> the group() methods on regex matchers (named, unnamed groups). Some
> users complained, but that's under control. The whitelist also adds addon
> methods (like Groovy) to some java classes in the same way like aliases.
> 
> Can you still define classes at all? I mean if you take that away as
> well, then you have of course no reason to check parameter types ;)
> Maybe I should then call this to be between Ruby and Java.

Yes. See above.

> >>> In my opinion that project is wrong, because the security manager
> >>> mechanisms provide enough protection. The problem is that rarely
> >>> anyone can use a security manager properly. Anyway... Groovy won't be
> >>> able to do any call Java cannot do in principle in this version. That
> >>> is not because of keeping security in mind, that is more because of
> >>> the module system, that enforces this
> >
> > This is partly right, but also not always applicable in that simple form.
> Elasticsearch uses SecurityManager to encapsulate all of its plugins, script
> engines,... and also itsself. It prevents stuff like deep reflection (setAccessible
> is completely disallowed anywhere in Elasticsearch) or System.exit/halt().
> 
> Just to mention it. Groovy can actually handle setAccessible not being
> allowed at all.
> 
> > File system is also restricted to not escape config/index folders. I think
> Elasticsearch's implementation of SecurityManager is one ogf those
> implementation that really work correctly. It opened a lot of bug reports in
> foreign projects to fix their code (e.g, missing doPrivileged or calls to
> setAccessible without proper try/catch). Of course this helps to make
> Elasticsearch safe for the typical security issues (code escapes sandbox). But
> on the other hand it does not help, if code in a script calls some public class
> in Elasticsearchs's core and executes a command to delete an index from
> inside a query script.
> 
> then you ca forbid scripts to make those calls, at least in Groovy that
> is possible.
> 
> > Of course you could also guard all public APIs of Elasticsearch with
> securitymanager, but as you know, many stuff is very performance critical, so
> you cannot really guard everything. Of course in the future, all access to
> lower level "Apache Lucene" or internals can be shielded by the module
> system, but that's not yet possible (Elasticsearch has Java 8 as minimum
> requirement).
> 
> Which means you create a module at runtime in which those script classes
> reside in?
> 
> > Groovy has the functionality of blacklists (so you can exclude certain classes
> from being called by scripts), but Elasticsearch decided to design it's script
> language the other way round. It is solely "whitelist" based. In short, you
> cannot access anything from the Elasticsearch/Java code that is not
> explicitely whitelisted (that's called "Definition" in Painless):
> https://goo.gl/ehdjvh with those whitelists: https://goo.gl/adCU88
> 
> yes, that makes things more easy, can totally understand that.
> 
> [...]
> > BTW: There are also some goodies in this script engine: If it figures out at
> runtime, that Java 9 is used, the byte code generated does not use
> StringBuilder when concatting strings, instead it uses invokedynamic with
> StringConcatFactory. The code to do that is quite "generic" and applied with a
> subclass of ASM's GeneratorAdapter:
> https://github.com/elastic/elasticsearch/blob/6dadce47613a3c69d928940bc
> c1b2043e0a0184a/modules/lang-
> painless/src/main/java/org/elasticsearch/painless/MethodWriter.java#L238-
> L292 (if I have some time, I will extend it to use the better
> makeConcatWithConstants() to improve strings with many constant parts).
> 
> maybe we "steal" that ;)

No problem. Have fun studying it! Actually, it was not that hard to implement when serializing
the Antlr AST to stack-based bytecode. For StringConcatFactory you just have to record the
types of arguments while pushing them on stack and build the invokedynamic signature from
all of them at the end. For StringBuilder you do the type logic after every argument by taking
type and compiling to an append() call. There is only one thing: For StringConcatFactory you
have to limit number of arguments...: Once you have to many parts to concat on stack you call
invokedynamic and take the result of the call as first argument for next round.

> bye Jochen

Uwe



Mime
View raw message