From dev-return-4508-archive-asf-public=cust-asf.ponee.io@groovy.apache.org Mon Apr 2 12:11:37 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 7212C180627 for ; Mon, 2 Apr 2018 12:11:36 +0200 (CEST) Received: (qmail 87004 invoked by uid 500); 2 Apr 2018 10:11:35 -0000 Mailing-List: contact dev-help@groovy.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@groovy.apache.org Delivered-To: mailing list dev@groovy.apache.org Received: (qmail 86994 invoked by uid 99); 2 Apr 2018 10:11:35 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Apr 2018 10:11:35 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 12EBBC013E for ; Mon, 2 Apr 2018 10:11:35 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -13.011 X-Spam-Level: X-Spam-Status: No, score=-13.011 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_SHORT=0.001, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id lNmT1ecl6X9j for ; Mon, 2 Apr 2018 10:11:32 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with SMTP id 2D0B85F1B4 for ; Mon, 2 Apr 2018 10:11:32 +0000 (UTC) Received: (qmail 86661 invoked by uid 99); 2 Apr 2018 10:11:31 -0000 Received: from mail-relay.apache.org (HELO mailrelay1-lw-us.apache.org) (207.244.88.152) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Apr 2018 10:11:31 +0000 Received: from VEGA (p4FC05417.dip0.t-ipconnect.de [79.192.84.23]) by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id 721892A9F; Mon, 2 Apr 2018 10:11:29 +0000 (UTC) From: "Uwe Schindler" To: , "'Jochen Theodorou'" , References: <3d869294-595c-5e7e-fb5e-ebc43a6c67dc@gmx.org> <0fb38e3b-7b8f-ccb5-af83-dac0e6bab999@arscreat.com> In-Reply-To: <0fb38e3b-7b8f-ccb5-af83-dac0e6bab999@arscreat.com> Subject: RE: new MOP under Java9 module system findings Date: Mon, 2 Apr 2018 12:11:24 +0200 Message-ID: <029f01d3ca6a$f66eb170$e34c1450$@apache.org> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 16.0 Content-Language: de Thread-Index: AQFEp0lEYY/PlOXhFQGar/UKNeQfwwH2C/AvAWIqMrECEIysmKTgL5Yg Hi, > I think I found the article I was referring to: > https://www.compose.com/articles/elasticsearch-security-update-groovy- > scripting-dropped/ > (2015-03): > "After talking with the Groovy developers, Elasticsearch have decided > that Groovy will never be sufficiently safe in a sandbox and have > removed it from the list of sandboxed languages." >=20 > ( I have used Solr, but not Elasticsearch. It seems they now have > "Painless", a Groovy-subset-compatible language as their main = scripting > language: > https://www.elastic.co/guide/en/elasticsearch/painless/6.2/painless- > specification.html > ... ) Yes, that script language was designed from scratch with only = invokedynamic and MethodHandles in mind. It also uses a pure = whitelist-approach (explicit whitelisting what you can call, so no = blacklists needed), but still allows most groovy syntax. As = Elasticsearch is very performance critical (e.g. you may call a script = for every search result which may be up to billion of times per search = execution), we tried to avoid boxing and dynamic lookups by using = extensive caching in the indy call sites. On the other hand it still = allows the "def" type. The compiler tries to guess most types during = compilation and only if it cannot guess the type it writes an = invokedynamic. But nevertheless, by using invokedynamic with good caches = (we go up to megamorphic if needed!) the slowdown by using invokedynamic = for most scripts is neglectible!!! (less than 10% when full hotspot = compilation is done). The main reason for creating a new script language = was not only security (this was just good for selling it to users at = that time), but also speed. Groovy behaves very bad when types are = variying and especially in the "def" case it uses way too much = reflection (the invokedynamic part has no good caches and still uses = reflection all the time on the dynamic lookup in the call site type = handler). The script language also supports lambdas and method references, but in = contrast to Groovy, those are lambdas, not closures! So it uses = LambdaMetaFactory (actually it uses it's own implementation, because = LambdaMetaFactory is too strict with types and has some bugs in Java 9 = that make it behave differently than the spec; java code does not see = this, but dynamic languages break heavily with Java 9 when they use = LambdaMetaFactory). As said before, the compiler tries to guess all types at compile time = and passes the native types to the invokedynamic, unless it can directly = generate a real method call (everything is known). For dynamic ("def") = stuff it delegates everything to DefBootstrap (https://goo.gl/DdNJJG), = which has several modes: - simple method calls: here it uses a polymorphic cache with up to 5 = receiver items (the usual guardWithTest chain using MethodHandles). Once = it sees more receivers, it reverts the whole cache and goes megamorphic = (it builds a methodhandle that calls ClassValue.get on every call). This = is slower, but better than resolving every time. Keep in mind: This one = has no bugs with ClassValue, as all seen types are local and can be = cleaned up by GC correctly. The problems of Groovy with ClassValue are = homemade, sorry. ClassValue works perfectly, if you do not misuse it!!! = Unfortunately, Groovy has a monomorphic cache only, which was one reason = why groovy was behaving bad in Elasticsearch! - fields, array loads/stores: same as method calls, it just translates = to a methodhandle like for simple method calls (including all caches), = as mentioned above. - (binary-)operators: As you have mostly 2 variable types here, it's = hard to build good caches. So it goes for a monomorphic cache here. In = most cases this is not an issue, as operator constructs don't have = varying types (unless both sides are DEF). Still, once seen at runtime = it stays constant for most cases. - references (these are method references): Like in Java, lambdas = (with/without captures) are also compiled to static/instance methods in = the script class. Lambdas are also tried to be resolved at compile time, but you can also = call a method taking a functional interface on the "def" type! In that = case the above "references" variant is used (the DefBootstrap then = delegates after type resolving to the LambdaBootstrap). At the end of an = invokedynamic call, field access, array lookup, map lookup, list lookup, = lambda lookup is just a compiled method handle including all MIC, PIC, = Megamorphic caching logic (using those many methods in Java's = "MethodHandles" class for combining MHs)! Except for Bootstrapping = method references, no byte code is generated at runtime (of course java = runtime does it, but we do not). We just combine methodhandles! Most static definitions that are used in created method handles are part = of the "Def" (https://goo.gl/g7GD5t) and "DefMath" = (https://goo.gl/AN4SJ1) classes. If you look at Def.java, you also see = many cases where it backports some MethodHandles stuff only available in = Java 9+, but falls back to Java 9 native code when Java 9 is used. = Actually some methods missing in Java 8's MethodHandles implementation = were added to Java 9, because of our feedback (e.g. array length lookup = for implementing array[].length using invokedynamic). One thing: Because of the strict types and complexity/slowness with = caching, Painless does not allow method overloading. The above whitelist = uses alias names for methods that use overloading. One famous example = are the group() methods on regex matchers (named, unnamed groups). Some = users complained, but that's under control. The whitelist also adds = addon methods (like Groovy) to some java classes in the same way like = aliases. > > In my opinion that project is wrong, because the security manager > > mechanisms provide enough protection. The problem is that rarely > > anyone can use a security manager properly. Anyway... Groovy won't = be > > able to do any call Java cannot do in principle in this version. = That > > is not because of keeping security in mind, that is more because of > > the module system, that enforces this This is partly right, but also not always applicable in that simple = form. Elasticsearch uses SecurityManager to encapsulate all of its = plugins, script engines,... and also itsself. It prevents stuff like = deep reflection (setAccessible is completely disallowed anywhere in = Elasticsearch) or System.exit/halt(). File system is also restricted to = not escape config/index folders. I think Elasticsearch's implementation = of SecurityManager is one ogf those implementation that really work = correctly. It opened a lot of bug reports in foreign projects to fix = their code (e.g, missing doPrivileged or calls to setAccessible without = proper try/catch). Of course this helps to make Elasticsearch safe for = the typical security issues (code escapes sandbox). But on the other = hand it does not help, if code in a script calls some public class in = Elasticsearchs's core and executes a command to delete an index from = inside a query script. Of course you could also guard all public APIs of Elasticsearch with = securitymanager, but as you know, many stuff is very performance = critical, so you cannot really guard everything. Of course in the = future, all access to lower level "Apache Lucene" or internals can be = shielded by the module system, but that's not yet possible = (Elasticsearch has Java 8 as minimum requirement). Groovy has the functionality of blacklists (so you can exclude certain = classes from being called by scripts), but Elasticsearch decided to = design it's script language the other way round. It is solely = "whitelist" based. In short, you cannot access anything from the = Elasticsearch/Java code that is not explicitely whitelisted (that's = called "Definition" in Painless): https://goo.gl/ehdjvh with those = whitelists: https://goo.gl/adCU88 > Whatever the reason it will be more secure, my line of thought was, = that > if some (clever) way to work around this exists, to maybe still not go > along that route, even though I don't like to see this feature go and > people who use this in tests will surely miss it... > (Of course that argument is mute if they could have fixed their = problem > through proper security manager use). I agree with that! That was the main reason to redesign the script = language in Painless. It was made mostly compatible to Groovy, but the = backend and compiler was designed to use MethodHandles and Invokedynamic = only. There is no reflection inside (some parts of the above definition = use unreflect, but that's more on startup only when building the = whitelist graph, because reflection allows to better inspect stuff at = runtime, while MethodHandles are typed from the beginning - you cannot = get a methodhandle without exact types). During execution of a script = there is nowhere a call to any reflective field/method nor is there = setAccessible anywhere. The Invokedynamic call site does everything = (includes caching) with MethodHandles: DefBootstrap, LambdaBootstrap. BTW: There are also some goodies in this script engine: If it figures = out at runtime, that Java 9 is used, the byte code generated does not = use StringBuilder when concatting strings, instead it uses invokedynamic = with StringConcatFactory. The code to do that is quite "generic" and = applied with a subclass of ASM's GeneratorAdapter: = https://github.com/elastic/elasticsearch/blob/6dadce47613a3c69d928940bcc1= b2043e0a0184a/modules/lang-painless/src/main/java/org/elasticsearch/painl= ess/MethodWriter.java#L238-L292 (if I have some time, I will extend it = to use the better makeConcatWithConstants() to improve strings with many = constant parts). Uwe