From: Stephan Ewen
Date: Tue, 20 Jun 2017 21:44:51 +0200
Subject: Re: [DISCUSS] Changing Flink's shading model
To: dev@flink.apache.org

I like this approach. Two additional things should be mentioned here:

  - We need to deploy these artifacts independently and not as part of the
    build. That is a manual step, once per version "bump" of that library.

  - We reduce the shading complexity of the main build and should thus also
    speed up build times :-)

Stephan


On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler wrote:

> I would like to start working on this.
>
> I've looked into adding a flink-shaded-guava module. Working against the
> shaded namespaces seems to work without problems from the IDE, and we could
> forbid un-shaded usages with checkstyle.
>
> So for the list of dependencies that we want to shade we currently have:
>
> * asm
> * guava
> * netty
> * hadoop
> * curator
>
> I've had a chat with Stephan Ewen and he brought up kryo + chill as well.
>
> The nice thing is that we can do this incrementally, one dependency at a
> time. As such I would propose to go through the whole process for guava
> and see what problems arise.
>
> This would include adding a flink-shaded module and a child
> flink-shaded-guava module to the flink repository that are not part of
> the build process, replacing all usages of guava in Flink, adding the
> checkstyle rule (optional) and deploying the artifact to maven central.
>
>
> On 11.05.2017 10:54, Stephan Ewen wrote:
>
>> @Ufuk - I have never set up artifact deployment in Maven, could use some
>> help there.
>>
>> Regarding shading Netty, I agree, would be good to do that as well...
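Chesnay's idea of forbidding un-shaded usages can be illustrated with a small rule that flags imports from the original Guava package. In practice this would likely be Checkstyle's IllegalImport check configured via its illegalPkgs property; the Java below is only a minimal sketch of the rule itself, not Checkstyle. The class name UnshadedImportCheck and the relocated prefix org.apache.flink.shaded.guava are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Checkstyle itself): flags imports from the original,
 * un-shaded Guava package so that code only uses the relocated namespace.
 */
public class UnshadedImportCheck {

    // Package of the original Guava classes; direct imports from it are forbidden.
    static final String FORBIDDEN_PREFIX = "com.google.common.";

    /** Returns the 1-based line numbers of import statements that use FORBIDDEN_PREFIX. */
    static List<Integer> findViolations(List<String> sourceLines) {
        List<Integer> violations = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i).trim();
            String imported = null;
            // Check "import static" first, since it also starts with "import ".
            if (line.startsWith("import static ")) {
                imported = line.substring("import static ".length()).trim();
            } else if (line.startsWith("import ")) {
                imported = line.substring("import ".length()).trim();
            }
            if (imported != null && imported.startsWith(FORBIDDEN_PREFIX)) {
                violations.add(i + 1);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "package org.apache.flink.example;",
                // Allowed: hypothetical relocated Guava namespace.
                "import org.apache.flink.shaded.guava.com.google.common.base.Preconditions;",
                // Forbidden: original Guava namespace (regular and static import).
                "import com.google.common.collect.Lists;",
                "import static com.google.common.base.Preconditions.checkNotNull;");
        System.out.println(findViolations(source)); // prints [3, 4]
    }
}
```

Because the relocated package still ends in com.google.common, a plain substring search would produce false positives; matching on the start of the imported name avoids that.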
>>
>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi wrote:
>>
>>> The advantages you've listed sound really compelling to me.
>>>
>>> - Do you have time to implement these changes or do we need a volunteer? ;)
>>>
>>> - I assume that republishing the artifacts as you propose doesn't have
>>> any new legal implications since we already publish them with our
>>> JARs, right?
>>>
>>> - We might think about adding Netty to the list of shaded artifacts
>>> since some dependency conflicts were reported recently. Would have to
>>> double-check the reported issues before doing that though. ;-)
>>>
>>> – Ufuk
>>>
>>>
>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen wrote:
>>>
>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not say
>>>> that clearly.
>>>>
>>>> If we like that approach, we should deal with the other libraries (at
>>>> least the frequently used ones) in the same way.
>>>>
>>>> I would imagine a project layout like this:
>>>>
>>>> flink-shaded-deps
>>>>  - flink-shaded-asm
>>>>  - flink-shaded-guava
>>>>  - flink-shaded-curator
>>>>  - flink-shaded-hadoop
>>>>
>>>> "flink-shaded-deps" would not be built every time (and not be released
>>>> every time), but only when needed.
>>>>
>>>>
>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler wrote:
>>>>
>>>>> I like the idea, thank you for bringing it up.
>>>>>
>>>>> Given that the raised problems aren't really ASM specific, would it
>>>>> make sense to create one flink-shaded module that contains all
>>>>> frequently shaded libraries? (Or maybe even all dependencies shaded by
>>>>> core modules.) The proposal limits the scope of this to ASM and I was
>>>>> wondering why.
>>>>>
>>>>> I also remember that there was a discussion recently about why we
>>>>> shade things at all, and the idea of working against the shaded
>>>>> namespaces was brought up.
>>>>> Back then I was expressing doubts as to whether IDEs would properly
>>>>> support this; what's the state on that?
>>>>>
>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> This is a discussion about altering the way we handle dependencies and
>>>>>> shading in Flink.
>>>>>> I ran into quite a few problems trying to adjust / fix some shading
>>>>>> issues during release validation.
>>>>>>
>>>>>> The issue is tracked under: https://issues.apache.org/jira/browse/FLINK-6529
>>>>>> I am bringing this discussion up because it is a bigger issue.
>>>>>>
>>>>>> *Problem*
>>>>>>
>>>>>> Currently, Flink shades dependencies like ASM and Guava into the jars
>>>>>> of all projects that reference them and relocates the classes.
>>>>>>
>>>>>> There are some drawbacks to that approach; let's discuss them using
>>>>>> ASM as an example:
>>>>>>
>>>>>>   - The ASM classes are, for example, in flink-core, flink-java,
>>>>>>     flink-scala, flink-runtime, etc.
>>>>>>
>>>>>>   - Users that reference these dependencies have the classes multiple
>>>>>>     times in the classpath. That is unclean (it works, though, because
>>>>>>     the classes are identical). The same happens when building the
>>>>>>     final dist jar.
>>>>>>
>>>>>>   - Some of these dependencies require including license files in the
>>>>>>     shaded jar. It is hard to impossible to build a good automatic
>>>>>>     solution for that, partly due to Maven's very poor cross-project
>>>>>>     path support.
>>>>>>
>>>>>>   - Most importantly: Scala does not support shading really well.
>>>>>>     Scala classes have references to classes in more places than just
>>>>>>     the class names (apparently for Scala reflect support). Referencing
>>>>>>     a Scala project with shaded ASM still requires adding a reference
>>>>>>     to unshaded ASM (at least as a compile dependency).
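The second drawback above (the same relocated ASM classes bundled into several Flink jars) can be made visible with a small check. The Java below is a minimal sketch assuming you already have each jar's class-file paths as a set of strings (for example collected from java.util.jar.JarFile#entries()); it reports every class found in more than one jar. The class name DuplicateClassReport and the jar and class names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: report class files that occur in more than one jar. */
public class DuplicateClassReport {

    /**
     * Takes a map from jar name to the class-file paths it contains and returns,
     * for every class found in at least two jars, the sorted list of those jars.
     */
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> jarContents) {
        // Iterate jars in name order so that each owner list comes out sorted.
        Map<String, Set<String>> sortedJars = new TreeMap<>(jarContents);
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, Set<String>> jar : sortedJars.entrySet()) {
            for (String classFile : jar.getValue()) {
                owners.computeIfAbsent(classFile, k -> new ArrayList<>()).add(jar.getKey());
            }
        }
        // Drop classes that only one jar contains.
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        // Hypothetical jar contents: each module bundles its own copy of a shaded ASM class.
        Map<String, Set<String>> jars = Map.of(
                "flink-core.jar", Set.of("shaded/asm/ClassReader.class", "core/A.class"),
                "flink-java.jar", Set.of("shaded/asm/ClassReader.class", "java/B.class"),
                "flink-runtime.jar", Set.of("shaded/asm/ClassReader.class", "runtime/C.class"));
        System.out.println(findDuplicates(jars));
        // prints {shaded/asm/ClassReader.class=[flink-core.jar, flink-java.jar, flink-runtime.jar]}
    }
}
```

With a single deployed shaded artifact, as proposed, such a report on a user classpath would ideally come back empty for the shaded packages.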
>>>>>>
>>>>>> *Proposal*
>>>>>>
>>>>>> I propose that we build and deploy an asm-flink-shaded version of ASM
>>>>>> and directly program against the relocated namespaces. Since we never
>>>>>> use classes that we relocate in public interfaces, Flink users will
>>>>>> never see the relocated class names. Internally, it does not hurt to
>>>>>> use them. The benefits:
>>>>>>
>>>>>>   - Proper Maven dependency management, no hidden (shaded)
>>>>>>     dependencies
>>>>>>
>>>>>>   - One copy of each class for shaded dependencies
>>>>>>
>>>>>>   - Proper Scala interoperability
>>>>>>
>>>>>>   - Natural license management (the license is part of the deployed
>>>>>>     asm-flink-shaded jar)
>>>>>>
>>>>>>
>>>>>> Happy to hear thoughts!
>>>>>>
>>>>>> Stephan
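To make "relocated namespaces" concrete: relocation, as done for example by the maven-shade-plugin's relocation feature (pattern / shadedPattern), rewrites a package prefix in class names and in the class-file paths inside the jar. The Java below is a minimal sketch that models only that name mapping; it does not rewrite bytecode. The Relocation class and the prefix org.apache.flink.shaded.asm are hypothetical, chosen for illustration.

```java
/**
 * Hypothetical sketch of package relocation: maps names under an original
 * package prefix to a shaded prefix. It only rewrites names; real shading
 * tools also rewrite the references inside the bytecode.
 */
public class Relocation {

    private final String pattern;       // original package, e.g. "org.objectweb.asm"
    private final String shadedPattern; // relocated package (hypothetical)

    Relocation(String pattern, String shadedPattern) {
        this.pattern = pattern;
        this.shadedPattern = shadedPattern;
    }

    /** Relocates a dotted class name if it is in the pattern package or a sub-package. */
    String relocateClassName(String className) {
        if (className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className; // outside the relocated package: unchanged
    }

    /** Same mapping for slash-separated paths, as used for class files inside a jar. */
    String relocatePath(String path) {
        String from = pattern.replace('.', '/');
        String to = shadedPattern.replace('.', '/');
        if (path.startsWith(from + "/")) {
            return to + path.substring(from.length());
        }
        return path;
    }

    public static void main(String[] args) {
        Relocation asm = new Relocation("org.objectweb.asm", "org.apache.flink.shaded.asm");
        System.out.println(asm.relocateClassName("org.objectweb.asm.ClassReader"));
        // prints org.apache.flink.shaded.asm.ClassReader
        System.out.println(asm.relocatePath("org/objectweb/asm/ClassReader.class"));
        // prints org/apache/flink/shaded/asm/ClassReader.class
        System.out.println(asm.relocateClassName("org.objectweb.asmx.Other"));
        // prints org.objectweb.asmx.Other (prefix match requires a '.' boundary)
    }
}
```

Under the proposal, a mapping like this would be applied once, when the asm-flink-shaded artifact is built and deployed, and Flink sources would import the relocated names directly instead of every module repeating the relocation at build time. A name rewrite like this also shows why Scala is a problem: as noted in the thread, Scala classes reference classes in more places than just the class names, and those extra references are not plain names that such a mapping would touch.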