Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E0389200CAE for ; Wed, 21 Jun 2017 17:00:46 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id DEF4E160BD0; Wed, 21 Jun 2017 15:00:46 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 0A5DC160BE2 for ; Wed, 21 Jun 2017 17:00:45 +0200 (CEST) Received: (qmail 72838 invoked by uid 500); 21 Jun 2017 15:00:45 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 72827 invoked by uid 99); 21 Jun 2017 15:00:45 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Jun 2017 15:00:45 +0000 Received: from mail-io0-f169.google.com (mail-io0-f169.google.com [209.85.223.169]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id BAA171A00A2 for ; Wed, 21 Jun 2017 15:00:44 +0000 (UTC) Received: by mail-io0-f169.google.com with SMTP id k93so5323502ioi.2 for ; Wed, 21 Jun 2017 08:00:44 -0700 (PDT) X-Gm-Message-State: AKS2vOzxCFuQISKu7hyF4z2s628ifPy9XQRZxL8NsT1XlE0C64R9CPZs jBCw3KVaOA6ZqBZhRMMg3L4DV8VNQA== X-Received: by 10.107.12.219 with SMTP id 88mr31607193iom.191.1498057244108; Wed, 21 Jun 2017 08:00:44 -0700 (PDT) MIME-Version: 1.0 Received: by 10.79.33.9 with HTTP; Wed, 21 Jun 2017 08:00:23 -0700 (PDT) In-Reply-To: <33569446-bae8-103a-3ce1-53fd54073822@apache.org> References: <997c685c-d95b-b796-a183-a22d6d384946@apache.org> <0ba95b86-d9af-56e6-cd6b-cee5103a4308@apache.org> <33569446-bae8-103a-3ce1-53fd54073822@apache.org> From: Robert Metzger Date: Wed, 21 Jun 2017 17:00:23 +0200 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [DISCUSS] Changing Flink's shading model To: "dev@flink.apache.org" Cc: Stephan Ewen Content-Type: multipart/alternative; boundary="001a113f8ebee3b8d7055279a1ff" archived-at: Wed, 21 Jun 2017 15:00:47 -0000 --001a113f8ebee3b8d7055279a1ff Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Okay, I'll request a repo for the shading. On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler wrote: > I like your suggestion Robert. A lot actually. > > Having the actual dependency version (i.e 18 for guava) in the version > should improve clarity a lot. > > Originally i intended to release 1 artifact per Flink version, with the > normal versioning scheme > that we use. But given that the shaded dependencies aren't changed often > (even rarely might be a stretch), > and aren't actually coupled to the Flink release cycle this doesn't make = a > lot of sense. > > Having separate repos looks like a reasonable separation of concerns. The > release for Flink itself > would work just like it does now; we don't have to modify any scripts or > do extra steps. > > Since the build, release and development process are separate (since > flink-shaded isn't part of Flink build > process, has a separate release process and changes to it will /never > /require immediate changes to Flink) > it seems like a very good candidate to move it into a separate repo. > > > On 21.06.2017 11:26, Robert Metzger wrote: > >> Its not completely clear to me how we want to version the shaded >> dependencies, and where we are putting them. >> >> One concern are the official apache release rules. If we want to release >> something to maven central, we need to do a proper vote over a source >> archive. >> I would propose to create a new repository "flink-shaded.git" that >> contains >> the following maven module structure: >> - flink-shaded: 1 >> - flink-shaded-asm: 1-5.2 >> - flink-shaded-guava: 1-18.0 >> - ... >> >> The number indicates the version (for ASM, I've just guessed). >> The version for the parent "flink-shaded" needs to be updated on each >> parent pom change (new module added, new / changed plugins, ...) >> >> We could create a separate release script in this repository that create= s >> the flink-shaded-src.zip from the code and deploys the artifacts to the >> maven staging area. >> >> The advantage of a separate repo would be that we don't need to maintain >> separate maven projects in the same git repo. >> Also, the src archives for the release vote can be created from the repo >> content (without much filtering). >> >> >> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen wrote: >> >> I like this approach. >>> >>> Two additional things can be mention here: >>> >>> - We need to deploy these artifacts independently and not as part of >>> the >>> build. That is a manual step once per "bump" in the dependency of that >>> library. >>> >>> - We reduce the shading complexity of the original build and should >>> thus >>> also speed up build times :-) >>> >>> Stephan >>> >>> >>> >>> >>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler >>> wrote: >>> >>> I would like to start working on this. >>>> >>>> I've looked into adding a flink-shaded-guava module. Working against t= he >>>> shaded namespaces seems >>>> to work without problems from the IDE, and we could forbid un-shaded >>>> usages with checkstyle. >>>> >>>> So for the list of dependencies that we want to shade we currently got= : >>>> >>>> * asm >>>> * guava >>>> * netty >>>> * hadoop >>>> * curator >>>> >>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>> well. >>>> >>>> The nice thing is that we can do this incrementally, one dependency at= a >>>> time. As such i would propose >>>> to go through the whole process for guava and see what problems arise. >>>> >>>> This would include adding a flink-shaded module and a child >>>> flink-shaded-guava module to the flink repository >>>> that are not part of the build process, replacing all usages of guava = in >>>> Flink, adding the >>>> checkstyle rule (optional) and deploying the artifact to maven central= . >>>> >>>> >>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>> >>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>> >>>> some >>> >>>> help there. >>>>> >>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>> >>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi wrote: >>>>> >>>>> The advantages you've listed sound really compelling to me. >>>>> >>>>>> - Do you have time to implement these changes or do we need a >>>>>> >>>>> volunteer? >>> >>>> ;) >>>>>> >>>>>> - I assume that republishing the artifacts as you propose doesn't ha= ve >>>>>> any new legal implications since we already publish them with our >>>>>> JARs, right? >>>>>> >>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>> since some dependency conflicts were reported recently. Would have t= o >>>>>> double check the reported issues before doing that though. ;-) >>>>>> >>>>>> =E2=80=93 Ufuk >>>>>> >>>>>> >>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen >>>>>> >>>>> wrote: >>> >>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>> >>>>>> say >>> >>>> that clearly. >>>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (= at >>>>>>> >>>>>>> least >>>>>> >>>>>> the frequently used ones) in the same way. >>>>>>> >>>>>>> >>>>>>> I would imagine to have a project layout like that: >>>>>>> >>>>>>> flink-shaded-deps >>>>>>> - flink-shaded-asm >>>>>>> - flink-shaded-guava >>>>>>> - flink-shaded-curator >>>>>>> - flink-shaded-hadoop >>>>>>> >>>>>>> >>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>> released >>>>>>> every time), but only when needed. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>> chesnay@apache.org >>>>>>> wrote: >>>>>>> >>>>>>> I like the idea, thank you for bringing it up. >>>>>>> >>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>> >>>>>>> make >>> >>>> sense to create one flink-shaded module that contains all frequently >>>>>>>> >>>>>>>> shaded >>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>> The >>>>>>> >>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>> >>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>> >>>>>>> shade >>> >>>> things at all, and the idea of working against the shaded namespaces >>>>>>>> was >>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>> >>>>>>> would >>> >>>> properly support this; what's the state on that? >>>>>>>> >>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>>> This is a discussion about altering the way we handle dependencie= s >>>>>>>>> >>>>>>>> and >>> >>>> shading in Flink. >>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>> shading >>>>>>>>> issues >>>>>>>>> during release validation. >>>>>>>>> >>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>> /browse/FLINK-6529 >>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>> >>>>>>>>> *Problem* >>>>>>>>> >>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>> >>>>>>>> jars >>> >>>> of >>>>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>> >>>>>>>> There are some drawbacks to that approach, let's discuss them at t= he >>>>>>>>> example of ASM: >>>>>>>>> >>>>>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>>>>> flink-scala, >>>>>>>>> flink-runtime, etc. >>>>>>>>> >>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>> multiple >>>>>>>>> times >>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>> >>>>>>>> classes >>> >>>> are >>>>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>> >>>>>>>> - Some of these dependencies require to include license files >>>>>>>>> in >>>>>>>>> the >>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>> solution >>>>>>>>> for >>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>> >>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>> well. >>>>>>>>> >>>>>>>>> Scala >>>>>>>> >>>>>>> classes have references to classes in more places than just the cla= ss >>>>>>> >>>>>>>> names >>>>>>>>> (apparently for Scala reflect support). Referencing a Scala proje= ct >>>>>>>>> >>>>>>>>> with >>>>>>>> >>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>> least >>>>>>> >>>>>>>> as >>>>>>>> >>>>>>> a >>>>>>> >>>>>>>> compile dependency). >>>>>>>>> >>>>>>>>> *Proposal* >>>>>>>>> >>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>> ASM >>>>>>>>> >>>>>>>>> and >>>>>>>> >>>>>>> directly program against the relocated namespaces. Since we never u= se >>>>>>> >>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>> >>>>>>>> never >>> >>>> see >>>>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them= . >>>>>>> >>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>> >>>>>>>>> dependencies >>>>>>>> >>>>>>> - One copy of each class for shaded dependencies >>>>>>> >>>>>>>> - Proper Scala interoperability >>>>>>>>> >>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>> asm-flink-shaded jar) >>>>>>>>> >>>>>>>>> >>>>>>>>> Happy to hear thoughts! >>>>>>>>> >>>>>>>>> Stephan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> > --001a113f8ebee3b8d7055279a1ff--