flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] Changing Flink's shading model
Date Tue, 20 Jun 2017 19:44:51 GMT
I like this approach.

Two additional things are worth mentioning here:

  - We need to deploy these artifacts independently, not as part of the
build. That is a manual step, done once per version "bump" of the
respective library.

  - We reduce the shading complexity of the original build and should thus
also speed up build times :-)

Stephan




On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <chesnay@apache.org>
wrote:

> I would like to start working on this.
>
> I've looked into adding a flink-shaded-guava module. Working against the
> shaded namespaces seems to work without problems from the IDE, and we
> could forbid un-shaded usages with checkstyle.
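A minimal sketch of the kind of check such a rule would perform, written as plain Java rather than as an actual Checkstyle configuration. It assumes that un-shaded usages show up as `import` statements from `com.google.common`; the class name `UnshadedImportCheck` and the relocated prefix `org.apache.flink.shaded.guava` in the example input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Checkstyle rule that forbids un-shaded Guava imports.
// It only inspects import statements; fully qualified usages in code are not caught.
class UnshadedImportCheck {

    // Package that must not be imported directly (assumed: plain, un-relocated Guava).
    static final String FORBIDDEN_PREFIX = "com.google.common.";

    // Returns the 1-based line numbers of imports from the forbidden package.
    static List<Integer> findViolations(List<String> sourceLines) {
        List<Integer> violations = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i).trim();
            if (!line.startsWith("import ")) {
                continue;
            }
            // Strip "import", an optional "static", and the trailing semicolon.
            String target = line.substring("import ".length()).trim();
            if (target.startsWith("static ")) {
                target = target.substring("static ".length()).trim();
            }
            if (target.endsWith(";")) {
                target = target.substring(0, target.length() - 1).trim();
            }
            if (target.startsWith(FORBIDDEN_PREFIX)) {
                violations.add(i + 1);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "package org.apache.flink.example;",
                "import com.google.common.base.Preconditions;",
                // Hypothetical relocated namespace; this import is allowed.
                "import org.apache.flink.shaded.guava.com.google.common.base.Preconditions;");
        System.out.println(findViolations(source)); // [2]
    }
}
```

In practice the same effect would more likely be configured declaratively, for example with Checkstyle's IllegalImport check, instead of custom code; the sketch only shows which lines would be flagged.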
>
> So the list of dependencies that we want to shade currently contains:
>
>  * asm
>  * guava
>  * netty
>  * hadoop
>  * curator
>
> I've had a chat with Stephan Ewen and he brought up Kryo + Chill as well.
>
> The nice thing is that we can do this incrementally, one dependency at a
> time. As such, I would propose going through the whole process for Guava
> first and seeing what problems arise.
>
> This would include adding a flink-shaded module and a child
> flink-shaded-guava module to the Flink repository that are not part of
> the build process, replacing all usages of Guava in Flink, adding the
> checkstyle rule (optional), and deploying the artifact to Maven Central.
>
>
> On 11.05.2017 10:54, Stephan Ewen wrote:
>
>> @Ufuk - I have never set up artifact deployment in Maven, so I could use
>> some help there.
>>
>> Regarding shading Netty, I agree; it would be good to do that as well...
>>
>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <uce@apache.org> wrote:
>>
>>> The advantages you've listed sound really compelling to me.
>>>
>>> - Do you have time to implement these changes, or do we need a
>>> volunteer? ;)
>>>
>>> - I assume that republishing the artifacts as you propose doesn't have
>>> any new legal implications since we already publish them with our
>>> JARs, right?
>>>
>>> - We might think about adding Netty to the list of shaded artifacts
>>> since some dependency conflicts were reported recently. Would have to
>>> double check the reported issues before doing that though. ;-)
>>>
>>> – Ufuk
>>>
>>>
>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not say
>>>> that clearly.
>>>>
>>>> If we like that approach, we should deal with the other libraries (at
>>>> least the frequently used ones) in the same way.
>>>>
>>>>
>>>> I would imagine a project layout like this:
>>>>
>>>> flink-shaded-deps
>>>>    - flink-shaded-asm
>>>>    - flink-shaded-guava
>>>>    - flink-shaded-curator
>>>>    - flink-shaded-hadoop
>>>>
>>>>
>>>> "flink-shaded-deps" would not be built every time (and not be released
>>>> every time), but only when needed.
>>>>
>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <chesnay@apache.org>
>>>> wrote:
>>>>
>>>>> I like the idea, thank you for bringing it up.
>>>>>
>>>>> Given that the raised problems aren't really ASM-specific, would it make
>>>>> sense to create one flink-shaded module that contains all frequently
>>>>> shaded libraries? (Or maybe even all dependencies shaded by core
>>>>> modules.) The proposal limits the scope of this to ASM, and I was
>>>>> wondering why.
>>>>>
>>>>> I also remember that there was a discussion recently about why we shade
>>>>> things at all, and the idea of working against the shaded namespaces
>>>>> was brought up. Back then I expressed doubts as to whether IDEs would
>>>>> properly support this; what's the state on that?
>>>>>
>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> This is a discussion about altering the way we handle dependencies and
>>>>>> shading in Flink.
>>>>>> I ran into quite a few problems trying to adjust / fix some shading
>>>>>> issues during release validation.
>>>>>>
>>>>>> The issue is tracked under:
>>>>>> https://issues.apache.org/jira/browse/FLINK-6529
>>>>>> I'm bringing this up as a discussion thread because it is a bigger
>>>>>> issue.
>>>>>>
>>>>>> *Problem*
>>>>>>
>>>>>> Currently, Flink shades dependencies like ASM and Guava into the jars
>>>>>> of all projects that reference them, and relocates the classes.
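To make "relocate" concrete, here is a minimal sketch of the class-name mapping that a relocation rule performs, in the style of the maven-shade-plugin's pattern / shadedPattern relocations. It only maps names; the plugin additionally rewrites the references inside the bytecode. The target prefix `org.apache.flink.shaded.org.objectweb.asm` is an assumption for illustration, not necessarily the one Flink uses.

```java
// Sketch of relocation at the class-name level: every class whose name starts with the
// original package is renamed into a Flink-specific package. Names outside the pattern
// are left untouched. (Bytecode rewriting, which the real plugin also does, is not shown.)
class RelocationSketch {

    // Hypothetical relocation rule (original pattern -> shaded pattern).
    static final String PATTERN = "org.objectweb.asm.";
    static final String SHADED_PATTERN = "org.apache.flink.shaded.org.objectweb.asm.";

    static String relocate(String className) {
        if (className.startsWith(PATTERN)) {
            return SHADED_PATTERN + className.substring(PATTERN.length());
        }
        return className;
    }

    public static void main(String[] args) {
        System.out.println(relocate("org.objectweb.asm.ClassReader"));
        // org.apache.flink.shaded.org.objectweb.asm.ClassReader
        System.out.println(relocate("java.lang.String"));
        // java.lang.String
    }
}
```

Because every Flink module that shades ASM applies such a rule to its own copy of the classes, each of those jars ends up containing the same relocated classes.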
>>>>>>
>>>>>> There are some drawbacks to that approach; let's discuss them using
>>>>>> ASM as an example:
>>>>>>
>>>>>>     - The ASM classes are contained, for example, in flink-core,
>>>>>> flink-java, flink-scala, flink-runtime, etc.
>>>>>>
>>>>>>     - Users that reference these dependencies have the classes multiple
>>>>>> times in the classpath. That is unclean (it works, though, because the
>>>>>> classes are identical). The same happens when building the final dist
>>>>>> jar.
>>>>>>
>>>>>>     - Some of these dependencies require license files to be included in
>>>>>> the shaded jar. It is hard, if not impossible, to build a good automatic
>>>>>> solution for that, partly due to Maven's very poor support for
>>>>>> cross-project paths.
>>>>>>
>>>>>>     - Most importantly: Scala does not support shading very well. Scala
>>>>>> classes reference other classes in more places than just the class
>>>>>> names (apparently for Scala reflection support). Referencing a Scala
>>>>>> project with shaded ASM still requires adding a dependency on unshaded
>>>>>> ASM (at least as a compile dependency).
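The second drawback above (the same classes showing up several times on the classpath) can be illustrated with a small sketch. It assumes each jar's contents are already available as a list of class names; reading real jars, e.g. via `java.util.jar.JarFile`, is left out, and the jar and class names in the example are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Given the class entries of each jar, report every class name contained in more than one jar.
class DuplicateClassFinder {

    // Maps each duplicated class name to the sorted set of jars that contain it.
    static Map<String, Set<String>> findDuplicates(Map<String, List<String>> jarContents) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> jar : jarContents.entrySet()) {
            for (String className : jar.getValue()) {
                owners.computeIfAbsent(className, k -> new TreeSet<>()).add(jar.getKey());
            }
        }
        // Keep only classes that more than one jar provides.
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        // Illustrative classpath: three modules that each bundle the same relocated ASM class.
        String shadedReader = "org.apache.flink.shaded.org.objectweb.asm.ClassReader";
        Map<String, List<String>> classpath = Map.of(
                "flink-core.jar", List.of(shadedReader, "org.apache.flink.example.CoreOnly"),
                "flink-java.jar", List.of(shadedReader),
                "flink-runtime.jar", List.of(shadedReader));
        System.out.println(findDuplicates(classpath));
        // {org.apache.flink.shaded.org.objectweb.asm.ClassReader=[flink-core.jar, flink-java.jar, flink-runtime.jar]}
    }
}
```

With a single deployed shaded artifact, as proposed below, such a class would be provided by exactly one jar.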
>>>>>>
>>>>>> *Proposal*
>>>>>>
>>>>>> I propose that we build and deploy an asm-flink-shaded version of ASM
>>>>>> and directly program against the relocated namespaces. Since we never
>>>>>> use relocated classes in public interfaces, Flink users will never see
>>>>>> the relocated class names. Internally, it does not hurt to use them.
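The argument above relies on relocated classes never appearing in public interfaces. A minimal sketch of how that assumption could be checked with reflection follows. It inspects only the raw parameter and return types of public methods (generic type arguments and fields are not covered), and `PublicApiShadingCheck` is a hypothetical name. Since no shaded jar is on the classpath here, the example uses `java.util.` as a stand-in for a relocated prefix.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical check: do any public methods of a class expose types from a given package prefix?
class PublicApiShadingCheck {

    static List<String> exposedTypes(Class<?> apiClass, String relocatedPrefix) {
        List<String> exposed = new ArrayList<>();
        for (Method method : apiClass.getMethods()) { // public methods, including inherited ones
            List<Class<?>> types = new ArrayList<>(Arrays.asList(method.getParameterTypes()));
            types.add(method.getReturnType());
            for (Class<?> type : types) {
                // Unwrap arrays so that e.g. SomeType[] is checked as SomeType.
                while (type.isArray()) {
                    type = type.getComponentType();
                }
                if (type.getName().startsWith(relocatedPrefix)) {
                    exposed.add(method.getName() + " -> " + type.getName());
                }
            }
        }
        return exposed;
    }

    // Example API used for the demo below.
    static class ExampleApi {
        public List<String> names() { return new ArrayList<>(); }
        public int count() { return 0; }
    }

    public static void main(String[] args) {
        // "java.util." stands in for a relocated prefix such as a flink-shaded namespace.
        System.out.println(exposedTypes(ExampleApi.class, "java.util.")); // [names -> java.util.List]
        System.out.println(exposedTypes(ExampleApi.class, "org.apache.flink.shaded.")); // []
    }
}
```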
>>>>>>
>>>>>>     - Proper Maven dependency management, no hidden (shaded) dependencies
>>>>>>
>>>>>>     - One copy of each class for shaded dependencies
>>>>>>
>>>>>>     - Proper Scala interoperability
>>>>>>
>>>>>>     - Natural license management (the license is part of the deployed
>>>>>> asm-flink-shaded jar)
>>>>>>
>>>>>>
>>>>>> Happy to hear thoughts!
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>>
>
