flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Metzger <rmetz...@apache.org>
Subject Re: [DISCUSS] Changing Flink's shading model
Date Wed, 21 Jun 2017 15:00:23 GMT
Okay, I'll request a repo for the shading.

On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <chesnay@apache.org>
wrote:

> I like your suggestion Robert. A lot actually.
>
> Having the actual dependency version (i.e 18 for guava) in the version
> should improve clarity a lot.
>
> Originally i intended to release 1 artifact per Flink version, with the
> normal versioning scheme
> that we use. But given that the shaded dependencies aren't changed often
> (even rarely might be a stretch),
> and aren't actually coupled to the Flink release cycle this doesn't make a
> lot of sense.
>
> Having separate repos looks like a reasonable separation of concerns. The
> release for Flink itself
> would work just like it does now; we don't have to modify any scripts or
> do extra steps.
>
> Since the build, release and development process are separate (since
> flink-shaded isn't part of Flink build
> process, has a separate release process and changes to it will /never
> /require immediate changes to Flink)
> it seems like a very good candidate to move it into a separate repo.
>
>
> On 21.06.2017 11:26, Robert Metzger wrote:
>
>> Its not completely clear to me how we want to version the shaded
>> dependencies, and where we are putting them.
>>
>> One concern are the official apache release rules. If we want to release
>> something to maven central, we need to do a proper vote over a source
>> archive.
>> I would propose to create a new repository "flink-shaded.git" that
>> contains
>> the following maven module structure:
>> - flink-shaded: 1
>>     - flink-shaded-asm: 1-5.2
>>     - flink-shaded-guava: 1-18.0
>>     - ...
>>
>> The number indicates the version (for ASM, I've just guessed).
>> The version for the parent "flink-shaded" needs to be updated on each
>> parent pom change (new module added, new / changed plugins, ...)
>>
>> We could create a separate release script in this repository that creates
>> the flink-shaded-src.zip from the code and deploys the artifacts to the
>> maven staging area.
>>
>> The advantage of a separate repo would be that we don't need to maintain
>> separate maven projects in the same git repo.
>> Also, the src archives for the release vote can be created from the repo
>> content (without much filtering).
>>
>>
>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <sewen@apache.org> wrote:
>>
>> I like this approach.
>>>
>>> Two additional things can be mention here:
>>>
>>>    - We need to deploy these artifacts independently and not as part of
>>> the
>>> build. That is a manual step once per "bump" in the dependency of that
>>> library.
>>>
>>>    - We reduce the shading complexity of the original build and should
>>> thus
>>> also speed up build times :-)
>>>
>>> Stephan
>>>
>>>
>>>
>>>
>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <chesnay@apache.org>
>>> wrote:
>>>
>>> I would like to start working on this.
>>>>
>>>> I've looked into adding a flink-shaded-guava module. Working against the
>>>> shaded namespaces seems
>>>> to work without problems from the IDE, and we could forbid un-shaded
>>>> usages with checkstyle.
>>>>
>>>> So for the list of dependencies that we want to shade we currently got:
>>>>
>>>>   * asm
>>>>   * guava
>>>>   * netty
>>>>   * hadoop
>>>>   * curator
>>>>
>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as
>>>> well.
>>>>
>>>> The nice thing is that we can do this incrementally, one dependency at a
>>>> time. As such i would propose
>>>> to go through the whole process for guava and see what problems arise.
>>>>
>>>> This would include adding a flink-shaded module and a child
>>>> flink-shaded-guava module to the flink repository
>>>> that are not part of the build process, replacing all usages of guava in
>>>> Flink, adding the
>>>> checkstyle rule (optional) and deploying the artifact to maven central.
>>>>
>>>>
>>>> On 11.05.2017 10:54, Stephan Ewen wrote:
>>>>
>>>> @Ufuk  - I have never set up artifact deployment in Maven, could need
>>>>>
>>>> some
>>>
>>>> help there.
>>>>>
>>>>> Regarding shading Netty, I agree, would be good to do that as well...
>>>>>
>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <uce@apache.org>
wrote:
>>>>>
>>>>> The advantages you've listed sound really compelling to me.
>>>>>
>>>>>> - Do you have time to implement these changes or do we need a
>>>>>>
>>>>> volunteer?
>>>
>>>> ;)
>>>>>>
>>>>>> - I assume that republishing the artifacts as you propose doesn't
have
>>>>>> any new legal implications since we already publish them with our
>>>>>> JARs, right?
>>>>>>
>>>>>> - We might think about adding Netty to the list of shaded artifacts
>>>>>> since some dependency conflicts were reported recently. Would have
to
>>>>>> double check the reported issues before doing that though. ;-)
>>>>>>
>>>>>> – Ufuk
>>>>>>
>>>>>>
>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <sewen@apache.org>
>>>>>>
>>>>> wrote:
>>>
>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not
>>>>>>>
>>>>>> say
>>>
>>>> that clearly.
>>>>>>>
>>>>>>> If we like that approach, we should deal with the other libraries
(at
>>>>>>>
>>>>>>> least
>>>>>>
>>>>>> the frequently used ones) in the same way.
>>>>>>>
>>>>>>>
>>>>>>> I would imagine to have a project layout like that:
>>>>>>>
>>>>>>> flink-shaded-deps
>>>>>>>     - flink-shaded-asm
>>>>>>>     - flink-shaded-guava
>>>>>>>     - flink-shaded-curator
>>>>>>>     - flink-shaded-hadoop
>>>>>>>
>>>>>>>
>>>>>>> "flink-shaded-deps" would not be built every time (and not be
>>>>>>> released
>>>>>>> every time), but only when needed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <
>>>>>>> chesnay@apache.org
>>>>>>> wrote:
>>>>>>>
>>>>>>> I like the idea, thank you for bringing it up.
>>>>>>>
>>>>>>>> Given that the raised problems aren't really ASM specific
would it
>>>>>>>>
>>>>>>> make
>>>
>>>> sense to create one flink-shaded module that contains all frequently
>>>>>>>>
>>>>>>>> shaded
>>>>>>> libraries? (or maybe even all shaded dependencies by core modules)
>>>>>>> The
>>>>>>>
>>>>>>>> proposal limits the scope of this to ASM and i was wondering
why.
>>>>>>>>
>>>>>>>> I also remember that there was a discussion recently about
why we
>>>>>>>>
>>>>>>> shade
>>>
>>>> things at all, and the idea of working against the shaded namespaces
>>>>>>>> was
>>>>>>>> brought up. Back then i was expressing doubts as to whether
IDE's
>>>>>>>>
>>>>>>> would
>>>
>>>> properly support this; what's the state on that?
>>>>>>>>
>>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>>> This is a discussion about altering the way we handle
dependencies
>>>>>>>>>
>>>>>>>> and
>>>
>>>> shading in Flink.
>>>>>>>>> I ran into quite a view problems trying to adjust / fix
some
>>>>>>>>> shading
>>>>>>>>> issues
>>>>>>>>> during release validation.
>>>>>>>>>
>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira
>>>>>>>>> /browse/FLINK-6529
>>>>>>>>> Bring this discussion thread up because it is a bigger
issue
>>>>>>>>>
>>>>>>>>> *Problem*
>>>>>>>>>
>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava
into all
>>>>>>>>>
>>>>>>>> jars
>>>
>>>> of
>>>>>>>>
>>>>>>> projects that reference it and relocate the classes.
>>>>>>>
>>>>>>>> There are some drawbacks to that approach, let's discuss
them at the
>>>>>>>>> example of ASM:
>>>>>>>>>
>>>>>>>>>      - The ASM classes are for example in flink-core,
flink-java,
>>>>>>>>> flink-scala,
>>>>>>>>> flink-runtime, etc.
>>>>>>>>>
>>>>>>>>>      - Users that reference these dependencies have the
classes
>>>>>>>>> multiple
>>>>>>>>> times
>>>>>>>>> in the classpath. That is unclean (works, through, because
the
>>>>>>>>>
>>>>>>>> classes
>>>
>>>> are
>>>>>>>>
>>>>>>> identical). The same happens when building the final dist. jar.
>>>>>>>
>>>>>>>>      - Some of these dependencies require to include license
files
>>>>>>>>> in
>>>>>>>>> the
>>>>>>>>> shaded jar. It is hard to impossible to build a good
automatic
>>>>>>>>> solution
>>>>>>>>> for
>>>>>>>>> that, partly due to Maven's very poor cross-project path
support
>>>>>>>>>
>>>>>>>>>      - Most importantly: Scala does not support shading
really
>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Scala
>>>>>>>>
>>>>>>> classes have references to classes in more places than just the
class
>>>>>>>
>>>>>>>> names
>>>>>>>>> (apparently for Scala reflect support). Referencing a
Scala project
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>
>>>>>>> shaded ASM still requires to add a reference to unshaded ASM
(at
>>>>>>> least
>>>>>>>
>>>>>>>> as
>>>>>>>>
>>>>>>> a
>>>>>>>
>>>>>>>> compile dependency).
>>>>>>>>>
>>>>>>>>> *Proposal*
>>>>>>>>>
>>>>>>>>> I propose that we build and deploy a asm-flink-shaded
version of
>>>>>>>>> ASM
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>
>>>>>>> directly program against the relocated namespaces. Since we never
use
>>>>>>>
>>>>>>>> classes that we relocate in public interfaces, Flink users
will
>>>>>>>>>
>>>>>>>> never
>>>
>>>> see
>>>>>>>>
>>>>>>> the relocated class names. Internally, it does not hurt to use
them.
>>>>>>>
>>>>>>>>      - Proper maven dependency management, no hidden (shaded)
>>>>>>>>>
>>>>>>>>> dependencies
>>>>>>>>
>>>>>>>      - One copy of each class for shaded dependencies
>>>>>>>
>>>>>>>>      - Proper Scala interoperability
>>>>>>>>>
>>>>>>>>>      - Natural License management (license is part of
deployed
>>>>>>>>> asm-flink-shaded jar)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Happy to hear thoughts!
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message