flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] Changing Flink's shading model
Date Mon, 26 Jun 2017 12:29:46 GMT
How should it work when one of the shaded dependencies is updated? We
probably do not want to release all of them, just because the overall
version number is in their version number.

How about that:

  - Under normal circumstances, we only increase the version of the root
project, when we add / bump a shaded dependency version

  - Shaded dependencies include the version in the same. That way we can
possibly have two different versions of a dependency, such as
"flink-shaded-kryo-2" and "flink-shaded-kryo-3".
  - Shaded dependencies should have the version in the relocation pattern
as well, for the same reason as above (unless the two versions have
separate namespaces already).

  - The released version of the module and artifact is always "1.0" unless
we find that we did shading/relocation errors, in which case we need to
re-release that artifact (1.1, 1.2, ...)



On Mon, Jun 26, 2017 at 2:23 PM, Chesnay Schepler <chesnay@apache.org>
wrote:

> the guava version is already included in the version of the flink-shaded
> module.
>
> For example, for the first release of flink-shaded-guava the version would
> be 1-18.0.
>
> 1 is the version of flink-shaded itself, 18.0 is the guava version.
>
>
> On 26.06.2017 14:01, Stephan Ewen wrote:
>
>> Looks good, thanks Chesnay!.
>>
>> How about including the dependency version names in the module names, like
>> "flink-shaded-guava-18.0"?
>>
>> On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <chesnay@apache.org>
>> wrote:
>>
>> The new repo was created and is accessible here:
>>> https://github.com/apache/flink-shaded
>>>
>>> I've already opened a PR to add a shaded guava module.
>>>
>>> Once the shaded-guava module is merged I would like to do a first release
>>> of flink-shaded,
>>> only containing guava. I already have a branch with all the changes
>>> necessary to
>>> integrate this dependency into Flink.
>>>
>>> The alternative would be to first create shaded modules for all
>>> dependencies and make a
>>> single release, but I feel that would delay things quite a bit.
>>>
>>>
>>>
>>> On 21.06.2017 17:00, Robert Metzger wrote:
>>>
>>> Okay, I'll request a repo for the shading.
>>>>
>>>> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <chesnay@apache.org>
>>>> wrote:
>>>>
>>>> I like your suggestion Robert. A lot actually.
>>>>
>>>>> Having the actual dependency version (i.e 18 for guava) in the version
>>>>> should improve clarity a lot.
>>>>>
>>>>> Originally i intended to release 1 artifact per Flink version, with the
>>>>> normal versioning scheme
>>>>> that we use. But given that the shaded dependencies aren't changed
>>>>> often
>>>>> (even rarely might be a stretch),
>>>>> and aren't actually coupled to the Flink release cycle this doesn't
>>>>> make
>>>>> a
>>>>> lot of sense.
>>>>>
>>>>> Having separate repos looks like a reasonable separation of concerns.
>>>>> The
>>>>> release for Flink itself
>>>>> would work just like it does now; we don't have to modify any scripts
>>>>> or
>>>>> do extra steps.
>>>>>
>>>>> Since the build, release and development process are separate (since
>>>>> flink-shaded isn't part of Flink build
>>>>> process, has a separate release process and changes to it will /never
>>>>> /require immediate changes to Flink)
>>>>> it seems like a very good candidate to move it into a separate repo.
>>>>>
>>>>>
>>>>> On 21.06.2017 11:26, Robert Metzger wrote:
>>>>>
>>>>> Its not completely clear to me how we want to version the shaded
>>>>>
>>>>>> dependencies, and where we are putting them.
>>>>>>
>>>>>> One concern are the official apache release rules. If we want to
>>>>>> release
>>>>>> something to maven central, we need to do a proper vote over a source
>>>>>> archive.
>>>>>> I would propose to create a new repository "flink-shaded.git" that
>>>>>> contains
>>>>>> the following maven module structure:
>>>>>> - flink-shaded: 1
>>>>>>       - flink-shaded-asm: 1-5.2
>>>>>>       - flink-shaded-guava: 1-18.0
>>>>>>       - ...
>>>>>>
>>>>>> The number indicates the version (for ASM, I've just guessed).
>>>>>> The version for the parent "flink-shaded" needs to be updated on
each
>>>>>> parent pom change (new module added, new / changed plugins, ...)
>>>>>>
>>>>>> We could create a separate release script in this repository that
>>>>>> creates
>>>>>> the flink-shaded-src.zip from the code and deploys the artifacts
to
>>>>>> the
>>>>>> maven staging area.
>>>>>>
>>>>>> The advantage of a separate repo would be that we don't need to
>>>>>> maintain
>>>>>> separate maven projects in the same git repo.
>>>>>> Also, the src archives for the release vote can be created from the
>>>>>> repo
>>>>>> content (without much filtering).
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <sewen@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> I like this approach.
>>>>>>
>>>>>> Two additional things can be mention here:
>>>>>>>
>>>>>>>      - We need to deploy these artifacts independently and not
as
>>>>>>> part
>>>>>>> of
>>>>>>> the
>>>>>>> build. That is a manual step once per "bump" in the dependency
of
>>>>>>> that
>>>>>>> library.
>>>>>>>
>>>>>>>      - We reduce the shading complexity of the original build
and
>>>>>>> should
>>>>>>> thus
>>>>>>> also speed up build times :-)
>>>>>>>
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <
>>>>>>> chesnay@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I would like to start working on this.
>>>>>>>
>>>>>>> I've looked into adding a flink-shaded-guava module. Working
against
>>>>>>>> the
>>>>>>>> shaded namespaces seems
>>>>>>>> to work without problems from the IDE, and we could forbid
un-shaded
>>>>>>>> usages with checkstyle.
>>>>>>>>
>>>>>>>> So for the list of dependencies that we want to shade we
currently
>>>>>>>> got:
>>>>>>>>
>>>>>>>>     * asm
>>>>>>>>     * guava
>>>>>>>>     * netty
>>>>>>>>     * hadoop
>>>>>>>>     * curator
>>>>>>>>
>>>>>>>> I've had a chat with Stephan Ewan and he brought up kryo
+ chill as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> The nice thing is that we can do this incrementally, one
dependency
>>>>>>>> at a
>>>>>>>> time. As such i would propose
>>>>>>>> to go through the whole process for guava and see what problems
>>>>>>>> arise.
>>>>>>>>
>>>>>>>> This would include adding a flink-shaded module and a child
>>>>>>>> flink-shaded-guava module to the flink repository
>>>>>>>> that are not part of the build process, replacing all usages
of
>>>>>>>> guava
>>>>>>>> in
>>>>>>>> Flink, adding the
>>>>>>>> checkstyle rule (optional) and deploying the artifact to
maven
>>>>>>>> central.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11.05.2017 10:54, Stephan Ewen wrote:
>>>>>>>>
>>>>>>>> @Ufuk  - I have never set up artifact deployment in Maven,
could
>>>>>>>> need
>>>>>>>> some
>>>>>>>> help there.
>>>>>>>>
>>>>>>>> Regarding shading Netty, I agree, would be good to do that
as
>>>>>>>>> well...
>>>>>>>>>
>>>>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <uce@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> The advantages you've listed sound really compelling
to me.
>>>>>>>>>
>>>>>>>>> - Do you have time to implement these changes or do we
need a
>>>>>>>>>
>>>>>>>>>> volunteer?
>>>>>>>>>>
>>>>>>>>> ;)
>>>>>>>>
>>>>>>>> - I assume that republishing the artifacts as you propose
doesn't
>>>>>>>>>
>>>>>>>>>> have
>>>>>>>>>> any new legal implications since we already publish
them with our
>>>>>>>>>> JARs, right?
>>>>>>>>>>
>>>>>>>>>> - We might think about adding Netty to the list of
shaded
>>>>>>>>>> artifacts
>>>>>>>>>> since some dependency conflicts were reported recently.
Would have
>>>>>>>>>> to
>>>>>>>>>> double check the reported issues before doing that
though. ;-)
>>>>>>>>>>
>>>>>>>>>> – Ufuk
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <sewen@apache.org>
>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe
I did not
>>>>>>>>
>>>>>>>> say
>>>>>>>>> that clearly.
>>>>>>>>> If we like that approach, we should deal with the other
libraries
>>>>>>>>> (at
>>>>>>>>>
>>>>>>>>>> least
>>>>>>>>>>>
>>>>>>>>>>> the frequently used ones) in the same way.
>>>>>>>>>>
>>>>>>>>>> I would imagine to have a project layout like that:
>>>>>>>>>>>
>>>>>>>>>>> flink-shaded-deps
>>>>>>>>>>>       - flink-shaded-asm
>>>>>>>>>>>       - flink-shaded-guava
>>>>>>>>>>>       - flink-shaded-curator
>>>>>>>>>>>       - flink-shaded-hadoop
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "flink-shaded-deps" would not be built every
time (and not be
>>>>>>>>>>> released
>>>>>>>>>>> every time), but only when needed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler
<
>>>>>>>>>>> chesnay@apache.org
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I like the idea, thank you for bringing it up.
>>>>>>>>>>>
>>>>>>>>>>> Given that the raised problems aren't really
ASM specific would
>>>>>>>>>>> it
>>>>>>>>>>>
>>>>>>>>>>>> make
>>>>>>>>>>>>
>>>>>>>>>>> sense to create one flink-shaded module that
contains all
>>>>>>>>>> frequently
>>>>>>>>>>
>>>>>>>>> shaded
>>>>>>>>>
>>>>>>>>>> libraries? (or maybe even all shaded dependencies
by core modules)
>>>>>>>>>>> The
>>>>>>>>>>>
>>>>>>>>>>> proposal limits the scope of this to ASM and
i was wondering why.
>>>>>>>>>>>
>>>>>>>>>>>> I also remember that there was a discussion
recently about why
>>>>>>>>>>>> we
>>>>>>>>>>>>
>>>>>>>>>>>> shade
>>>>>>>>>>>>
>>>>>>>>>>> things at all, and the idea of working against
the shaded
>>>>>>>>>> namespaces
>>>>>>>>>>
>>>>>>>>> was
>>>>>>>>>
>>>>>>>>>> brought up. Back then i was expressing doubts as
to whether IDE's
>>>>>>>>>>>>
>>>>>>>>>>>> would
>>>>>>>>>>>>
>>>>>>>>>>> properly support this; what's the state on that?
>>>>>>>>>>
>>>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> This is a discussion about altering the way
we handle
>>>>>>>>>>>> dependencies
>>>>>>>>>>>>
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>> shading in Flink.
>>>>>>>>>>>
>>>>>>>>>> I ran into quite a view problems trying to adjust
/ fix some
>>>>>>>>>
>>>>>>>>>> shading
>>>>>>>>>>>>> issues
>>>>>>>>>>>>> during release validation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira
>>>>>>>>>>>>> /browse/FLINK-6529
>>>>>>>>>>>>> Bring this discussion thread up because
it is a bigger issue
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Problem*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, Flink shades dependencies
like ASM and Guava into
>>>>>>>>>>>>> all
>>>>>>>>>>>>>
>>>>>>>>>>>>> jars
>>>>>>>>>>>>>
>>>>>>>>>>>> of
>>>>>>>>>>>
>>>>>>>>>> projects that reference it and relocate the classes.
>>>>>>>>>
>>>>>>>>>> There are some drawbacks to that approach, let's
discuss them at
>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>> example of ASM:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        - The ASM classes are for example
in flink-core,
>>>>>>>>>>>>> flink-java,
>>>>>>>>>>>>> flink-scala,
>>>>>>>>>>>>> flink-runtime, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>>        - Users that reference these dependencies
have the
>>>>>>>>>>>>> classes
>>>>>>>>>>>>> multiple
>>>>>>>>>>>>> times
>>>>>>>>>>>>> in the classpath. That is unclean (works,
through, because the
>>>>>>>>>>>>>
>>>>>>>>>>>>> classes
>>>>>>>>>>>>>
>>>>>>>>>>>> are
>>>>>>>>>>>
>>>>>>>>>> identical). The same happens when building the final
dist. jar.
>>>>>>>>>
>>>>>>>>>>        - Some of these dependencies require to include
license
>>>>>>>>>>> files
>>>>>>>>>>>
>>>>>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>> shaded jar. It is hard to impossible
to build a good automatic
>>>>>>>>>>>>> solution
>>>>>>>>>>>>> for
>>>>>>>>>>>>> that, partly due to Maven's very poor
cross-project path
>>>>>>>>>>>>> support
>>>>>>>>>>>>>
>>>>>>>>>>>>>        - Most importantly: Scala does
not support shading
>>>>>>>>>>>>> really
>>>>>>>>>>>>> well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scala
>>>>>>>>>>>>>
>>>>>>>>>>>>> classes have references to classes in
more places than just the
>>>>>>>>>>>>
>>>>>>>>>>> class
>>>>>>>>>>>
>>>>>>>>>>> names
>>>>>>>>>>>
>>>>>>>>>>>> (apparently for Scala reflect support). Referencing
a Scala
>>>>>>>>>>>>> project
>>>>>>>>>>>>>
>>>>>>>>>>>>> with
>>>>>>>>>>>>>
>>>>>>>>>>>>> shaded ASM still requires to add a reference
to unshaded ASM
>>>>>>>>>>>> (at
>>>>>>>>>>>>
>>>>>>>>>>> least
>>>>>>>>>>>
>>>>>>>>>>> as
>>>>>>>>>>>
>>>>>>>>>>>> a
>>>>>>>>>>>>
>>>>>>>>>>> compile dependency).
>>>>>>>>>>>
>>>>>>>>>>>> *Proposal*
>>>>>>>>>>>>>
>>>>>>>>>>>>> I propose that we build and deploy a
asm-flink-shaded version
>>>>>>>>>>>>> of
>>>>>>>>>>>>> ASM
>>>>>>>>>>>>>
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>>> directly program against the relocated
namespaces. Since we
>>>>>>>>>>>> never
>>>>>>>>>>>>
>>>>>>>>>>> use
>>>>>>>>>>>
>>>>>>>>>>> classes that we relocate in public interfaces,
Flink users will
>>>>>>>>>>>
>>>>>>>>>>>> never
>>>>>>>>>>>>
>>>>>>>>>>>> see
>>>>>>>>>>>
>>>>>>>>>> the relocated class names. Internally, it does not
hurt to use
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>>>        - Proper maven dependency management, no hidden
(shaded)
>>>>>>>>>>>
>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>
>>>>>>>>>>>>>        - One copy of each class for shaded
dependencies
>>>>>>>>>>>>
>>>>>>>>>>>        - Proper Scala interoperability
>>>>>>>>>>>
>>>>>>>>>>>>        - Natural License management (license
is part of deployed
>>>>>>>>>>>>> asm-flink-shaded jar)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Happy to hear thoughts!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message