flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: [DISCUSS] Changing Flink's shading model
Date Mon, 26 Jun 2017 12:01:28 GMT
Looks good, thanks Chesnay!.

How about including the dependency version names in the module names, like
"flink-shaded-guava-18.0"?

On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <chesnay@apache.org>
wrote:

> The new repo was created and is accessible here:
> https://github.com/apache/flink-shaded
>
> I've already opened a PR to add a shaded guava module.
>
> Once the shaded-guava module is merged I would like to do a first release
> of flink-shaded,
> only containing guava. I already have a branch with all the changes
> necessary to
> integrate this dependency into Flink.
>
> The alternative would be to first create shaded modules for all
> dependencies and make a
> single release, but I feel that would delay things quite a bit.
>
>
>
> On 21.06.2017 17:00, Robert Metzger wrote:
>
>> Okay, I'll request a repo for the shading.
>>
>> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <chesnay@apache.org>
>> wrote:
>>
>> I like your suggestion Robert. A lot actually.
>>>
>>> Having the actual dependency version (i.e 18 for guava) in the version
>>> should improve clarity a lot.
>>>
>>> Originally i intended to release 1 artifact per Flink version, with the
>>> normal versioning scheme
>>> that we use. But given that the shaded dependencies aren't changed often
>>> (even rarely might be a stretch),
>>> and aren't actually coupled to the Flink release cycle this doesn't make
>>> a
>>> lot of sense.
>>>
>>> Having separate repos looks like a reasonable separation of concerns. The
>>> release for Flink itself
>>> would work just like it does now; we don't have to modify any scripts or
>>> do extra steps.
>>>
>>> Since the build, release and development process are separate (since
>>> flink-shaded isn't part of Flink build
>>> process, has a separate release process and changes to it will /never
>>> /require immediate changes to Flink)
>>> it seems like a very good candidate to move it into a separate repo.
>>>
>>>
>>> On 21.06.2017 11:26, Robert Metzger wrote:
>>>
>>> Its not completely clear to me how we want to version the shaded
>>>> dependencies, and where we are putting them.
>>>>
>>>> One concern are the official apache release rules. If we want to release
>>>> something to maven central, we need to do a proper vote over a source
>>>> archive.
>>>> I would propose to create a new repository "flink-shaded.git" that
>>>> contains
>>>> the following maven module structure:
>>>> - flink-shaded: 1
>>>>      - flink-shaded-asm: 1-5.2
>>>>      - flink-shaded-guava: 1-18.0
>>>>      - ...
>>>>
>>>> The number indicates the version (for ASM, I've just guessed).
>>>> The version for the parent "flink-shaded" needs to be updated on each
>>>> parent pom change (new module added, new / changed plugins, ...)
>>>>
>>>> We could create a separate release script in this repository that
>>>> creates
>>>> the flink-shaded-src.zip from the code and deploys the artifacts to the
>>>> maven staging area.
>>>>
>>>> The advantage of a separate repo would be that we don't need to maintain
>>>> separate maven projects in the same git repo.
>>>> Also, the src archives for the release vote can be created from the repo
>>>> content (without much filtering).
>>>>
>>>>
>>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>
>>>> I like this approach.
>>>>
>>>>> Two additional things can be mention here:
>>>>>
>>>>>     - We need to deploy these artifacts independently and not as part
>>>>> of
>>>>> the
>>>>> build. That is a manual step once per "bump" in the dependency of that
>>>>> library.
>>>>>
>>>>>     - We reduce the shading complexity of the original build and should
>>>>> thus
>>>>> also speed up build times :-)
>>>>>
>>>>> Stephan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <chesnay@apache.org>
>>>>> wrote:
>>>>>
>>>>> I would like to start working on this.
>>>>>
>>>>>> I've looked into adding a flink-shaded-guava module. Working against
>>>>>> the
>>>>>> shaded namespaces seems
>>>>>> to work without problems from the IDE, and we could forbid un-shaded
>>>>>> usages with checkstyle.
>>>>>>
>>>>>> So for the list of dependencies that we want to shade we currently
>>>>>> got:
>>>>>>
>>>>>>    * asm
>>>>>>    * guava
>>>>>>    * netty
>>>>>>    * hadoop
>>>>>>    * curator
>>>>>>
>>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill
as
>>>>>> well.
>>>>>>
>>>>>> The nice thing is that we can do this incrementally, one dependency
>>>>>> at a
>>>>>> time. As such i would propose
>>>>>> to go through the whole process for guava and see what problems arise.
>>>>>>
>>>>>> This would include adding a flink-shaded module and a child
>>>>>> flink-shaded-guava module to the flink repository
>>>>>> that are not part of the build process, replacing all usages of guava
>>>>>> in
>>>>>> Flink, adding the
>>>>>> checkstyle rule (optional) and deploying the artifact to maven
>>>>>> central.
>>>>>>
>>>>>>
>>>>>> On 11.05.2017 10:54, Stephan Ewen wrote:
>>>>>>
>>>>>> @Ufuk  - I have never set up artifact deployment in Maven, could
need
>>>>>> some
>>>>>> help there.
>>>>>>
>>>>>>> Regarding shading Netty, I agree, would be good to do that as
well...
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <uce@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The advantages you've listed sound really compelling to me.
>>>>>>>
>>>>>>> - Do you have time to implement these changes or do we need a
>>>>>>>>
>>>>>>>> volunteer?
>>>>>>>
>>>>>> ;)
>>>>>>
>>>>>>> - I assume that republishing the artifacts as you propose doesn't
>>>>>>>> have
>>>>>>>> any new legal implications since we already publish them
with our
>>>>>>>> JARs, right?
>>>>>>>>
>>>>>>>> - We might think about adding Netty to the list of shaded
artifacts
>>>>>>>> since some dependency conflicts were reported recently. Would
have
>>>>>>>> to
>>>>>>>> double check the reported issues before doing that though.
;-)
>>>>>>>>
>>>>>>>> – Ufuk
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <sewen@apache.org>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>
>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not
>>>>>>
>>>>>>> say
>>>>>>>>
>>>>>>> that clearly.
>>>>>>
>>>>>>> If we like that approach, we should deal with the other libraries
(at
>>>>>>>>>
>>>>>>>>> least
>>>>>>>>>
>>>>>>>> the frequently used ones) in the same way.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would imagine to have a project layout like that:
>>>>>>>>>
>>>>>>>>> flink-shaded-deps
>>>>>>>>>      - flink-shaded-asm
>>>>>>>>>      - flink-shaded-guava
>>>>>>>>>      - flink-shaded-curator
>>>>>>>>>      - flink-shaded-hadoop
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "flink-shaded-deps" would not be built every time (and
not be
>>>>>>>>> released
>>>>>>>>> every time), but only when needed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <
>>>>>>>>> chesnay@apache.org
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I like the idea, thank you for bringing it up.
>>>>>>>>>
>>>>>>>>> Given that the raised problems aren't really ASM specific
would it
>>>>>>>>>>
>>>>>>>>>> make
>>>>>>>>>
>>>>>>>> sense to create one flink-shaded module that contains all
frequently
>>>>>>
>>>>>>> shaded
>>>>>>>>>>
>>>>>>>>> libraries? (or maybe even all shaded dependencies by
core modules)
>>>>>>>>> The
>>>>>>>>>
>>>>>>>>> proposal limits the scope of this to ASM and i was wondering
why.
>>>>>>>>>>
>>>>>>>>>> I also remember that there was a discussion recently
about why we
>>>>>>>>>>
>>>>>>>>>> shade
>>>>>>>>>
>>>>>>>> things at all, and the idea of working against the shaded
namespaces
>>>>>>
>>>>>>> was
>>>>>>>>>> brought up. Back then i was expressing doubts as
to whether IDE's
>>>>>>>>>>
>>>>>>>>>> would
>>>>>>>>>
>>>>>>>> properly support this; what's the state on that?
>>>>>>
>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> This is a discussion about altering the way we handle
dependencies
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>> shading in Flink.
>>>>>>
>>>>>>> I ran into quite a view problems trying to adjust / fix some
>>>>>>>>>>> shading
>>>>>>>>>>> issues
>>>>>>>>>>> during release validation.
>>>>>>>>>>>
>>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira
>>>>>>>>>>> /browse/FLINK-6529
>>>>>>>>>>> Bring this discussion thread up because it is
a bigger issue
>>>>>>>>>>>
>>>>>>>>>>> *Problem*
>>>>>>>>>>>
>>>>>>>>>>> Currently, Flink shades dependencies like ASM
and Guava into all
>>>>>>>>>>>
>>>>>>>>>>> jars
>>>>>>>>>>
>>>>>>>>> of
>>>>>>
>>>>>>> projects that reference it and relocate the classes.
>>>>>>>>>
>>>>>>>>> There are some drawbacks to that approach, let's discuss
them at
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>> example of ASM:
>>>>>>>>>>>
>>>>>>>>>>>       - The ASM classes are for example in flink-core,
>>>>>>>>>>> flink-java,
>>>>>>>>>>> flink-scala,
>>>>>>>>>>> flink-runtime, etc.
>>>>>>>>>>>
>>>>>>>>>>>       - Users that reference these dependencies
have the classes
>>>>>>>>>>> multiple
>>>>>>>>>>> times
>>>>>>>>>>> in the classpath. That is unclean (works, through,
because the
>>>>>>>>>>>
>>>>>>>>>>> classes
>>>>>>>>>>
>>>>>>>>> are
>>>>>>
>>>>>>> identical). The same happens when building the final dist. jar.
>>>>>>>>>
>>>>>>>>>       - Some of these dependencies require to include
license files
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>> shaded jar. It is hard to impossible to build
a good automatic
>>>>>>>>>>> solution
>>>>>>>>>>> for
>>>>>>>>>>> that, partly due to Maven's very poor cross-project
path support
>>>>>>>>>>>
>>>>>>>>>>>       - Most importantly: Scala does not support
shading really
>>>>>>>>>>> well.
>>>>>>>>>>>
>>>>>>>>>>> Scala
>>>>>>>>>>>
>>>>>>>>>> classes have references to classes in more places
than just the
>>>>>>>>> class
>>>>>>>>>
>>>>>>>>> names
>>>>>>>>>>
>>>>>>>>>>> (apparently for Scala reflect support). Referencing
a Scala
>>>>>>>>>>> project
>>>>>>>>>>>
>>>>>>>>>>> with
>>>>>>>>>>>
>>>>>>>>>> shaded ASM still requires to add a reference to unshaded
ASM (at
>>>>>>>>> least
>>>>>>>>>
>>>>>>>>> as
>>>>>>>>>>
>>>>>>>>>> a
>>>>>>>>>
>>>>>>>>> compile dependency).
>>>>>>>>>>
>>>>>>>>>>> *Proposal*
>>>>>>>>>>>
>>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded
version of
>>>>>>>>>>> ASM
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>> directly program against the relocated namespaces.
Since we never
>>>>>>>>> use
>>>>>>>>>
>>>>>>>>> classes that we relocate in public interfaces, Flink
users will
>>>>>>>>>> never
>>>>>>>>>>
>>>>>>>>> see
>>>>>>
>>>>>>> the relocated class names. Internally, it does not hurt to use
them.
>>>>>>>>>
>>>>>>>>>       - Proper maven dependency management, no hidden
(shaded)
>>>>>>>>>>
>>>>>>>>>>> dependencies
>>>>>>>>>>>
>>>>>>>>>>       - One copy of each class for shaded dependencies
>>>>>>>>>
>>>>>>>>>       - Proper Scala interoperability
>>>>>>>>>>
>>>>>>>>>>>       - Natural License management (license is
part of deployed
>>>>>>>>>>> asm-flink-shaded jar)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Happy to hear thoughts!
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message