flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chesnay Schepler <ches...@apache.org>
Subject Re: [DISCUSS] Changing Flink's shading model
Date Wed, 21 Jun 2017 11:38:59 GMT
I like your suggestion Robert. A lot actually.

Having the actual dependency version (i.e 18 for guava) in the version 
should improve clarity a lot.

Originally i intended to release 1 artifact per Flink version, with the 
normal versioning scheme
that we use. But given that the shaded dependencies aren't changed often 
(even rarely might be a stretch),
and aren't actually coupled to the Flink release cycle this doesn't make 
a lot of sense.

Having separate repos looks like a reasonable separation of concerns. 
The release for Flink itself
would work just like it does now; we don't have to modify any scripts or 
do extra steps.

Since the build, release and development process are separate (since 
flink-shaded isn't part of Flink build
process, has a separate release process and changes to it will /never 
/require immediate changes to Flink)
it seems like a very good candidate to move it into a separate repo.

On 21.06.2017 11:26, Robert Metzger wrote:
> Its not completely clear to me how we want to version the shaded
> dependencies, and where we are putting them.
>
> One concern are the official apache release rules. If we want to release
> something to maven central, we need to do a proper vote over a source
> archive.
> I would propose to create a new repository "flink-shaded.git" that contains
> the following maven module structure:
> - flink-shaded: 1
>     - flink-shaded-asm: 1-5.2
>     - flink-shaded-guava: 1-18.0
>     - ...
>
> The number indicates the version (for ASM, I've just guessed).
> The version for the parent "flink-shaded" needs to be updated on each
> parent pom change (new module added, new / changed plugins, ...)
>
> We could create a separate release script in this repository that creates
> the flink-shaded-src.zip from the code and deploys the artifacts to the
> maven staging area.
>
> The advantage of a separate repo would be that we don't need to maintain
> separate maven projects in the same git repo.
> Also, the src archives for the release vote can be created from the repo
> content (without much filtering).
>
>
> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> I like this approach.
>>
>> Two additional things can be mention here:
>>
>>    - We need to deploy these artifacts independently and not as part of the
>> build. That is a manual step once per "bump" in the dependency of that
>> library.
>>
>>    - We reduce the shading complexity of the original build and should thus
>> also speed up build times :-)
>>
>> Stephan
>>
>>
>>
>>
>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <chesnay@apache.org>
>> wrote:
>>
>>> I would like to start working on this.
>>>
>>> I've looked into adding a flink-shaded-guava module. Working against the
>>> shaded namespaces seems
>>> to work without problems from the IDE, and we could forbid un-shaded
>>> usages with checkstyle.
>>>
>>> So for the list of dependencies that we want to shade we currently got:
>>>
>>>   * asm
>>>   * guava
>>>   * netty
>>>   * hadoop
>>>   * curator
>>>
>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as well.
>>>
>>> The nice thing is that we can do this incrementally, one dependency at a
>>> time. As such i would propose
>>> to go through the whole process for guava and see what problems arise.
>>>
>>> This would include adding a flink-shaded module and a child
>>> flink-shaded-guava module to the flink repository
>>> that are not part of the build process, replacing all usages of guava in
>>> Flink, adding the
>>> checkstyle rule (optional) and deploying the artifact to maven central.
>>>
>>>
>>> On 11.05.2017 10:54, Stephan Ewen wrote:
>>>
>>>> @Ufuk  - I have never set up artifact deployment in Maven, could need
>> some
>>>> help there.
>>>>
>>>> Regarding shading Netty, I agree, would be good to do that as well...
>>>>
>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>>
>>>> The advantages you've listed sound really compelling to me.
>>>>> - Do you have time to implement these changes or do we need a
>> volunteer?
>>>>> ;)
>>>>>
>>>>> - I assume that republishing the artifacts as you propose doesn't have
>>>>> any new legal implications since we already publish them with our
>>>>> JARs, right?
>>>>>
>>>>> - We might think about adding Netty to the list of shaded artifacts
>>>>> since some dependency conflicts were reported recently. Would have to
>>>>> double check the reported issues before doing that though. ;-)
>>>>>
>>>>> – Ufuk
>>>>>
>>>>>
>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <sewen@apache.org>
>> wrote:
>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not
>> say
>>>>>> that clearly.
>>>>>>
>>>>>> If we like that approach, we should deal with the other libraries
(at
>>>>>>
>>>>> least
>>>>>
>>>>>> the frequently used ones) in the same way.
>>>>>>
>>>>>>
>>>>>> I would imagine to have a project layout like that:
>>>>>>
>>>>>> flink-shaded-deps
>>>>>>     - flink-shaded-asm
>>>>>>     - flink-shaded-guava
>>>>>>     - flink-shaded-curator
>>>>>>     - flink-shaded-hadoop
>>>>>>
>>>>>>
>>>>>> "flink-shaded-deps" would not be built every time (and not be released
>>>>>> every time), but only when needed.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <chesnay@apache.org
>>>>>> wrote:
>>>>>>
>>>>>> I like the idea, thank you for bringing it up.
>>>>>>> Given that the raised problems aren't really ASM specific would
it
>> make
>>>>>>> sense to create one flink-shaded module that contains all frequently
>>>>>>>
>>>>>> shaded
>>>>>> libraries? (or maybe even all shaded dependencies by core modules)
The
>>>>>>> proposal limits the scope of this to ASM and i was wondering
why.
>>>>>>>
>>>>>>> I also remember that there was a discussion recently about why
we
>> shade
>>>>>>> things at all, and the idea of working against the shaded namespaces
>>>>>>> was
>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's
>> would
>>>>>>> properly support this; what's the state on that?
>>>>>>>
>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote:
>>>>>>>
>>>>>>> Hi!
>>>>>>>> This is a discussion about altering the way we handle dependencies
>> and
>>>>>>>> shading in Flink.
>>>>>>>> I ran into quite a view problems trying to adjust / fix some
shading
>>>>>>>> issues
>>>>>>>> during release validation.
>>>>>>>>
>>>>>>>> The issue is tracked under: https://issues.apache.org/jira
>>>>>>>> /browse/FLINK-6529
>>>>>>>> Bring this discussion thread up because it is a bigger issue
>>>>>>>>
>>>>>>>> *Problem*
>>>>>>>>
>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into
all
>> jars
>>>>>>> of
>>>>>> projects that reference it and relocate the classes.
>>>>>>>> There are some drawbacks to that approach, let's discuss
them at the
>>>>>>>> example of ASM:
>>>>>>>>
>>>>>>>>      - The ASM classes are for example in flink-core, flink-java,
>>>>>>>> flink-scala,
>>>>>>>> flink-runtime, etc.
>>>>>>>>
>>>>>>>>      - Users that reference these dependencies have the classes
>>>>>>>> multiple
>>>>>>>> times
>>>>>>>> in the classpath. That is unclean (works, through, because
the
>> classes
>>>>>>> are
>>>>>> identical). The same happens when building the final dist. jar.
>>>>>>>>      - Some of these dependencies require to include license
files in
>>>>>>>> the
>>>>>>>> shaded jar. It is hard to impossible to build a good automatic
>>>>>>>> solution
>>>>>>>> for
>>>>>>>> that, partly due to Maven's very poor cross-project path
support
>>>>>>>>
>>>>>>>>      - Most importantly: Scala does not support shading really
well.
>>>>>>>>
>>>>>>> Scala
>>>>>> classes have references to classes in more places than just the class
>>>>>>>> names
>>>>>>>> (apparently for Scala reflect support). Referencing a Scala
project
>>>>>>>>
>>>>>>> with
>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at
least
>>>>>>> as
>>>>>> a
>>>>>>>> compile dependency).
>>>>>>>>
>>>>>>>> *Proposal*
>>>>>>>>
>>>>>>>> I propose that we build and deploy a asm-flink-shaded version
of ASM
>>>>>>>>
>>>>>>> and
>>>>>> directly program against the relocated namespaces. Since we never
use
>>>>>>>> classes that we relocate in public interfaces, Flink users
will
>> never
>>>>>>> see
>>>>>> the relocated class names. Internally, it does not hurt to use them.
>>>>>>>>      - Proper maven dependency management, no hidden (shaded)
>>>>>>>>
>>>>>>> dependencies
>>>>>>      - One copy of each class for shaded dependencies
>>>>>>>>      - Proper Scala interoperability
>>>>>>>>
>>>>>>>>      - Natural License management (license is part of deployed
>>>>>>>> asm-flink-shaded jar)
>>>>>>>>
>>>>>>>>
>>>>>>>> Happy to hear thoughts!
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message