flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support
Date Mon, 08 Oct 2018 13:04:37 GMT
Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and
the Maven infrastructure to actually build a 2.12 release.

> On 8. Oct 2018, at 13:00, Chesnay Schepler <chesnay@apache.org> wrote:
> And the remaining parts would only be about breaking the API?
> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>> I have an open PR that does everything we can do for preparing the code base for
>> Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <chesnay@apache.org> wrote:
>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>>> wondering about the benefit, as the API break still has to happen at some point.
>>> @Aljoscha how much work for supporting Scala 2.12 can be merged without breaking
>>> the API?
>>> If this is the only blocker I suggest making the breaking change in 1.8.
>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>> Thanks Aljoscha for starting this discussion. The described problem does
>>>> indeed put us in a bit of a pickle. Even with option 1) I think it is somewhat
>>>> API breaking because everyone who used lambdas without types needs to add
>>>> them now. Consequently, I only see two real options out of the ones you've
>>>> proposed:
>>>> 1) Disambiguate the API (either by removing
>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase out 2.11 completely
>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>> problematic because then all Scala API users who have implemented a
>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>> think it will be problematic with RichGroupReduceFunction which you need
>>>> to get access to the RuntimeContext.
>>>> Maintaining two master branches puts a lot of burden onto the developers to
>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>> I also played around a little with implicit conversions to add the
>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it to work
>>>> smoothly.
>>>> I'm cross posting this thread to user as well to get some more user
>>>> feedback.
>>>> Cheers,
>>>> Till
>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fearsome.lucidity@gmail.com>
>>>> wrote:
>>>>> The second alternative, with the addition of methods that take functions
>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>> then to maintain the *J Java parameter methods, or whether users could
>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <aljoscha@apache.org>
>>>>> wrote:
>>>>>> Hi,
>>>>>> I'm currently working on
>>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a hurdle
>>>>>> and I have to explain some context first.
>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>> means that the following two method definitions can both take a lambda:
>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>> map() with a lambda in simple cases, so it works here. You could still call
>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>> signatures, though, to allow calling with a lambda and with a MapFunction.
>>>>>> The problem is with more complicated method signatures, like:
>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): DataSet[R]
>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call reduceGroup()
>>>>>> with a lambda that doesn't have parameter type definitions and things would
>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't figure
>>>>>> out which method to call and requires explicit type definitions on the
>>>>>> lambda parameters.
>>>>>> I see some solutions for this:
>>>>>> 1. Keep the methods as is. This would force people to always explicitly
>>>>>> specify parameter types on their lambdas.
>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>> disambiguates the code, so users can use lambdas without specifying
>>>>>> parameter types, but it breaks the API.
>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>> takes an api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>>> it would allow people to implement user functions without having to cast
>>>>>> the various Iterator/Iterable parameters.
>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>> There is also solution 2.5:
>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>>> old method names for Scala 2.11. This would require some infrastructure and
>>>>>> I don't yet know how it can be done in a sane way.
>>>>>> What do you think? I personally would be in favour of 2. but it breaks the
>>>>>> existing API.
>>>>>> Best,
>>>>>> Aljoscha
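
The reduceGroup() situation described in the original message can be sketched without Flink on the classpath. Everything below is a hypothetical stand-in modeled on the signatures quoted in the thread: Collector, GroupReduceFunction, OverloadedDataSet and RenamedDataSet are illustration-only names, not the real Flink classes, and the methods return plain Lists instead of DataSets. The first call shows option 1 (explicit lambda parameter types against the overloaded methods), the second shows what option 2 would allow (an untyped lambda once the Java-style variant is named reduceGroupJ).

```scala
// Hypothetical stand-ins modeled on the signatures quoted in the thread.
// None of this is the real Flink API; results are plain Lists, not DataSets.
object ReduceGroupSketch {
  trait Collector[R] { def collect(record: R): Unit }

  // Java-style single-abstract-method interface: receives a java.lang.Iterable.
  trait GroupReduceFunction[T, R] {
    def reduce(values: java.lang.Iterable[T], out: Collector[R]): Unit
  }

  // Runs `body` against a buffering collector and returns what it emitted.
  private def collecting[R](body: Collector[R] => Unit): List[R] = {
    val buffer = scala.collection.mutable.ListBuffer.empty[R]
    body(new Collector[R] {
      def collect(record: R): Unit = { buffer += record; () }
    })
    buffer.toList
  }

  // Copies a Scala Seq into a java.util.List without any converter library.
  private def toJavaList[T](elements: Seq[T]): java.util.List[T] = {
    val list = new java.util.ArrayList[T]()
    elements.foreach(e => list.add(e))
    list
  }

  // Current shape: the Scala-lambda and Java-style variants share one name.
  class OverloadedDataSet[T](elements: Seq[T]) {
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): List[R] =
      collecting[R](out => fun(elements.iterator, out))

    def reduceGroup[R](reducer: GroupReduceFunction[T, R]): List[R] =
      collecting[R](out => reducer.reduce(toJavaList(elements), out))
  }

  // Option 2 from the thread: the Java-style variant gets its own name.
  class RenamedDataSet[T](elements: Seq[T]) {
    def reduceGroup[R](fun: (Iterator[T], Collector[R]) => Unit): List[R] =
      collecting[R](out => fun(elements.iterator, out))

    def reduceGroupJ[R](reducer: GroupReduceFunction[T, R]): List[R] =
      collecting[R](out => reducer.reduce(toJavaList(elements), out))
  }

  private val words = Seq("a", "bb", "ccc")

  // Option 1: explicit parameter types select the scala.Iterator overload.
  val typedLengths: List[Int] =
    new OverloadedDataSet(words).reduceGroup[Int] {
      (in: Iterator[String], out: Collector[Int]) =>
        in.foreach(w => out.collect(w.length))
    }

  // Option 2: with only one reduceGroup, the parameter types can be inferred.
  val untypedLengths: List[Int] =
    new RenamedDataSet(words).reduceGroup[Int]((in, out) =>
      in.foreach(w => out.collect(w.length)))

  // The Java-style function stays reachable under the new name.
  val javaStyleLengths: List[Int] =
    new RenamedDataSet(words).reduceGroupJ[Int](new GroupReduceFunction[String, Int] {
      def reduce(values: java.lang.Iterable[String], out: Collector[Int]): Unit = {
        val it = values.iterator()
        while (it.hasNext) out.collect(it.next().length)
      }
    })

  def main(args: Array[String]): Unit = {
    println(typedLengths)
    println(untypedLengths)
    println(javaStyleLengths)
  }
}
```

The untyped call is deliberately made only against RenamedDataSet. According to the thread, the same untyped lambda passed to the overloaded reduceGroup() is what the Scala 2.12 compiler rejects, so it is left out to keep the sketch compilable.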
