flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Remove Combinable Annotation from DataSet API
Date Wed, 13 Jan 2016 16:19:16 GMT
I’m also for Approach #1. I like simplifying things.
> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:
> 
> Hi,
> 
> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
> best option to me.
> 
> 
> On 13 January 2016 at 14:11, Till Rohrmann <trohrmann@apache.org> wrote:
> 
>> Hi Fabian,
>> 
>> thanks for bringing this issue up. I agree with you that it would be nice
>> to remove the Combinable annotation if it is not really needed. Now if
>> people are not aware of the Combinable interface then they might think that
>> they are actually using combiners by simply implementing CombineFunction.
>> 
>> 
> ​has happened to me :S​
> 
> 
> 
>> I would also be in favour of approach 1 combined with a migration guide
>> where we highlight possible problems with a faulty combine implementation.
>> 
>> 
> Migration guide is a ​good idea​!
> 
> 
> 
>> Cheers,
>> Till
>> ​
>> 
>> 
> ​-Vasia.​
> 
> 
> 
>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> 
>>> Hi everybody,
>>> 
>>> 
>>> 
>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>>> 
>>> I would like to start a discussion about removing the Combinable
>> annotation
>>> from the DataSet API.
>>> 
>>> 
>>> 
>>> # The Current State:
>>> 
>>> The DataSet API offers two interface for combine functions,
>> CombineFunction
>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
>>> (GRF).
>>> 
>>> 
>>> However, implementing a combine interface does not make a GRF combinable.
>>> In addition, the GRF class needs to be annotated with the Combinable
>>> annotation. The RichGroupReduceFunction has a default implementation of
>>> combine, which forwards the combine parameters to the reduce method. This
>>> default implementation is not used, unless the class that extends RGRF
>> has
>>> the Combinable annotation.
>>> 
>>> 
>>> 
>>> In addition to the Combinable annotation, the DataSet API
>>> GroupReduceOperator features the setCombinable(boolean) method to
>> override
>>> a missing or existing Combinable annotation.
>>> 
>>> 
>>> 
>>> # My Proposal:
>>> 
>>> I propose to remove the Combinable annotation because it is not required
>>> and complicates the definition of combiners. Instead, the combine method
>> of
>>> a GroupReduceFunction should be executed if it implements one of the
>>> combine function interfaces. This would require to remove the default
>>> combine implementation of the RichGroupReduceFunction as well.
>>> 
>>> This would be an API breaking change and has a few implications.
>>> 
>>> 
>>> 
>>> # Possible Implementation:
>>> 
>>> There are a few ways to implement this change.
>>> 
>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
>>> 
>>> - Remove default combine method from RichGroupReduceFunction or deprecate
>>> it
>>> 
>>> 
>>> 
>>> Approach 1:
>>> - Remove Combinable annotation
>>> - Remove default combine() method from RichGroupReduceFunction
>>> - Effect:
>>>    - All RichGroupReduceFunctions that do either have the Combinable
>>> annotation or implement the combine method do not compile anymore
>>>    - GroupReduceFunctions that have the Combinable annotation do not
>>> compile anymore
>>>    - GroupReduceFunctions that implement a combine interface without
>>> having the annotation (and not being set to combinable during program
>>> construction) will execute the previously not executed combine method.
>> This
>>> might change the behavior of the program. In worst case, the program
>> might
>>> silently produce wrong results (or crash) if the combiner implementation
>>> was faulty. In best case, the program executes faster.
>>> 
>>> 
>>> 
>>> Approach 2:
>>> - As Approach 1
>>> - In addition extend both combine interfaces with a deprecated marker
>>> method. This will ensure that all functions that implement a combinable
>>> interface do not compile anymore and need to be fixed. This could prevent
>>> the silent failure as in Approach 1, but would also cause an additional
>> API
>>> breaking change once the deprecated marker method is removed again.
>>> 
>>> 
>>> 
>>> Approach 3:
>>> - Mark Combinable annotation deprecated
>>> - Mark combine() method in RichGroupReduceFunction as deprecated
>>> - Effect:
>>>    - There'll be a couple of deprecation warnings.
>>>    - We face the same problem with silent failures as in Approach 1.
>>>    - We have to check if RichGroupReduceFunction's override combine or
>> not
>>> (can be done with reflection). If the method is not overridden we do not
>>> execute it (unless there is a Combinable annotation) and we are fine. If
>> it
>>> is overridden and no Combinable annotation has been defined, we have the
>>> same problem with silent failures as before.
>>>    - After we remove the deprecated annotation and method, we have the
>>> same effect as with Approach 1.
>>> 
>>> 
>>> 
>>> There are more alternatives, but these are the most viable, IMO.
>>> 
>>> 
>>> 
>>> I think, if we want to remove the combinable annotation, we should do it
>>> now.
>>> 
>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
>> code
>>> and yes there is the possibility of computing incorrect results.
>> Approach 2
>>> is safer but would mean another API breaking change in the future.
>> Approach
>>> 3 comes with fewer breaking changes but has the same problem of silent
>>> failures.
>>> 
>>> IMO, the breaking API changes of Approach 1 are even desirable because
>> they
>>> will make users aware that this feature changed.
>>> 
>>> 
>>> 
>>> What do you think?
>>> 
>>> 
>>> 
>>> Cheers, Fabian
>>> 
>> 


Mime
View raw message