flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: [DISCUSS] Remove Combinable Annotation from DataSet API
Date Wed, 13 Jan 2016 13:11:27 GMT
Hi Fabian,

thanks for bringing this issue up. I agree with you that it would be nice
to remove the Combinable annotation if it is not really needed. Now if
people are not aware of the Combinable interface then they might think that
they are actually using combiners by simply implementing CombineFunction.

I would also be in favour of approach 1 combined with a migration guide
where we highlight possible problems with a faulty combine implementation.

Cheers,
Till
‚Äč

On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi everybody,
>
>
>
> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>
> I would like to start a discussion about removing the Combinable annotation
> from the DataSet API.
>
>
>
> # The Current State:
>
> The DataSet API offers two interface for combine functions, CombineFunction
> and GroupCombineFunction, which can be added to a GroupReduceFunctions
> (GRF).
>
>
> However, implementing a combine interface does not make a GRF combinable.
> In addition, the GRF class needs to be annotated with the Combinable
> annotation. The RichGroupReduceFunction has a default implementation of
> combine, which forwards the combine parameters to the reduce method. This
> default implementation is not used, unless the class that extends RGRF has
> the Combinable annotation.
>
>
>
> In addition to the Combinable annotation, the DataSet API
> GroupReduceOperator features the setCombinable(boolean) method to override
> a missing or existing Combinable annotation.
>
>
>
> # My Proposal:
>
> I propose to remove the Combinable annotation because it is not required
> and complicates the definition of combiners. Instead, the combine method of
> a GroupReduceFunction should be executed if it implements one of the
> combine function interfaces. This would require to remove the default
> combine implementation of the RichGroupReduceFunction as well.
>
> This would be an API breaking change and has a few implications.
>
>
>
> # Possible Implementation:
>
> There are a few ways to implement this change.
>
> - Remove Combinable annotation or mark it deprecated (and keep effect)
>
> - Remove default combine method from RichGroupReduceFunction or deprecate
> it
>
>
>
> Approach 1:
> - Remove Combinable annotation
> - Remove default combine() method from RichGroupReduceFunction
> - Effect:
>     - All RichGroupReduceFunctions that do either have the Combinable
> annotation or implement the combine method do not compile anymore
>     - GroupReduceFunctions that have the Combinable annotation do not
> compile anymore
>     - GroupReduceFunctions that implement a combine interface without
> having the annotation (and not being set to combinable during program
> construction) will execute the previously not executed combine method. This
> might change the behavior of the program. In worst case, the program might
> silently produce wrong results (or crash) if the combiner implementation
> was faulty. In best case, the program executes faster.
>
>
>
> Approach 2:
> - As Approach 1
> - In addition extend both combine interfaces with a deprecated marker
> method. This will ensure that all functions that implement a combinable
> interface do not compile anymore and need to be fixed. This could prevent
> the silent failure as in Approach 1, but would also cause an additional API
> breaking change once the deprecated marker method is removed again.
>
>
>
> Approach 3:
> - Mark Combinable annotation deprecated
> - Mark combine() method in RichGroupReduceFunction as deprecated
> - Effect:
>     - There'll be a couple of deprecation warnings.
>     - We face the same problem with silent failures as in Approach 1.
>     - We have to check if RichGroupReduceFunction's override combine or not
> (can be done with reflection). If the method is not overridden we do not
> execute it (unless there is a Combinable annotation) and we are fine. If it
> is overridden and no Combinable annotation has been defined, we have the
> same problem with silent failures as before.
>     - After we remove the deprecated annotation and method, we have the
> same effect as with Approach 1.
>
>
>
> There are more alternatives, but these are the most viable, IMO.
>
>
>
> I think, if we want to remove the combinable annotation, we should do it
> now.
>
> Given the three options, would go for Approach 1. Yes, breaks a lot of code
> and yes there is the possibility of computing incorrect results. Approach 2
> is safer but would mean another API breaking change in the future. Approach
> 3 comes with fewer breaking changes but has the same problem of silent
> failures.
>
> IMO, the breaking API changes of Approach 1 are even desirable because they
> will make users aware that this feature changed.
>
>
>
> What do you think?
>
>
>
> Cheers, Fabian
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message