flink-dev mailing list archives

From Chesnay Schepler <chesnay.schep...@fu-berlin.de>
Subject Re: [DISCUSS] Policy on keeping layer alternatives in sync
Date Sat, 27 Sep 2014 21:53:10 GMT
I agree with Kostas: imo postponing will straight up not work, since
people tend to be *very* busy close to a release, even without having
to port features to several APIs.

I furthermore don't think we will get anywhere by creating one policy to
rule them all (especially a rigid one), because there are fundamental
differences in a) the APIs and b) the scope of a feature, and there is no
point in setting up a policy when it is very likely that we won't
abide by it.

With the increasing number of APIs it's quite a tall order to expect a
version for each of them from a single contributor. Even now that would
be 3 (Java, Scala, Streaming(?)) with 2 more to come in the somewhat
near future (Python, SQL (not sure if relevant)). It is a *massive*
entry barrier, as well as a major time investment on the contributor's
part. This should also hold for simple features (certainly at the

If (and only if) Scala is as thin as I am made to believe, I would be for
a hard policy here. I would exclude other APIs from this. The overhead
from getting to know all APIs and debugging unfamiliar code would eat
up way too much time, which could easily break our neck. It's not just
about syncing the APIs, but doing so in an efficient manner. For them I
would much rather have 2-3 people per API that are somewhat responsible
for porting these features, preferably in a more concentrated effort
(aka batches).

On 27.9.2014 21:03, Kostas Tzoumas wrote:
> If we allow out-of-sync APIs (and backends) until the time of a release,
> aren't we just postponing the syncing problem to the time of the release,
> which is a pretty bad time to have such a problem?
> On Fri, Sep 26, 2014 at 8:49 PM, Robert Metzger <rmetzger@apache.org> wrote:
>> Hi,
>> I'm also in favor of having a strict policy regarding the Java and Scala
>> API.
>> In my understanding the new Scala API is a thin layer above the Java one,
>> so adding new methods should be straightforward (given that there are
>> plenty of examples as a reference).
>> Robert
>> On Fri, Sep 26, 2014 at 11:04 AM, Ufuk Celebi <uce@apache.org> wrote:
>>> Hey Fabian,
>>> thanks for bringing this up.
>>> I would vote to have a hard policy regarding the Scala and Java API as
>>> these are our main user facing APIs.
>>> If there was a fundamental problem or language feature, which could not be
>>> supported/ported in/to the other API, I would be OK if it was only
>>> available in one. But small additions to the APIs like outer joins, which
>>> can be in sync should also be in sync.
>>> If someone does not want to add the corresponding feature to the other
>>> APIs, I would go for a pull request with a request for someone else to port
>>> the missing part.
>>> I think it is very important for users to be able to assume that all APIs
>>> have the same "power". Otherwise we might end up in a situation (and I
>>> think we already had it with the broadcast variables for a time), where
>>> users have to pick the API, which matches their use case and not their
>>> preference.
>>> Best,
>>> Ufuk
>>> On 26 Sep 2014, at 10:43, Fabian Hueske <fhueske@apache.org> wrote:
>>>> Hi,
>>>> as you all know, Flink has a layered architecture with multiple
>>>> alternatives for certain levels.
>>>> Examples are:
>>>> - Programming APIs: Java, Scala, (and Python in progress)
>>>> - Processing Backends: distributed runtime (former Nephele), Java
>>>> Collections, (and potentially Tez in the future)
>>>> The challenge with multiple alternatives that serve the same purpose is
>>>> that these should be in sync.
>>>> A feature that is added to the Java API should also be added to the Scala
>>>> API (and other APIs in the future). The same applies to new runtime
>>>> strategies and operators, such as outer joins.
>>>> I think we need a policy how to keep the features of different layer
>>>> alternatives in sync.
>>>> With the recent update of the Scala API, a ScalaAPICompletenessTest was
>>>> added that checks whether the Scala API offers the same methods as the Java
>>>> API. Adding a feature to the Java API breaks the build and requires
>>>> either adapting the Scala API as well or excluding the added methods from the
>>>> APICompletenessTest.
>>>> While this test is a great tool to make sure that the APIs are synced,
>>>> this basically requires that APIs are always synced, i.e., a modification
>>>> of the Java API must go with an equivalent change of the Scala API.
>>>> If we make this a tight policy and force compatibility at all times,
>>>> contributors must know about several different technologies (Scala Compiler
>>>> Macros, Python, the implementation details of multiple runtime backends,
>>>> ...). This sounds like a huge entrance barrier to me.
>>>> To make it clear, I am definitely in favor of keeping APIs and backends in
>>>> sync.
>>>> However, I propose to enforce this only for releases, i.e., allow
>>>> out-of-sync APIs on the master branch and fix the APIs for releases.
>>>> With this additional requirement, we also need to think twice which
>>>> features to add as multiple components of the system will be affected.
>>>> What do you guys think?
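
For readers unfamiliar with the completeness test mentioned in Fabian's original message, the idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual ScalaAPICompletenessTest: the classes JavaStyleApi and ScalaStyleApi, the helper names, and the excluded method name are all hypothetical stand-ins, and the comparison is done by method name only via plain Java reflection.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

final class ApiCompletenessSketch {

    // Hypothetical stand-in for the reference API (not a Flink class).
    public static class JavaStyleApi {
        public void map() {}
        public void join() {}
        public void outerJoin() {}       // newly added, not yet ported
        public void getJavaTypeInfo() {} // only meaningful in this API
    }

    // Hypothetical stand-in for the API that has to be kept in sync.
    public static class ScalaStyleApi {
        public void map() {}
        public void join() {}
    }

    // Collects the names of all public, non-synthetic methods declared on a class.
    public static Set<String> publicMethodNames(Class<?> cls) {
        Set<String> names = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // Returns reference-API methods that the candidate API lacks,
    // ignoring names that were explicitly excluded from the check.
    public static Set<String> missingMethods(
            Class<?> reference, Class<?> candidate, Set<String> excluded) {
        Set<String> missing = publicMethodNames(reference);
        missing.removeAll(publicMethodNames(candidate));
        missing.removeAll(excluded);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> excluded = Collections.singleton("getJavaTypeInfo");
        // A build-breaking test would fail whenever this set is non-empty.
        System.out.println(
                missingMethods(JavaStyleApi.class, ScalaStyleApi.class, excluded));
    }
}
```

In this sketch a contributor who adds outerJoin to the reference API has the two options described in the thread: port the method to the other API, or add its name to the exclusion set.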