flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Policy on keeping layer alternatives in sync
Date Mon, 29 Sep 2014 07:56:16 GMT
We could use blocking issues on Jira to mark things that need to be
resolved before a release.

On Sat, Sep 27, 2014 at 11:53 PM, Chesnay Schepler <
chesnay.schepler@fu-berlin.de> wrote:

> I agree with Kostas, and believe that postponing will imo straight up not
> work since people tend to be *very* busy close to a release, even without
> having to port features to several APIs.
>
> I furthermore don't think we will get anywhere by creating one policy to
> rule them all (especially a rigid one), because there are fundamental
> differences between a) the APIs and b) the scope of a feature, and there is
> no point in setting up a policy when it is very likely that we won't abide
> by it.
>
> With the increasing number of APIs it's quite a tall order expecting a
> version for each of them from a single contributor. Even now that would be
> 3 (Java, Scala, Streaming(?)) with 2 more to come in the somewhat near
> future (Python, SQL (not sure if relevant)). It is a *massive* entry
> barrier, as well as a major time investment on the contributor's part. This
> should also hold for simple features (certainly at the beginning).
>
> If (and only if) Scala is as thin as I am made to believe, I would be for a
> hard policy here. I would exclude other APIs from this. The overhead from
> getting to know all APIs and debugging unfamiliar code would eat up way too
> much time, which could easily break our neck. It's not just about syncing
> the APIs, but doing so in an efficient manner. For them I would much
> rather have 2-3 people per API that are somewhat responsible for porting
> these features, preferably in a more concentrated effort (aka batches).
>
>
> On 27.9.2014 21:03, Kostas Tzoumas wrote:
>
>> If we allow out-of-sync APIs (and backends) until the time of a release,
>> aren't we just postponing the syncing problem to the time of the release,
>> which is a pretty bad time to have such a problem?
>>
>>
>> On Fri, Sep 26, 2014 at 8:49 PM, Robert Metzger <rmetzger@apache.org>
>> wrote:
>>
>>  Hi,
>>>
>>> I'm also in favor of having a strict policy regarding the Java and Scala
>>> API.
>>> In my understanding the new Scala API is a thin layer above the Java one,
>>> so adding new methods should be straightforward (given that there are
>>> plenty of examples as a reference).
>>>
>>> Robert
>>>
>>> On Fri, Sep 26, 2014 at 11:04 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>>  Hey Fabian,
>>>>
>>>> thanks for bringing this up.
>>>>
>>>> I would vote to have a hard policy regarding the Scala and Java API as
>>>> these are our main user facing APIs.
>>>>
>>>> If there was a fundamental problem or language feature, which could not be
>>>> supported/ported in/to the other API, I would be OK if it was only
>>>> available in one. But small additions to the APIs like outer joins, which
>>>> can be in sync should also be in sync.
>>>>
>>>> If someone does not want to add the corresponding feature to the other
>>>> APIs, I would go for a pull request with a request for someone else to port
>>>> the missing part.
>>>>
>>>> I think it is very important for users to be able to assume that all APIs
>>>> have the same "power". Otherwise we might end up in a situation (and I
>>>> think we already had it with the broadcast variables for a time), where
>>>> users have to pick the API, which matches their use case and not their
>>>> preference.
>>>>
>>>> Best,
>>>>
>>>> Ufuk
>>>>
>>>> On 26 Sep 2014, at 10:43, Fabian Hueske <fhueske@apache.org> wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>> as you all know, Flink has a layered architecture with multiple
>>>>> alternatives for certain levels.
>>>>> Examples are:
>>>>> - Programming APIs: Java, Scala, (and Python in progress)
>>>>> - Processing Backends: distributed runtime (former Nephele), Java
>>>>> Collections, (and potentially Tez in the future)
>>>>>
>>>>> The challenge with multiple alternatives that serve the same purpose is
>>>>> that these should be in sync.
>>>>> A feature that is added to the Java API should also be added to the Scala
>>>>> API (and other APIs in the future). The same applies to new runtime
>>>>> strategies and operators, such as outer joins.
>>>>>
>>>>> I think we need a policy on how to keep the features of different layer
>>>>> alternatives in sync.
>>>>> With the recent update of the Scala API, a ScalaAPICompletenessTest was
>>>>> added that checks whether the Scala API offers the same methods as the Java
>>>>> API. Adding a feature to the Java API breaks the build and requires either
>>>>> adapting the Scala API as well or excluding the added methods from the
>>>>> APICompletenessTest.
>>>>> While this test is a great tool to make sure that the APIs are synced,
>>>>> this basically requires that APIs are always synced, i.e., a modification
>>>>> of the Java API must go with an equivalent change of the Scala API.
>>>>> If we make this a tight policy and force compatibility at all times,
>>>>> contributors must know about several different technologies (Scala Compiler
>>>>> Macros, Python, the implementation details of multiple runtime backends,
>>>>> ...). This sounds like a huge entrance barrier to me.
>>>>>
>>>>> To make it clear, I am definitely in favor of keeping APIs and backends in
>>>>> sync.
>>>>> However, I propose to enforce this only for releases, i.e., allow
>>>>> out-of-sync APIs on the master branch and fix the APIs for releases.
>>>>> With this additional requirement, we also need to think twice about which
>>>>> features to add as multiple components of the system will be affected.
>>>>>
>>>>> What do you guys think?
>>>>>
>>>>
>>>>
>
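
For readers unfamiliar with the idea behind the ScalaAPICompletenessTest mentioned in Fabian's original message, here is a minimal sketch of how such a check could work. It is not Flink's actual test. The class names `JavaDataSet` and `ScalaDataSet`, the helper `missingMethods`, and the exclusion set are all hypothetical stand-ins, and the comparison is by method name only, which is an assumption made to keep the example short.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class ApiCompletenessSketch {

    // Hypothetical stand-in for the reference API (the Java API in the thread).
    public static class JavaDataSet {
        public void map() {}
        public void join() {}
        public void outerJoin() {}
    }

    // Hypothetical stand-in for the API that has to keep up (the Scala API in the thread).
    public static class ScalaDataSet {
        public void map() {}
        public void join() {}
    }

    /**
     * Returns the names of public methods declared by `reference` that are
     * neither declared by `candidate` nor listed in `excluded`.
     */
    public static Set<String> missingMethods(
            Class<?> reference, Class<?> candidate, Set<String> excluded) {
        Set<String> candidateNames = publicMethodNames(candidate);
        Set<String> missing = new TreeSet<>();
        for (String name : publicMethodNames(reference)) {
            if (!candidateNames.contains(name) && !excluded.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Collects the names of public, non-synthetic methods declared directly on the class.
    private static Set<String> publicMethodNames(Class<?> clazz) {
        Set<String> names = new TreeSet<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Without exclusions the check reports the method that only one API has.
        Set<String> missing = missingMethods(
                JavaDataSet.class, ScalaDataSet.class, Collections.<String>emptySet());
        System.out.println("Missing: " + missing);

        // Excluding the method is the escape hatch described in the thread.
        Set<String> afterExclusion = missingMethods(
                JavaDataSet.class, ScalaDataSet.class, Collections.singleton("outerJoin"));
        System.out.println("Missing after exclusion: " + afterExclusion);
    }
}
```

A build that fails whenever the first set is non-empty enforces sync at all times. Growing the exclusion list is what lets the master branch drift until a release, which is the trade-off the thread is debating.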
