flink-dev mailing list archives

From Márton Balassi <mbala...@apache.org>
Subject Re: Change in the JobManager API
Date Sun, 21 Sep 2014 17:01:45 GMT
Thanks for pointing that out.
I personally prefer it this way, since it is no longer necessary to
explicitly "close" a DataSet or DataStream with a sink. We need to update
the corresponding tests, however.
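
If we still want some validation on the streaming side, it would have to
run on the client before the job is submitted. A minimal sketch of such a
check is below. The class, method, and parameter names are hypothetical
and not part of the current streaming API; it only tests whether at least
one sink is reachable from at least one source.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical client-side check; not part of the actual streaming API. */
final class SourceSinkCheck {

    private SourceSinkCheck() {}

    /**
     * Returns true if at least one sink is reachable from at least one source.
     *
     * @param edges   adjacency list: vertex name -> names of downstream vertices
     * @param sources names of the source vertices
     * @param sinks   names of the sink vertices
     */
    static boolean hasConnectedSourceAndSink(
            Map<String, List<String>> edges, Set<String> sources, Set<String> sinks) {
        // Breadth-first search starting from all sources at once.
        Deque<String> pending = new ArrayDeque<>(sources);
        Set<String> visited = new HashSet<>(sources);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (sinks.contains(current)) {
                return true;
            }
            // Map.getOrDefault requires Java 8 or later.
            for (String next : edges.getOrDefault(current, Collections.<String>emptyList())) {
                if (visited.add(next)) {
                    pending.add(next);
                }
            }
        }
        return false;
    }
}
```

Whether a missing sink should be an error at all is a separate question;
the sketch only shows where such a check could live.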

On Sun, Sep 21, 2014 at 6:49 PM, Stephan Ewen <sewen@apache.org> wrote:

> There is one more effect of the changes: Since there is no longer a
> distinction between input/output vertices and since disconnected flows are
> also accepted now, the job manager will no longer reject certain graphs
> that it used to reject.
>
> That is actually desirable, but I think the streaming API made use of that
> behavior to validate that the programs have at least a connected source and
> sink.
>
> This now needs to be checked at a different point.
>
>
> On Sat, Sep 20, 2014 at 8:25 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Edit: I have not pushed it, I am about to push ;-)
>>
>> Just needed to rebase on the latest master and tests are pending...
>>
>> On Sat, Sep 20, 2014 at 8:24 PM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> Hi!
>>>
>>> I have just pushed a big patch to rework the JobManager job and
>>> scheduling classes. It fixes some scalability and robustness issues,
>>> simplifies the task hierarchies, and makes the code ready for some of
>>> the prepared next features (incremental/interactive jobs).
>>>
>>> The pull request is https://github.com/apache/incubator-flink/pull/122
>>>
>>> What will affect developers who work against the lower-level APIs (like
>>> the streaming parts) is the following:
>>>
>>>  - No more distinction between input/intermediate/output tasks
>>>  - Intermediate data sets have a data structure now. This implies that
>>> some methods change slightly (more in name than in meaning).
>>>    In the future, data sets can be consumed many times, but for now, the
>>> network stack supports only one consumer.
>>>  - The conceptual change that receivers attach senders as inputs (and
>>> grab their outgoing data streams), rather than senders forwarding to
>>>    receivers means that the wiring of JobGraphs is now the other way
>>> around.
>>>  - No more distinction between in-memory and network channels. All
>>> channels have always been automatically in-memory when sender
>>>    and receiver are co-located. The flag was purely a scheduler hint,
>>> which is obsolete now (see below).
>>>
>>>
>>> Most importantly:
>>>  - The scheduling is a bit different now. Instead of instance sharing,
>>> we now have SlotSharing Groups, which give you
>>>    a way to share resources across tasks, but they behave more dynamically,
>>> which is important for more dynamic environments,
>>>    and when a cluster has fewer task slots than the parallelism of some
>>> tasks.
>>>  - For cases that need strict co-location of tasks, we now have
>>> CoLocationConstraints. The Batch API uses them to ensure that
>>>    head, tail, and tasks inside a closed-loop iteration are co-located.
>>>
>>>  Stephan
>>>
>>>
>>
>>
>

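For anyone adapting lower-level code, the reversed wiring described in the
quoted message can be pictured with a toy model. This is only a sketch:
ToyVertex, attachInput, and WiringExample are made-up names and are not the
real JobGraph classes or methods from the pull request. The point is just
that the edge is now declared on the receiving vertex.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a job vertex; NOT the real JobGraph classes. */
final class ToyVertex {
    private final String name;
    private final List<ToyVertex> inputs = new ArrayList<>();

    ToyVertex(String name) {
        this.name = name;
    }

    /** New style: the receiver attaches the sender as one of its inputs. */
    void attachInput(ToyVertex sender) {
        inputs.add(sender);
    }

    String name() {
        return name;
    }

    List<ToyVertex> inputs() {
        return inputs;
    }
}

final class WiringExample {

    private WiringExample() {}

    /** Builds source -> mapper -> sink and returns the sink. */
    static ToyVertex buildPipeline() {
        ToyVertex source = new ToyVertex("source");
        ToyVertex mapper = new ToyVertex("mapper");
        ToyVertex sink = new ToyVertex("sink");
        // Previously the sender would have forwarded to the receiver.
        // Now each connection is declared from the receiving side.
        mapper.attachInput(source);
        sink.attachInput(mapper);
        return sink;
    }
}
```

Walking the graph therefore starts at the last vertex and follows inputs()
backwards towards the sources.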