flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Flink slots, threads, task, etc
Date Wed, 19 Apr 2017 07:25:56 GMT
Hi Aljoscha,
thanks for the reply, it was not urgent and I was aware of the FF...btw,
congratulations on it, I saw many interesting talks!
The Flink community has grown a lot since it was Stratosphere ;)
Just one last question: in many of my use cases it could be helpful to see
how many of the created splits were "consumed" by an inputFormat/source.
Is it possible to monitor this part somewhere in the dashboards or with a
custom metric?

On Tue, Apr 18, 2017 at 5:24 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi,
> sorry that you did not get any responses, but I think everyone was quite busy
> with Flink Forward SF. I’m also no expert on the topic but I’ll try to
> give some answers.
> Regarding a Google Doc version, I don’t think that there is any. You would
> have to modify the Markdown version we have in the doc.
> For the other answers I’ll reuse an example program that consists of
> Source -> Map -> Sink, with chaining disabled and parallelism 2. We’ll thus
> have three Tasks: Source, Map, and Sink, with each having two subtasks.
> Let’s denote the subtasks by a number in parentheses so the first subtask
> for Source is Source(1), second one is Source(2). I’ll also refer to
> Source(1) -> Map(1) -> Sink(1) as a slice of the execution graph since
> these can be executed within one slot.
> Regarding 1, I think this is true. However, a single slot can execute a
> complete slice of the execution graph where each subtask (from a different
> task) would be executed by its own thread.
> Regarding 2.1, Yes, I think it cannot run multiple subtasks of the same
> task while it is possible (and in fact done) to execute all the subtasks of
> a slice in the same slot.
> Regarding 2.2, This is so to allow executing a pipeline of parallelism 8
> using a cluster that has 8 free slots. Basically, each slice fills one slot.
> Regarding 3, I don’t really have an answer.
> Regarding 4, Yes, this can get a bit out of hand if you have very long
> pipelines.
> Best,
> Aljoscha
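The slice picture in the reply above can be sketched as a small model. This is only an illustration under simplifying assumptions, not Flink's scheduler: the hypothetical helper `assign_slices` places subtask i of every task into slot i, and counts one thread per subtask, as in the chaining-disabled example.

```python
from collections import defaultdict


def assign_slices(tasks, parallelism):
    """Simplified model: subtask i of every task shares slot i (one slice per slot)."""
    slots = defaultdict(list)
    for task in tasks:
        for i in range(1, parallelism + 1):
            # A slot receives exactly one subtask per task, so it never
            # holds two subtasks of the same task.
            slots[i].append(f"{task}({i})")
    return dict(slots)


# The example from the reply: Source -> Map -> Sink, parallelism 2.
slots = assign_slices(["Source", "Map", "Sink"], parallelism=2)
for slot, subtasks in sorted(slots.items()):
    # With chaining disabled, each subtask in the slice has its own thread.
    print(f"slot {slot}: {' -> '.join(subtasks)} ({len(subtasks)} threads)")
```

In this model the number of slots needed equals the parallelism (8 slots for parallelism 8), while the number of threads per slot grows with the number of unchained tasks in the pipeline.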
> On 11. Apr 2017, at 14:37, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
> Any feedback here..?
> On Wed, Apr 5, 2017 at 7:43 PM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>> Hi to all,
>> I had a very long but useful chat with Fabian and I understood a lot of
>> concepts that were not clear at all to me. We started from the Flink runtime
>> documentation page
>> (https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/runtime.html) but
>> I discovered that the terminology is very inconsistent and misleading
>> throughout the page...
>> For example, one of the very first sentences is :
>> "Flink chains operator subtasks together into tasks. Each task is
>> executed by one thread."
>> What I first understood was that every operator can be executed only by a
>> single thread in the whole cluster...it would probably be better to say "one
>> thread per task slot" (at least).
>> Moreover, if I'm not wrong, a Task Slot can execute only 1 subtask (aka
>> parallel instance) of each task and there's no limit to the number of
>> subtasks per slot (and this is not highlighted at all in that document).
>> The only constraint is that they should belong to different tasks (right?).
>> If there's a Google Doc version of that page I could try to rewrite it
>> in order to make some parts easier to understand...however I still
>> have some more questions:
>>    1. Is it correct that a single Task Slot can execute only a single
>>    subtask of each task and that this task is executed by a single thread
>>    within the slot?
>>    2. If it is so:
>>       1. why does that page say "By default, Flink allows
>>       subtasks to share slots even if they are subtasks of different tasks, so
>>       long as they are from the same job"? It makes it seem more common to run
>>       multiple subtasks of the same task (in a slot) than to execute
>>       subtasks of different tasks, although the latter is still permitted...from what
>>       I understood a slot cannot run multiple subtasks of the same task at all!
>>       2. and why this constraint? Is there any good reason for that? A
>>       subtask is mapped to 1 thread in the TaskManager, so why can a TM with 2
>>       slots run 2 subtasks of the same task (in the same JVM) while a TM with 1
>>       slot cannot (while it can execute an arbitrary number of subtasks of
>>       different tasks)?
>>    3. If it is not so, there are no images representing such a situation
>>    on that page...
>>    4. Isn't it dangerous to allow (potentially) an unlimited number of
>>    threads per TM slot?
>> Cheers,
>> Flavio
