flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Collision of task number values for the same task
Date Tue, 31 May 2016 11:54:43 GMT
It could be that

(a) The task failed and was restarted.

(b) The program has multiple steps (e.g. collect() and print()), so that
parts of the graph get re-executed.

(c) You have two operators with the same name that become tasks with the
same name.

Do any of those explanations make sense in your setting?
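
One way to narrow down which explanation applies is to group the task events by their (name, x/y) label and check how many distinct execution attempt IDs appear under each label. The following is only a minimal sketch: the `TaskEvent` record and the `find_collisions` helper are hypothetical names, and the sketch assumes the events have already been extracted from the logs by some other means. It makes no claim about the actual Flink log format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TaskEvent(NamedTuple):
    # Hypothetical record shape for illustration; not a Flink log format.
    task_name: str
    subtask_index: int  # the x in "(x/y)"
    parallelism: int    # the y in "(x/y)"
    attempt_id: str     # execution attempt ID, assumed unique per attempt


def find_collisions(
    events: Iterable[TaskEvent],
) -> Dict[Tuple[str, int, int], List[str]]:
    """Return the (name, x, y) labels seen with more than one attempt ID."""
    attempts = defaultdict(set)
    for e in events:
        key = (e.task_name, e.subtask_index, e.parallelism)
        attempts[key].add(e.attempt_id)
    # Keep only labels that are shared by several distinct attempts.
    return {k: sorted(v) for k, v in attempts.items() if len(v) > 1}
```

A label that maps to several attempt IDs is consistent with any of (a), (b), or (c); the timestamps and states of those attempts would then be needed to tell a restart apart from a re-execution or a name clash.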


On Tue, May 31, 2016 at 12:48 PM, Alexander Alexandrov <
alexander.s.alexandrov@gmail.com> wrote:

> Sure, you can find them attached here (both jobmanager and taskmanager,
> the problem was observed in the jobmanager logs).
> If needed I can also share the binary to reproduce the issue.
> I think the problem is related to the fact that the input splits are
> lazily assigned to the task slots, and it seems that in case of 8 splits
> for 4 slots, we get each (x/y) combination twice.
> Moreover, I am currently analyzing the structure of the log files, and it
> seems that the task ID is not reported consistently across the different
> messages [1,2,3]. This makes the implementation of an ETL job that extracts
> the statistics from the log and feeds them into a database quite hard.
> Would it be possible to push a fix which adds the task ID consistently
> across all messages in the 1.0.x line? If yes, I will open a JIRA and work
> on that this week.
> I would like to get feedback from other people who are parsing jobmanager
> / taskmanager logs on that in order to avoid possible backwards
> compatibility issues with job analysis tools on the release line.
> [1]
> https://github.com/apache/flink/blob/da23ee38e5b36ddf26a6a5a807efbbbcbfe1d517/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L370-L371
> [2]
> https://github.com/apache/flink/blob/da23ee38e5b36ddf26a6a5a807efbbbcbfe1d517/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java#L991-L992
> Regards,
> A.
> 2016-05-31 12:01 GMT+02:00 Ufuk Celebi <uce@apache.org>:
>> On Tue, May 31, 2016 at 11:53 AM, Alexander Alexandrov
>> <alexander.s.alexandrov@gmail.com> wrote:
>> > Can somebody shed light on the execution semantics of the scheduler
>> > which will explain this behavior?
>> The execution IDs are unique per execution attempt. Having two tasks
>> with the same subtask index running at the same time is unexpected.
>> Can you share the complete logs, please?
>> – Ufuk
