flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Balaji Rajagopalan <balaji.rajagopa...@olacabs.com>
Subject Re: Streaming job failure due to loss of Taskmanagers
Date Wed, 23 Mar 2016 12:06:19 GMT
Ravinder,
  You could first try to fun standalone examples of flink without hadoop,
and then try to run flink on hadoop. Your trouble seems to point to some
hadoop cluster set up rather than anything to do with flink ?

balaji

On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <neetu0404@gmail.com> wrote:

> Hello All,
>
> After trying to debug, the log of jobmanager suggests that the failed
> taskmanagers has stopped sending heartbeat messages. After this the TMs
> were detected unreachable by the JM.
>
> If the actor system is dead why is it not restarted by the supervisor
> during the Job? The TM is re-registered at the JM only after the job is
> switched to failed. The failure is still unclear. Can someone help diagnose
> the issue?
>
> Kind Regards,
> Ravinder Kaur
>
>
>
>
> On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <neetu0404@gmail.com>
> wrote:
>
>> Hi Ufuk,
>>
>> Here is the log of the JobManager
>>
>> 15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (19/25)
>> (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
>> 15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0)
>> to vm-10-155-208-157
>> 15:14:26,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (20/25)
>> (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
>> 15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (20/25)
>> (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
>> 15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0)
>> to vm-10-155-208-157
>> 15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (18/25)
>> (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
>> 15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0)
>> to slave2
>> 15:14:26,082 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (6/25)
>> (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
>> 15:14:26,091 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (5/25)
>> (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
>> 15:14:26,093 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (7/25)
>> (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
>> 15:14:26,094 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (8/25)
>> (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
>> 15:14:26,130 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (21/25)
>> (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
>> 15:14:26,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (22/25)
>> (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
>> 15:14:26,147 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (21/25)
>> (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
>> 15:14:26,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0)
>> to vm-10-155-208-157
>> 15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (22/25)
>> (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
>> 15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0)
>> to vm-10-155-208-135
>> 15:14:26,169 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (23/25)
>> (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
>> 15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (23/25)
>> (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
>> 15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0)
>> to vm-10-155-208-135
>> 15:14:26,177 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (24/25)
>> (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
>> 15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (24/25)
>> (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
>> 15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0)
>> to vm-10-155-208-135
>> 15:14:26,247 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (25/25)
>> (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
>> 15:14:26,259 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (25/25)
>> (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
>> 15:14:26,260 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0)
>> to vm-10-155-208-135
>> 15:14:26,520 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (9/25)
>> (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
>> 15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (10/25)
>> (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
>> 15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (12/25)
>> (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
>> 15:14:26,739 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (11/25)
>> (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
>> 15:14:26,740 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (15/25)
>> (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
>> 15:14:26,747 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (24/25)
>> (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
>> 15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (22/25)
>> (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
>> 15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (13/25)
>> (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
>> 15:14:26,749 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (14/25)
>> (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
>> 15:14:26,755 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (17/25)
>> (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
>> 15:14:26,775 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (23/25)
>> (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
>> 15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (16/25)
>> (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
>> 15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (18/25)
>> (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
>> 15:14:26,825 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (25/25)
>> (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
>> 15:14:27,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (21/25)
>> (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
>> 15:14:27,102 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (19/25)
>> (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
>> 15:14:27,103 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (20/25)
>> (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
>> 15:22:36,119 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (1/25)
>> (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
>> 15:22:43,697 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (8/25)
>> (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
>> 15:22:46,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (15/25)
>> (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
>> 15:23:00,969 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (2/25)
>> (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
>> 15:23:25,615 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (9/25)
>> (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
>> 15:23:26,734 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (16/25)
>> (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
>> 15:24:04,736 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (10/25)
>> (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
>> 15:24:16,436 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (17/25)
>> (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
>> 15:24:17,705 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (3/25)
>> (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
>> 15:27:05,900 WARN  akka.remote.RemoteWatcher
>>         - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
>> 15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Task manager akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager terminated.
>> 15:27:05,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (20/25)
>> (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
>> 15:27:05,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (4/25)
>> (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
>> 15:27:05,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (5/25)
>> (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (6/25)
>> (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (7/25)
>> (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (11/25)
>> (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (12/25)
>> (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (13/25)
>> (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (13/25)
>> (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (14/25)
>> (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
>> 15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (18/25)
>> (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (19/25)
>> (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (21/25)
>> (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (22/25)
>> (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (23/25)
>> (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (24/25)
>> (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
>> 15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (25/25)
>> (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (1/25)
>> (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (2/25)
>> (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (3/25)
>> (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (4/25)
>> (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (5/25)
>> (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (6/25)
>> (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING
>> 15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (7/25)
>> (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (8/25)
>> (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (9/25)
>> (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (10/25)
>> (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (11/25)
>> (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (12/25)
>> (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (13/25)
>> (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING
>> 15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (14/25)
>> (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (15/25)
>> (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (16/25)
>> (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (17/25)
>> (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (18/25)
>> (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (19/25)
>> (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING
>> 15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (20/25)
>> (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING
>> 15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (21/25)
>> (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING
>> 15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (22/25)
>> (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING
>> 15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (23/25)
>> (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING
>> 15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (24/25)
>> (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING
>> 15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (25/25)
>> (bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING
>> 15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (16/25)
>> (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED
>> 15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (24/25)
>> (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
>> 15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (15/25)
>> (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED
>> 15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (17/25)
>> (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED
>> 15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (13/25)
>> (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
>> 15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (6/25)
>> (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED
>> 15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (18/25)
>> (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED
>> 15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Unregistered task manager akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager. Number of registered task
>> managers 6. Number of available slots 21.
>> 15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>> SocketTextStream Example) changed to FAILING.
>> java.lang.Exception: The slot in which the task was executed has been
>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>> slave2 - 4 slots - URL: akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager
>>         at
>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>         at
>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>         at
>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>         at
>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>         at
>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>         at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>         at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>         at
>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>         at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>         at
>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>         at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>         at
>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>         at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>  at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (9/25)
>> (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED
>> 15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (10/25)
>> (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED
>> 15:27:06,021 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (12/25)
>> (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED
>> 15:27:06,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (19/25)
>> (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED
>> 15:27:06,031 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (20/25)
>> (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED
>> 15:27:06,033 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (21/25)
>> (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED
>> 15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (11/25)
>> (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED
>> 15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (13/25)
>> (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED
>> 15:27:06,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (14/25)
>> (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED
>> 15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Task manager akka.tcp://
>> flink@10.155.208.137:56659/user/taskmanager terminated.
>> 15:27:06,899 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (19/25)
>> (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
>> 15:27:06,900 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (2/25)
>> (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED
>> 15:27:06,897 WARN  akka.remote.RemoteWatcher
>>         - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>         - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>         - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
>> 15:27:06,902 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (12/25)
>> (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
>> 15:27:06,904 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (3/25)
>> (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED
>> 15:27:06,905 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (23/25)
>> (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
>> 15:27:06,906 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (1/25)
>> (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED
>> 15:27:06,916 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (5/25)
>> (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED
>> 15:27:06,917 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (4/25)
>> (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED
>> 15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Unregistered task manager akka.tcp://
>> flink@10.155.208.137:56659/user/taskmanager. Number of registered task
>> managers 5. Number of available slots 17.
>> 15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Task manager akka.tcp://
>> flink@10.155.208.135:56307/user/taskmanager terminated.
>> 15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (4/25)
>> (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED
>> 15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (25/25)
>> (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED
>> 15:27:06,920 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (11/25)
>> (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
>> 15:27:06,924 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (24/25)
>> (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED
>> 15:27:06,926 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (18/25)
>> (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
>> 15:27:06,927 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (23/25)
>> (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED
>> 15:27:06,929 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (22/25)
>> (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED
>> 15:27:06,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (22/25)
>> (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
>> 15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Unregistered task manager akka.tcp://
>> flink@10.155.208.135:56307/user/taskmanager. Number of registered task
>> managers 4. Number of available slots 13.
>> 15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Task manager akka.tcp://
>> flink@10.155.208.138:55717/user/taskmanager terminated.
>> 15:27:06,932 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (14/25)
>> (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
>> 15:27:06,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (7/25)
>> (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED
>> 15:27:06,934 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (6/25)
>> (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED
>> 15:27:06,935 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (21/25)
>> (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
>> 15:27:06,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (7/25)
>> (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED
>> 15:27:06,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (8/25)
>> (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED
>> 15:27:06,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Source: Read Text File Source -> Flat Map (25/25)
>> (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
>> 15:27:06,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>        - Keyed Aggregation -> Sink: Unnamed (5/25)
>> (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED
>> 15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Unregistered task manager akka.tcp://
>> flink@10.155.208.138:55717/user/taskmanager. Number of registered task
>> managers 3. Number of available slots 9.
>> 15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>        - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>> SocketTextStream Example) changed to FAILED.
>> java.lang.Exception: The slot in which the task was executed has been
>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>> slave2 - 4 slots - URL: akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager
>>         at
>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>         at
>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>         at
>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>         at
>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>         at
>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>         at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>         at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>         at
>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>         at
>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>         at
>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>         at
>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>         at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>         at
>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>         at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 15:27:29,543 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:28:45,964 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:28:45,978 WARN  Remoting
>>        - Tried to associate with unreachable remote address [akka.tcp://
>> flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all
>> messages to this address will be delivered to dead letters. Reason: The
>> remote system has a UID that has been quarantined. Association aborted.
>> 15:28:45,991 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:28:45,999 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:28:46,108 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Registering TaskManager at akka.tcp://
>> flink@10.155.208.135:56307/user/taskmanager which was marked as dead
>> earlier because of a heart-beat timeout.
>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Registered TaskManager at vm-10-155-208-135 (akka.tcp://
>> flink@10.155.208.135:56307/user/taskmanager) as
>> 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4.
>> Current number of alive task slots is 13.
>> 15:30:25,988 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:30:25,992 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:30:26,090 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:32:05,994 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:32:05,996 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:32:05,999 INFO  Remoting
>>        - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is
>> still unreachable or has not been restarted. Keeping it quarantined.
>> 15:32:06,007 WARN  Remoting
>>        - Tried to associate with unreachable remote address [akka.tcp://
>> flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all
>> messages to this address will be delivered to dead letters. Reason: The
>> remote system has a UID that has been quarantined. Association aborted.
>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Registering TaskManager at akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager which was marked as dead
>> earlier because of a heart-beat timeout.
>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>         - Registered TaskManager at slave2 (akka.tcp://
>> flink@10.155.208.136:42624/user/taskmanager) as
>> ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5.
>> Current number of alive task slots is 17.
>> 15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor
>>        - Association with remote system [akka.tcp://
>> flink@10.155.208.156:33728] has failed, address is now gated for [5000]
>> ms. Reason is: [Disassociated].
>> 15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor
>>        - Association with remote system [akka.tcp://
>> flink@10.155.208.157:33728] has failed, address is now gated for [5000]
>> ms. Reason is: [Disassociated].
>> 15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor
>>        - Association with remote system [akka.tcp://
>> flink@10.155.208.158:33728] has failed, address is now gated for [5000]
>> ms. Reason is: [Disassociated].
>> 15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor
>>        - Association with remote system [akka.tcp://
>> flink@10.155.208.135:56307] has failed, address is now gated for [5000]
>> ms. Reason is: [Disassociated].
>> 15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor
>>        - Association with remote system [akka.tcp://
>> flink@10.155.208.136:42624] has failed, address is now gated for [5000]
>> ms. Reason is: [Disassociated].
>> 15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>>         - Removing web root dir
>> /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
>> 15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer
>>        - Stopped BLOB server at 0.0.0.0:55249
>>
>>
>> Kind Regards,
>> Ravinder Kaur
>>
>> On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <uce@apache.org> wrote:
>>
>>> Hey Ravinder,
>>>
>>> can you please share the JobManager logs as well?
>>>
>>> The logs say that the TaskManager disconnects from the JobManager,
>>> because that one is not reachable anymore. At this point, the running
>>> shuffles are cancelled and you see the follow up
>>> RemoteTransportExceptions.
>>>
>>> – Ufuk
>>>
>>>
>>> On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <neetu0404@gmail.com>
>>> wrote:
>>> > Hello All,
>>> >
>>> > I'm running the WordCount example streaming job and it fails because
>>> of loss
>>> > of Taskmanagers.
>>> >
>>> > When gone through the logs of the taskmanager it has the following
>>> messages
>>> >
>>> > 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
>>> > - State backend is set to heap memory (checkpoint to jobmanager)
>>> > 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
>>> > 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
>>> > 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
>>> > 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
>>> > exception.
>>> >
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> > Connection unexpectedly closed by remote task manager
>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>> indicate
>>> > that the remote task manager was lost.
>>> >         at
>>> >
>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> > 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - TaskManager akka://flink/user/taskmanager disconnects from JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
>>> > longer reachable
>>> > 15:27:29,384 WARN  akka.remote.RemoteWatcher
>>> > - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
>>> > exception.
>>> >
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> > Connection unexpectedly closed by remote task manager
>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>> indicate
>>> > that the remote task manager was lost.
>>> >         at
>>> >
>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
>>> > exception.
>>> >
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> > Connection unexpectedly closed by remote task manager
>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>> indicate
>>> > that the remote task manager was lost.
>>> >         at
>>> >
>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
>>> > exception.
>>> >
>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>> > Connection unexpectedly closed by remote task manager
>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>> indicate
>>> > that the remote task manager was lost.
>>> >         at
>>> >
>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>> >  at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>> >         at
>>> >
>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>> >         at
>>> >
>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>> >         at
>>> >
>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>> >         at java.lang.Thread.run(Thread.java:745)
>>> > 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Cancelling all computations and discarding all cached data.
>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
>>> > (17/25)
>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state
>>> FAILED
>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>> Flat
>>> > Map (20/25)
>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED
>>> with
>>> > exception.
>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects
>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>> > JobManager is no longer reachable
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>> >         at
>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> >         at
>>> >
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>> >         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>> >         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>> >         at
>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>> >  at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
>>> > 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
>>> > 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Triggering cancellation of task code Source: Read Text File Source
>>> -> Flat
>>> > Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>> Flat
>>> > Map (6/25)
>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED
>>> with
>>> > exception.
>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects
>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>> > JobManager is no longer reachable
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>> >         at
>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> >         at
>>> >
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>> >         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>> >         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>> >         at
>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> > 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>> > (20/25)
>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Triggering cancellation of task code Source: Read Text File Source
>>> -> Flat
>>> > Map (6/25) (2151d07383014506c5f1b0283b1986de).
>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
>>> > (15/25)
>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state
>>> FAILED
>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>> Flat
>>> > Map (13/25)
>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED
>>> with
>>> > exception.
>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects
>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>> > JobManager is no longer reachable
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>> >         at
>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> >         at
>>> >
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>> >         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>> >         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>> >         at
>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> > 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>> > (6/25)
>>> > 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Triggering cancellation of task code Source: Read Text File Source
>>> -> Flat
>>> > Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
>>> > (16/25)
>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state
>>> FAILED
>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>> Flat
>>> > Map (24/25)
>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED
>>> with
>>> > exception.
>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>> disconnects
>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>> > JobManager is no longer reachable
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>> >         at
>>> >
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>> >         at
>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> >         at
>>> >
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> >         at
>>> >
>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>> >         at
>>> >
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>> >         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>> >         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>> >         at
>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >         at
>>> >
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Triggering cancellation of task code Source: Read Text File Source
>>> -> Flat
>>> > Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
>>> > (18/25)
>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state
>>> FAILED
>>> > 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>> > (24/25)
>>> > 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>> > (13/25)
>>> > 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Disassociating from JobManager
>>> > 15:27:29,572 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has a UID that has been quarantined. Association aborted.
>>> > 15:27:29,582 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:27:29,587 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:27:29,623 INFO
>>> org.apache.flink.runtime.io.network.netty.NettyClient
>>> > - Successful shutdown (took 17 ms).
>>> > 15:27:29,625 INFO
>>> org.apache.flink.runtime.io.network.netty.NettyServer
>>> > - Successful shutdown (took 2 ms).
>>> > 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1,
>>> timeout:
>>> > 500 milliseconds)
>>> > 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2,
>>> timeout:
>>> > 1000 milliseconds)
>>> > 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3,
>>> timeout:
>>> > 2000 milliseconds)
>>> > 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4,
>>> timeout:
>>> > 4000 milliseconds)
>>> > 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5,
>>> timeout:
>>> > 8000 milliseconds)
>>> > 15:27:37,218 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6,
>>> timeout:
>>> > 16000 milliseconds)
>>> > 15:27:45,235 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7,
>>> timeout: 30
>>> > seconds)
>>> > 15:28:01,237 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8,
>>> timeout: 30
>>> > seconds)
>>> > 15:28:31,261 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:28:45,960 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9,
>>> timeout: 30
>>> > seconds)
>>> > 15:29:01,273 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>> > - Trying to register at JobManager
>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10,
>>> timeout:
>>> > 30 seconds)
>>> > 15:30:11,562 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> > 15:30:25,958 WARN  Remoting
>>> > - Tried to associate with unreachable remote address
>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>> ms,
>>> > all messages to this address will be delivered to dead letters.
>>> Reason: The
>>> > remote system has quarantined this system. No further associations to
>>> the
>>> > remote system are possible until this system is restarted.
>>> >
>>> > Could anyone explain what is the reason for the systems getting
>>> > disassociated and be quarantined?
>>> >
>>> > Kind Regards,
>>> > Ravinder Kaur
>>> >
>>>
>>
>>
>

Mime
View raw message