flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ravinder Kaur <neetu0...@gmail.com>
Subject Re: Streaming job failure due to loss of Taskmanagers
Date Wed, 23 Mar 2016 12:46:02 GMT
Hello Balaji,

I'm running Flink on a standalone cluster and not on hadoop, though I had
hadoop installed on these machines I do not use it. but that should not
cause any problem.

I have been successfully running Batch and Streaming jobs for a while now
before this issue occured. According to the logs I think it is related to
AKKA, but I'm not sure.

Kind Regards,
Ravinder Kaur

On Wed, Mar 23, 2016 at 1:06 PM, Balaji Rajagopalan <
balaji.rajagopalan@olacabs.com> wrote:

> Ravinder,
>   You could first try to fun standalone examples of flink without hadoop,
> and then try to run flink on hadoop. Your trouble seems to point to some
> hadoop cluster set up rather than anything to do with flink ?
>
> balaji
>
> On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <neetu0404@gmail.com>
> wrote:
>
>> Hello All,
>>
>> After trying to debug, the log of jobmanager suggests that the failed
>> taskmanagers has stopped sending heartbeat messages. After this the TMs
>> were detected unreachable by the JM.
>>
>> If the actor system is dead why is it not restarted by the supervisor
>> during the Job? The TM is re-registered at the JM only after the job is
>> switched to failed. The failure is still unclear. Can someone help diagnose
>> the issue?
>>
>> Kind Regards,
>> Ravinder Kaur
>>
>>
>>
>>
>> On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <neetu0404@gmail.com>
>> wrote:
>>
>>> Hi Ufuk,
>>>
>>> Here is the log of the JobManager
>>>
>>> 15:14:26,030 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,030 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
>>> 15:14:26,035 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,036 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,036 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
>>> 15:14:26,039 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,039 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
>>> 15:14:26,082 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,091 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,093 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,094 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,130 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,142 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,147 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,148 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
>>> 15:14:26,168 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,168 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
>>> 15:14:26,169 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,171 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,171 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
>>> 15:14:26,177 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,192 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,192 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
>>> 15:14:26,247 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>> switched from CREATED to SCHEDULED
>>> 15:14:26,259 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>> switched from SCHEDULED to DEPLOYING
>>> 15:14:26,260 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>> Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
>>> 15:14:26,520 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,530 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,530 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,739 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,740 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,747 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,748 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,748 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,749 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,755 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,775 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,776 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,776 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:26,825 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:27,099 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:27,102 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>> switched from DEPLOYING to RUNNING
>>> 15:14:27,103 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>> switched from DEPLOYING to RUNNING
>>> 15:22:36,119 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4)
>>> switched from RUNNING to FINISHED
>>> 15:22:43,697 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0)
>>> switched from RUNNING to FINISHED
>>> 15:22:46,998 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (15/25)
>>> (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
>>> 15:23:00,969 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e)
>>> switched from RUNNING to FINISHED
>>> 15:23:25,615 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d)
>>> switched from RUNNING to FINISHED
>>> 15:23:26,734 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (16/25)
>>> (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
>>> 15:24:04,736 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (10/25)
>>> (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
>>> 15:24:16,436 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (17/25)
>>> (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
>>> 15:24:17,705 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8)
>>> switched from RUNNING to FINISHED
>>> 15:27:05,900 WARN  akka.remote.RemoteWatcher
>>>         - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
>>> 15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Task manager akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager terminated.
>>> 15:27:05,913 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (20/25)
>>> (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
>>> 15:27:05,930 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,936 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (11/25)
>>> (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (12/25)
>>> (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (13/25)
>>> (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (13/25)
>>> (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (14/25)
>>> (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
>>> 15:27:05,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (18/25)
>>> (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (19/25)
>>> (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (21/25)
>>> (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (22/25)
>>> (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (23/25)
>>> (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (24/25)
>>> (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
>>> 15:27:05,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (25/25)
>>> (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,940 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,941 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,942 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,942 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,942 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,942 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,942 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>> switched from RUNNING to CANCELING
>>> 15:27:05,943 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>> switched from CANCELING to FAILED
>>> 15:27:05,943 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (24/25)
>>> (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
>>> 15:27:05,944 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>> switched from CANCELING to FAILED
>>> 15:27:05,944 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>> switched from CANCELING to FAILED
>>> 15:27:05,944 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (13/25)
>>> (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
>>> 15:27:05,945 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de)
>>> switched from CANCELING to FAILED
>>> 15:27:05,945 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>> switched from CANCELING to FAILED
>>> 15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Unregistered task manager akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager. Number of registered task
>>> managers 6. Number of available slots 21.
>>> 15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>>> SocketTextStream Example) changed to FAILING.
>>> java.lang.Exception: The slot in which the task was executed has been
>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>>> slave2 - 4 slots - URL: akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager
>>>         at
>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>>         at
>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>>         at
>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>>         at
>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>>         at
>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>>         at
>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>>         at
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>         at
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>         at
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>         at
>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>         at
>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>         at
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 15:27:06,020 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,020 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,021 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,030 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,031 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,033 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,034 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,034 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,035 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>> switched from CANCELING to CANCELED
>>> 15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Task manager akka.tcp://
>>> flink@10.155.208.137:56659/user/taskmanager terminated.
>>> 15:27:06,899 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (19/25)
>>> (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
>>> 15:27:06,900 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af)
>>> switched from CANCELING to FAILED
>>> 15:27:06,897 WARN  akka.remote.RemoteWatcher
>>>         - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
>>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>>         - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
>>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>>         - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
>>> 15:27:06,902 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (12/25)
>>> (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
>>> 15:27:06,904 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907)
>>> switched from CANCELING to FAILED
>>> 15:27:06,905 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (23/25)
>>> (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
>>> 15:27:06,906 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7)
>>> switched from CANCELING to FAILED
>>> 15:27:06,916 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40)
>>> switched from CANCELING to FAILED
>>> 15:27:06,917 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b)
>>> switched from CANCELING to FAILED
>>> 15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Unregistered task manager akka.tcp://
>>> flink@10.155.208.137:56659/user/taskmanager. Number of registered task
>>> managers 5. Number of available slots 17.
>>> 15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Task manager akka.tcp://
>>> flink@10.155.208.135:56307/user/taskmanager terminated.
>>> 15:27:06,918 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef)
>>> switched from CANCELING to FAILED
>>> 15:27:06,918 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>> switched from CANCELING to FAILED
>>> 15:27:06,920 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (11/25)
>>> (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
>>> 15:27:06,924 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>> switched from CANCELING to FAILED
>>> 15:27:06,926 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (18/25)
>>> (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
>>> 15:27:06,927 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>> switched from CANCELING to FAILED
>>> 15:27:06,929 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>> switched from CANCELING to FAILED
>>> 15:27:06,930 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (22/25)
>>> (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
>>> 15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Unregistered task manager akka.tcp://
>>> flink@10.155.208.135:56307/user/taskmanager. Number of registered task
>>> managers 4. Number of available slots 13.
>>> 15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Task manager akka.tcp://
>>> flink@10.155.208.138:55717/user/taskmanager terminated.
>>> 15:27:06,932 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (14/25)
>>> (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
>>> 15:27:06,933 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>> switched from CANCELING to FAILED
>>> 15:27:06,934 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>> switched from CANCELING to FAILED
>>> 15:27:06,935 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (21/25)
>>> (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
>>> 15:27:06,936 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70)
>>> switched from CANCELING to FAILED
>>> 15:27:06,937 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>> switched from CANCELING to FAILED
>>> 15:27:06,938 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
>>> Read Text File Source -> Flat Map (25/25)
>>> (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
>>> 15:27:06,939 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>> switched from CANCELING to FAILED
>>> 15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Unregistered task manager akka.tcp://
>>> flink@10.155.208.138:55717/user/taskmanager. Number of registered task
>>> managers 3. Number of available slots 9.
>>> 15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>          - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>>> SocketTextStream Example) changed to FAILED.
>>> java.lang.Exception: The slot in which the task was executed has been
>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>>> slave2 - 4 slots - URL: akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager
>>>         at
>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>>         at
>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>>         at
>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>>         at
>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>>         at
>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>>         at
>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>>         at
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>         at
>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>         at
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>         at
>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>         at
>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>         at
>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>         at
>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 15:27:29,543 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.136:42624]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:28:45,964 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.135:56307]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:28:45,978 WARN  Remoting
>>>          - Tried to associate with unreachable remote address [akka.tcp://
>>> flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all
>>> messages to this address will be delivered to dead letters. Reason: The
>>> remote system has a UID that has been quarantined. Association aborted.
>>> 15:28:45,991 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.136:42624]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:28:45,999 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.138:55717]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:28:46,108 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.137:56659]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Registering TaskManager at akka.tcp://
>>> flink@10.155.208.135:56307/user/taskmanager which was marked as dead
>>> earlier because of a heart-beat timeout.
>>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Registered TaskManager at vm-10-155-208-135 (akka.tcp://
>>> flink@10.155.208.135:56307/user/taskmanager) as
>>> 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4.
>>> Current number of alive task slots is 13.
>>> 15:30:25,988 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.136:42624]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:30:25,992 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.138:55717]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:30:26,090 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.137:56659]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:32:05,994 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.137:56659]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:32:05,996 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.138:55717]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:32:05,999 INFO  Remoting
>>>          - Quarantined address [akka.tcp://flink@10.155.208.136:42624]
>>> is still unreachable or has not been restarted. Keeping it quarantined.
>>> 15:32:06,007 WARN  Remoting
>>>          - Tried to associate with unreachable remote address [akka.tcp://
>>> flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all
>>> messages to this address will be delivered to dead letters. Reason: The
>>> remote system has a UID that has been quarantined. Association aborted.
>>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Registering TaskManager at akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager which was marked as dead
>>> earlier because of a heart-beat timeout.
>>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>         - Registered TaskManager at slave2 (akka.tcp://
>>> flink@10.155.208.136:42624/user/taskmanager) as
>>> ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5.
>>> Current number of alive task slots is 17.
>>> 15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor
>>>          - Association with remote system [akka.tcp://
>>> flink@10.155.208.156:33728] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>> 15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor
>>>          - Association with remote system [akka.tcp://
>>> flink@10.155.208.157:33728] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>> 15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor
>>>          - Association with remote system [akka.tcp://
>>> flink@10.155.208.158:33728] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>> 15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor
>>>          - Association with remote system [akka.tcp://
>>> flink@10.155.208.135:56307] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>> 15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor
>>>          - Association with remote system [akka.tcp://
>>> flink@10.155.208.136:42624] has failed, address is now gated for [5000]
>>> ms. Reason is: [Disassociated].
>>> 15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>>>         - Removing web root dir
>>> /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
>>> 15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer
>>>          - Stopped BLOB server at 0.0.0.0:55249
>>>
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>> On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>>> Hey Ravinder,
>>>>
>>>> can you please share the JobManager logs as well?
>>>>
>>>> The logs say that the TaskManager disconnects from the JobManager,
>>>> because that one is not reachable anymore. At this point, the running
>>>> shuffles are cancelled and you see the follow up
>>>> RemoteTransportExceptions.
>>>>
>>>> – Ufuk
>>>>
>>>>
>>>> On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <neetu0404@gmail.com>
>>>> wrote:
>>>> > Hello All,
>>>> >
>>>> > I'm running the WordCount example streaming job and it fails because
>>>> of loss
>>>> > of Taskmanagers.
>>>> >
>>>> > When gone through the logs of the taskmanager it has the following
>>>> messages
>>>> >
>>>> > 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
>>>> > - State backend is set to heap memory (checkpoint to jobmanager)
>>>> > 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
>>>> > 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
>>>> > 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
>>>> > 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
>>>> > exception.
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>> > Connection unexpectedly closed by remote task manager
>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>> indicate
>>>> > that the remote task manager was lost.
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>> >         at
>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - TaskManager akka://flink/user/taskmanager disconnects from
>>>> JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is
>>>> no
>>>> > longer reachable
>>>> > 15:27:29,384 WARN  akka.remote.RemoteWatcher
>>>> > - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
>>>> > exception.
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>> > Connection unexpectedly closed by remote task manager
>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>> indicate
>>>> > that the remote task manager was lost.
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>> >         at
>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
>>>> > exception.
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>> > Connection unexpectedly closed by remote task manager
>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>> indicate
>>>> > that the remote task manager was lost.
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>> >         at
>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
>>>> > exception.
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>> > Connection unexpectedly closed by remote task manager
>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>> indicate
>>>> > that the remote task manager was lost.
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>> >  at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>> >         at
>>>> >
>>>> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>> >         at
>>>> >
>>>> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>> >         at
>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>> >         at
>>>> >
>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>> > 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Cancelling all computations and discarding all cached data.
>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>> Unnamed
>>>> > (17/25)
>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state
>>>> FAILED
>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>> Flat
>>>> > Map (20/25)
>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Source: Read Text File Source -> Flat Map (20/25) switched to
>>>> FAILED with
>>>> > exception.
>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>> disconnects
>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>> > JobManager is no longer reachable
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >         at
>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >         at
>>>> >
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> >         at
>>>> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> >         at
>>>> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >         at
>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >  at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>> (15/25)
>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>> (17/25)
>>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>> (16/25)
>>>> > 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>> (18/25)
>>>> > 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>> -> Flat
>>>> > Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
>>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>> Flat
>>>> > Map (6/25)
>>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED
>>>> with
>>>> > exception.
>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>> disconnects
>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>> > JobManager is no longer reachable
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >         at
>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >         at
>>>> >
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> >         at
>>>> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> >         at
>>>> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >         at
>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> > 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>>> > (20/25)
>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>> -> Flat
>>>> > Map (6/25) (2151d07383014506c5f1b0283b1986de).
>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>> Unnamed
>>>> > (15/25)
>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state
>>>> FAILED
>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>> Flat
>>>> > Map (13/25)
>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Source: Read Text File Source -> Flat Map (13/25) switched to
>>>> FAILED with
>>>> > exception.
>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>> disconnects
>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>> > JobManager is no longer reachable
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >         at
>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >         at
>>>> >
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> >         at
>>>> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> >         at
>>>> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >         at
>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> > 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>>> > (6/25)
>>>> > 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>> -> Flat
>>>> > Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>> Unnamed
>>>> > (16/25)
>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state
>>>> FAILED
>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>> Flat
>>>> > Map (24/25)
>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Source: Read Text File Source -> Flat Map (24/25) switched to
>>>> FAILED with
>>>> > exception.
>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>> disconnects
>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>> > JobManager is no longer reachable
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >         at
>>>> >
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >         at
>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >         at
>>>> >
>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >         at
>>>> >
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>> >         at
>>>> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>> >         at
>>>> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >         at
>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >         at
>>>> >
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>> -> Flat
>>>> > Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>> Unnamed
>>>> > (18/25)
>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state
>>>> FAILED
>>>> > 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>>> > (24/25)
>>>> > 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
>>>> > - Freeing task resources for Source: Read Text File Source -> Flat Map
>>>> > (13/25)
>>>> > 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Disassociating from JobManager
>>>> > 15:27:29,572 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has a UID that has been quarantined. Association
>>>> aborted.
>>>> > 15:27:29,582 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:27:29,587 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:27:29,623 INFO
>>>> org.apache.flink.runtime.io.network.netty.NettyClient
>>>> > - Successful shutdown (took 17 ms).
>>>> > 15:27:29,625 INFO
>>>> org.apache.flink.runtime.io.network.netty.NettyServer
>>>> > - Successful shutdown (took 2 ms).
>>>> > 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1,
>>>> timeout:
>>>> > 500 milliseconds)
>>>> > 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2,
>>>> timeout:
>>>> > 1000 milliseconds)
>>>> > 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3,
>>>> timeout:
>>>> > 2000 milliseconds)
>>>> > 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4,
>>>> timeout:
>>>> > 4000 milliseconds)
>>>> > 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5,
>>>> timeout:
>>>> > 8000 milliseconds)
>>>> > 15:27:37,218 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6,
>>>> timeout:
>>>> > 16000 milliseconds)
>>>> > 15:27:45,235 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7,
>>>> timeout: 30
>>>> > seconds)
>>>> > 15:28:01,237 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8,
>>>> timeout: 30
>>>> > seconds)
>>>> > 15:28:31,261 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:28:45,960 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9,
>>>> timeout: 30
>>>> > seconds)
>>>> > 15:29:01,273 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>> > - Trying to register at JobManager
>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10,
>>>> timeout:
>>>> > 30 seconds)
>>>> > 15:30:11,562 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> > 15:30:25,958 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for
>>>> 5000 ms,
>>>> > all messages to this address will be delivered to dead letters.
>>>> Reason: The
>>>> > remote system has quarantined this system. No further associations to
>>>> the
>>>> > remote system are possible until this system is restarted.
>>>> >
>>>> > Could anyone explain what is the reason for the systems getting
>>>> > disassociated and be quarantined?
>>>> >
>>>> > Kind Regards,
>>>> > Ravinder Kaur
>>>> >
>>>>
>>>
>>>
>>
>

Mime
View raw message