flink-user mailing list archives

From Hao Sun <ha...@zendesk.com>
Subject TM get killed/disconnected after a while
Date Fri, 06 Oct 2017 18:16:29 GMT
Hi, I am running Flink 1.3.2 on Kubernetes. I am not sure why one of my
TMs (TaskManagers) sometimes gets killed. Is there a way to debug this? Thanks

===== Logs =====

*2017-10-05 22:36:42,631 INFO
org.apache.flink.runtime.instance.InstanceManager             - Registered
TaskManager at fps-flink-taskmanager-2384273947-9n4kc
(akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274/user/taskmanager)
as 330ff7eeaabfe2b7289fee4a0e36c4b2. Current number of registered hosts is
2. Current number of alive task slots is 2.*
2017-10-05 22:37:04,974 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
Source: KafkaSource(maxwell.users) -> MaxwellFilter->Maxwell(maxwell.users)
-> FixedDelayWatermark(maxwell.users) ->
MaxwellFPSEvent->InfluxDBData(maxwell.users) -> (Sink:
influxdbSink(maxwell.users), Sink: PrintSink(maxwell.users)) (1/1) (attempt
#0) to fps-flink-taskmanager-2384273947-9n4kc
*2017-10-06 06:08:55,657 WARN  akka.remote.ReliableDeliverySupervisor
                  - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Disassociated]*
2017-10-06 06:08:55,832 WARN  Remoting
                - Tried to associate with unreachable remote address
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is
now gated for 5000 ms, all messages to this address will be delivered to
dead letters. Reason: [The remote system has quarantined this system. No
further associations to the remote system are possible until this system is
restarted.]
2017-10-06 06:09:01,232 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:03,416 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:11,174 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:11,440 WARN  Remoting
                - Tried to associate with unreachable remote address
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is
now gated for 5000 ms, all messages to this address will be delivered to
dead letters. Reason: [The remote system has quarantined this system. No
further associations to the remote system are possible until this system is
restarted.]
2017-10-06 06:09:21,232 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:27,460 WARN  Remoting
                - Tried to associate with unreachable remote address
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is
now gated for 5000 ms, all messages to this address will be delivered to
dead letters. Reason: [The remote system has quarantined this system. No
further associations to the remote system are possible until this system is
restarted.]
2017-10-06 06:09:31,173 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:41,179 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:51,174 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:57,475 WARN  Remoting
                - Tried to associate with unreachable remote address
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is
now gated for 5000 ms, all messages to this address will be delivered to
dead letters. Reason: [The remote system has quarantined this system. No
further associations to the remote system are possible until this system is
restarted.]
2017-10-06 06:10:01,179 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:06,173 WARN  akka.remote.RemoteWatcher
                 - Detected unreachable:
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]
2017-10-06 06:10:06,177 INFO
org.apache.flink.runtime.jobmanager.JobManager                - Task
manager akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274/user/taskmanager
terminated.
java.lang.Exception: TaskManager was lost/killed:
55d3143ccecec7878f7df169208795d0 @ fps-flink-taskmanager-2384273947-9n4kc
(dataPort=37448)
java.lang.Exception: TaskManager was lost/killed:
55d3143ccecec7878f7df169208795d0 @ fps-flink-taskmanager-2384273947-9n4kc
(dataPort=37448)
2017-10-06 06:10:06,188 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:10:06,240 INFO
org.apache.flink.runtime.instance.InstanceManager             -
Unregistered task manager fps-flink-taskmanager-2384273947-9n4kc/
10.225.132.78. Number of registered task managers 3. Number of available
slots 3.
2017-10-06 06:10:16,247 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:26,284 WARN  akka.remote.ReliableDeliverySupervisor
                - Association with remote system
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed,
address is now gated for [5000] ms. Reason: [Association failed with
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by:
[fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:27,495 WARN  Remoting
                - Tried to associate with unreachable remote address
[akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is
now gated for 5000 ms, all messages to this address will be delivered to
dead letters. Reason: [The remote system has quarantined this system. No
further associations to the remote system are possible until this system is
restarted.]
