flink-user mailing list archives

From Ethan Li <ethanopensou...@gmail.com>
Subject Re: TaskManager gets confused after the JobManager restarts
Date Thu, 14 Feb 2019 16:06:56 GMT
The related job manager log is https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8

> On Feb 14, 2019, at 9:40 AM, Ethan Li <ethanopensource@gmail.com> wrote:
> 
> Hello,
> 
> I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager, and ZooKeeper. I first started the JM and TM and waited for them to become stable. Then I restarted the JM. That's when the TM got confused.
> 
> The TM was notified that the leader node had changed, and it tried to register with the new leader (the new RPC port is 34561). Then it got an acknowledgement saying it was already registered. After that it kept trying to associate with the old JM RPC port (35213) and failed.
> 
> 2019-02-14 14:56:54,059 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.ssl.tcp://flink@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> 2019-02-14 14:56:54,157 DEBUG org.apache.flink.shaded.akka.org.jboss.netty.handler.ssl.SslHandler - [id: 0x77ac93ae, /10.215.68.243:46796 => openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:34561] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
> 2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.ssl.tcp://flink@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager), starting network stack and library cache.
> 2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:50100. Starting BLOB cache.
> 2019-02-14 14:56:54,278 INFO  org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-927b523f-f3ff-4ccc-83a0-362e09a3b858
> 2019-02-14 14:56:54,279 INFO  org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-8492465e-0e94-4792-a346-66e6da299f7a
> 2019-02-14 14:56:54,572 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - TaskManager was triggered to register at JobManager, but is already registered
> 2019-02-14 14:56:56,359 WARN  akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213
> 2019-02-14 14:56:56,360 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - The association error event's root cause is not of type InvalidAssociationException.
> 
> 
> 
> Full TaskManager log: https://gist.github.com/Ethanlm/e6f1b29d27d26813f5f8f40cd2c12643
> 
> 
> Is this expected or is this a bug? 
> 
> Thank you!
> 
> Ethan

