flink-user mailing list archives

From mozer <mehmet.a.o...@externe.bnpparibas.com>
Subject Unable to start Flink HA cluster with Zookeeper
Date Tue, 21 Aug 2018 13:15:03 GMT
I am trying to set up a Flink HA cluster (ZooKeeper mode), but the Task
Manager cannot find the Job Manager.

Here is the architecture:

    - Machine 1 : Job Manager + Zookeeper
    - Machine 2 : Task Manager

masters: 

    Machine1

slaves:

    Machine2

flink-conf.yaml: 

    #jobmanager.rpc.address: localhost
    jobmanager.rpc.port: 6123
    blob.server.port: 50100-50200
    taskmanager.data.port: 6121
    high-availability: zookeeper
    high-availability.zookeeper.quorum: Machine1:2181
    high-availability.zookeeper.path.root: /flink-1.5.1
    high-availability.cluster-id: /default_b
    high-availability.storageDir: file:///shareflink/recovery

Here is the Task Manager log. It tries to connect to localhost
instead of Machine1:

    2018-08-17 10:46:44,875 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.
    2018-08-17 10:46:44,876 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
    2018-08-17 10:46:44,966 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address /127.0.0.1:37133.
    2018-08-17 10:46:45,324 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /127.0.0.1:37133
    2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'Machine2/IP-Machine2': Connection refused
    2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/IP_Machine2': Connection refused
    2018-08-17 10:46:45,325 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,326 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/IP_Machine2': Connection refused
    2018-08-17 10:46:45,326 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,726 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address /127.0.0.1:37133
    2018-08-17 10:46:45,727 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'Machine2/IP-Machine2
    
    2018-08-17 10:47:22,022 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:36515]] Caused by: [Connection refused: /127.0.0.1:36515]
    2018-08-17 10:47:22,022 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager..
    2018-08-17 10:47:32,037 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:36515
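
To make the pattern in the log above easier to see, here is a minimal Python sketch that pulls every `ip:port` target out of such log text and keeps only the loopback ones. The function name `loopback_targets` and the regular expression are illustrative assumptions, not part of Flink, and the sample text is just a few fragments copied from the log above.

```python
import ipaddress
import re

# Matches IPv4 "ip:port" targets preceded by "/" or "@",
# e.g. "/127.0.0.1:37133" or "flink@127.0.0.1:36515".
TARGET = re.compile(r"[/@](\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def loopback_targets(log_text):
    """Return the sorted, de-duplicated (ip, port) targets that are loopback addresses."""
    found = set()
    for ip, port in TARGET.findall(log_text):
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            continue  # looked like an IP but is not a valid one
        if address.is_loopback:
            found.add((ip, int(port)))
    return sorted(found)


# Fragments taken from the Task Manager log above.
sample = """\
Retrieved new target address /127.0.0.1:37133.
Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed
Failed to connect from address '/127.0.0.1': Connection refused
"""
print(loopback_targets(sample))
```

Every remote target in the log is on 127.0.0.1, which is why the connection attempts from Machine2 are refused: the address being handed to the Task Manager points back at Machine2 itself rather than at Machine1.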



PS: **/etc/hosts** contains entries for **localhost, Machine1 and Machine2**.
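
One thing that may be worth checking, given the /etc/hosts note: whether either machine name resolves to a loopback address on the machine where it is looked up. This is only a possible cause, not something the log proves. The sketch below is a small Python helper for that check; `resolves_to_loopback` is a hypothetical name, and the host table in the demo is invented purely for illustration (it is not the real /etc/hosts content).

```python
import ipaddress
import socket


def resolves_to_loopback(hostname, resolver=socket.gethostbyname):
    """Return True if hostname resolves to a loopback IPv4 address (127.0.0.0/8).

    The resolver is injectable so the check can be tried without real DNS
    or /etc/hosts lookups; by default the system resolver is used.
    """
    return ipaddress.ip_address(resolver(hostname)).is_loopback


# Invented example data, NOT the real /etc/hosts of Machine1 or Machine2.
hypothetical_hosts = {"Machine1": "127.0.1.1", "Machine2": "192.0.2.12"}

for name in hypothetical_hosts:
    is_loopback = resolves_to_loopback(name, resolver=hypothetical_hosts.__getitem__)
    print(name, "resolves to loopback:", is_loopback)
```

On the real machines, calling `resolves_to_loopback("Machine1")` with the default resolver on Machine1 (and likewise for Machine2) would show whether the system resolver maps the name to 127.x.x.x there.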


Can you please tell me how to get the Task Manager to connect to the Job Manager?

Regards





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
