flink-user mailing list archives

From Le Xu <sharonx...@gmail.com>
Subject Could not resolve ResourceManager address on Flink 1.7.1
Date Tue, 12 Mar 2019 07:52:59 GMT
Hello:

I am trying to set up a standalone Flink cluster (1.7.1), and I'm getting an
error very similar to the one reported in this thread:
<http://mail-archives.apache.org/mod_mbox/flink-user/201809.mbox/%3CCAGr9p8BGrC81q1SDZN=PDnmL2g4jU0Gb3TEPkaKhY=EUoB9QBw@mail.gmail.com%3E>
However, I believe the root cause is different: I tried starting the job
manager with both start-cluster.sh and jobmanager.sh, and both failed with
the same error.
The error I get on the task manager (flink-worker1) looks like the
following:

...6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:42,884 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:52,901 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:02,925 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:12,939 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:22,963 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:32,978 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
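As a quick sanity check on the task manager log, the retry entries can be parsed to confirm that every attempt targets the same ResourceManager address and that attempts arrive roughly every 10 s, matching the logged "retrying in 10000 ms". This is a minimal Python sketch assuming the raw log file has one entry per line in the format shown above; `parse_retries` and `gaps_seconds` are hypothetical helper names, not part of Flink.

```python
import re
from datetime import datetime

# Assumed single-line format of a TaskExecutor retry entry (as in the log above).
RETRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Could not resolve ResourceManager address (?P<addr>[^,\s]+), "
    r"retrying in (?P<delay>\d+) ms"
)


def parse_retries(lines):
    """Return (timestamps, target addresses, announced delays in ms) of retry entries."""
    stamps, addrs, delays = [], [], []
    for line in lines:
        m = RETRY_RE.search(line)
        if m is None:  # skip unrelated or truncated lines
            continue
        stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"))
        addrs.append(m.group("addr"))
        delays.append(int(m.group("delay")))
    return stamps, addrs, delays


def gaps_seconds(stamps):
    """Seconds elapsed between consecutive retry entries."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Applied to the six complete entries above, every address is akka.tcp://flink@10.0.0.6:6123/user/resourcemanager and each gap is slightly over 10 s, so the task manager keeps retrying one fixed, well-formed address rather than a changing one.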


But the job manager seems to start up ok:

2019-03-12 07:38:36,643 INFO akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@10.0.0.6:6123]
2019-03-12 07:38:36,659 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@10.0.0.6:6123
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory C:\cygwin64\tmp\blobStore-85b28100-fa08-4488-9f79-d0d712f34733
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:54072 - max concurrent requests: 50 - max backlog: 1000
2019-03-12 07:38:36,705 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.
2019-03-12 07:38:36,721 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to start actor system at 10.0.0.6:0
2019-03-12 07:38:36,737 INFO akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-03-12 07:38:36,752 INFO akka.remote.Remoting                                          - Starting remoting
2019-03-12 07:38:36,768 INFO akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.0.0.6:54085]
2019-03-12 07:38:36,768 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor system started at akka.tcp://flink-metrics@10.0.0.6:54085
2019-03-12 07:38:36,784 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  - Initializing FileArchivedExecutionGraphStore: Storage directory C:\cygwin64\tmp\executionGraphStore-550bff8d-314e-4a04-b10e-93bdc7af80c6, expiration time 3600000, maximum cache size 52428800 bytes.
2019-03-12 07:38:36,815 INFO org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory C:\cygwin64\tmp\blobStore-608a5134-9f0d-44dd-8e3d-d9fbe4185d21
2019-03-12 07:38:36,830 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload for file uploads.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting rest endpoint.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file environment variable 'log.file' is not set.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (deprecated keys: [jobmanager.web.log.path])'.
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at 10.0.0.6:8081
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://10.0.0.6:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://10.0.0.6:8081.
2019-03-12 07:38:38,096 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-03-12 07:38:38,112 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@10.0.0.6:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.
2019-03-12 07:38:38,206 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@10.0.0.6:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,221 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering all persisted jobs.
2019-03-12 07:44:20,564 WARN akka.remote.transport.netty.NettyTransport                    - Remote connection to [/10.0.0.7:51057] failed with java.io.IOException: An existing connection was forcibly closed by the remote host
2019-03-12 07:44:20,564 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@flink-worker1:50978] has failed, address is now gated for [50] ms. Reason: [Disassociated]


Interestingly, the worker node (flink-worker1) never seems to connect to the
job manager, since it keeps retrying. But when I force the task manager to
close, the job manager reports an error at the end saying the association
has failed. For some reason, none of the task managers manage to connect to
the job manager, even though port 6123 on the job manager is open and
listening.
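Two things can be checked independently of Flink: whether port 6123 on 10.0.0.6 is reachable from the worker, and whether the job manager host can resolve the name the worker registers under (the job manager's warning identifies it as flink-worker1). A minimal Python sketch using only the standard library; the helper names are hypothetical, and it tests plain DNS resolution and a raw TCP connect, not the Akka handshake itself, so a passing result does not rule out an Akka-level problem.

```python
import socket


def can_resolve(host):
    """True if `host` resolves to at least one address from this machine."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def port_reachable(host, port, timeout=3.0):
    """True if a plain TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable
        return False
```

For example, run `port_reachable("10.0.0.6", 6123)` on flink-worker1, and `can_resolve("flink-worker1")` on the job manager host, since the job manager has to reach the worker under that name for the association to work in both directions.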

Any suggestions would be appreciated.

Thanks!

Le
