flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: StandaloneResourceManager failed to associate with JobManager leader
Date Tue, 22 Aug 2017 08:38:30 GMT
Hi Hao Sun,

have you checked that the hostname flink_jobmanager can be resolved from
within the container? This is required to connect to the JobManager. If it
resolves correctly, then log files at DEBUG level would be helpful for
tracking down the problem.
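
That first check can be scripted from inside the container. A minimal sketch (my own illustration, not from the thread; `flink_jobmanager` is the hostname from the reported config, and `socket.getaddrinfo` is just one way to exercise the container's resolver):

```python
import socket

def resolve_or_none(host):
    """Return the first resolved IP address for host, or None if resolution fails."""
    try:
        # getaddrinfo consults the same resolver path (/etc/hosts, DNS)
        # that a process inside the container would use.
        infos = socket.getaddrinfo(host, None)
        return infos[0][4][0]
    except socket.gaierror:
        return None

# Hostname from the reported setup; inside the container this should
# print an address like 172.18.0.3, or None if resolution is broken.
print(resolve_or_none("flink_jobmanager"))
```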

Cheers,
Till
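
To produce the DEBUG logs mentioned above, raising the root logger level in conf/log4j.properties is usually enough (a sketch, assuming the stock log4j configuration shipped with Flink, where the appender is named `file`):

```properties
# conf/log4j.properties - raise the root level from INFO to DEBUG
# (assumes the default Flink log4j config with an appender named "file")
log4j.rootLogger=DEBUG, file
```

Restart the JobManager and TaskManager containers afterwards so the new level takes effect.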

On Wed, Aug 16, 2017 at 5:35 AM, Hao Sun <hasun@zendesk.com> wrote:

> Hi,
>
> I am trying to run a cluster with a JobManager and a TaskManager in
> Docker, one of each for now. I get a StandaloneResourceManager error
> stating that it cannot associate with the JobManager, and I do not know
> what is wrong.
>
> I am sure the JobManager is reachable:
> ===============================
> root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
> Trying 172.18.0.3...
> Connected to flink-jobmanager.
> Escape character is '^]'.
> Connection closed by foreign host.
> ===============================
>
> Here is my config:
> ===============================
> Starting Job Manager
> config file:
> jobmanager.rpc.address: flink_jobmanager
> jobmanager.rpc.port: 6123
> jobmanager.web.port: 8081
> jobmanager.heap.mb: 1024
> taskmanager.heap.mb: 1024
> taskmanager.numberOfTaskSlots: 1
> taskmanager.memory.preallocate: false
> parallelism.default: 1
> jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
> historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
> state.backend: rocksdb
> state.backend.fs.checkpointdir: file:///flink_data/checkpoints
> taskmanager.tmp.dirs: /flink_data/tmp
> blob.storage.directory: /flink_data/tmp
> jobmanager.web.tmpdir: /flink_data/tmp
> env.log.dir: /flink_data/logs
> high-availability: zookeeper
> high-availability.storageDir: file:///flink_data/ha/
> high-availability.zookeeper.quorum: kafka:2181
> blob.server.port: 6124
> query.server.port: 6125
> ===============================
>
> Here is the major error I see:
> ===============================
> 2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
> 2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
> 2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
> 2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
> 2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
> 2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
> akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>     at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>     at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>     at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>     at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
>     at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
>     at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
>     at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
>     at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
>     at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
>     at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
>     at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
>     at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>     at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
>     at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
>     at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
>     at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
> 2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
> ===============================
>
> More detailed log:
> https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
>
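
One detail in the trace is worth noting: the failed lookup shows `ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]`, i.e. an empty actor path, which points at the leader address string failing to parse rather than the host being unreachable. A hedged guess, not confirmed anywhere in this thread, is the underscore in `flink_jobmanager`: RFC 1123 hostnames allow only letters, digits, and hyphens, and Akka's URI-based address parsing (backed by `java.net.URI`) does not accept underscores in a hostname. A quick validity check (my own sketch):

```python
import re

# RFC 1123 label: alphanumerics and hyphens, no leading/trailing hyphen,
# at most 63 characters -- and crucially, no underscores.
LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_hostname(host: str) -> bool:
    """Check every dot-separated label against the RFC 1123 grammar."""
    return all(LABEL.fullmatch(label) for label in host.split("."))

print(is_valid_hostname("flink-jobmanager"))  # True
print(is_valid_hostname("flink_jobmanager"))  # False: '_' is not a legal hostname character
```

If this is the cause, renaming the Docker service to something like `flink-jobmanager` (a hypothetical name here) and updating `jobmanager.rpc.address` to match would rule it out.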
