flink-user mailing list archives

From Hao Sun <ha...@zendesk.com>
Subject Re: StandaloneResourceManager failed to associate with JobManager leader
Date Tue, 22 Aug 2017 16:27:20 GMT
Thanks Till, the DEBUG log level is a good idea. I figured it out: I had
mixed up `-` and `_` in the hostname (`flink_jobmanager` vs.
`flink-jobmanager`).
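For the archives: underscores are not valid in hostnames under RFC 952/RFC
1123, which seems to be why an actor address like
`akka.tcp://flink@flink_jobmanager:32929/user/jobmanager` could not be
resolved even though Docker's embedded DNS happened to answer for the name.
A quick sketch of the check (the helper and regex below are my own, not part
of Flink):

```python
import re

# One dot-separated hostname label per RFC 1123: letters, digits and
# hyphens only, 1-63 chars, and it may not start or end with a hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_valid_hostname(name: str) -> bool:
    """Return True if every label of `name` is RFC 1123 compliant."""
    return all(LABEL.match(label) for label in name.split("."))

print(is_valid_hostname("flink-jobmanager"))  # True
print(is_valid_hostname("flink_jobmanager"))  # False: "_" is not allowed
```

So a hyphenated service name like `flink-jobmanager` is safe to use in
`jobmanager.rpc.address`, while one with an underscore is not.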

On Tue, Aug 22, 2017 at 1:39 AM Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Hao Sun,
>
> have you checked that the hostname flink_jobmanager can be resolved from
> within the container? This is required to connect to the JobManager. If it
> resolves correctly, then log files at DEBUG log level would be helpful to
> track down the problem.
>
> Cheers,
> Till
>
> On Wed, Aug 16, 2017 at 5:35 AM, Hao Sun <hasun@zendesk.com> wrote:
>
>> Hi,
>>
>> I am trying to run a cluster with one JobManager and one TaskManager in
>> Docker. I get a StandaloneResourceManager error stating that it cannot
>> associate with the JobManager, and I do not know what is wrong.
>>
>> I am sure the JobManager is reachable:
>> ===============================
>> root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
>> Trying 172.18.0.3...
>> Connected to flink-jobmanager.
>> Escape character is '^]'.
>> Connection closed by foreign host.
>> ===============================
>>
>> Here is my config:
>> ===============================
>> Starting Job Manager
>> config file:
>> jobmanager.rpc.address: flink_jobmanager
>> jobmanager.rpc.port: 6123
>> jobmanager.web.port: 8081
>> jobmanager.heap.mb: 1024
>> taskmanager.heap.mb: 1024
>> taskmanager.numberOfTaskSlots: 1
>> taskmanager.memory.preallocate: false
>> parallelism.default: 1
>> jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
>> historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
>> state.backend: rocksdb
>> state.backend.fs.checkpointdir: file:///flink_data/checkpoints
>> taskmanager.tmp.dirs: /flink_data/tmp
>> blob.storage.directory: /flink_data/tmp
>> jobmanager.web.tmpdir: /flink_data/tmp
>> env.log.dir: /flink_data/logs
>> high-availability: zookeeper
>> high-availability.storageDir: file:///flink_data/ha/
>> high-availability.zookeeper.quorum: kafka:2181
>> blob.server.port: 6124
>> query.server.port: 6125
>> ===============================
>>
>> Here is the major error I see:
>> ===============================
>> 2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
>> 2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
>> 2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
>> 2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
>> 2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
>> 2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
>> akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
>> at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>> at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>> at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
>> at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>> at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
>> at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>> at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
>> at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
>> at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
>> at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
>> at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
>> at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
>> at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
>> at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
>> at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>> at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
>> at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
>> at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
>> at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
>> at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
>> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> 2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
>> 2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
>> ===============================
>>
>> More detailed log:
>> https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
>>
>
>
