flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rinat <r.shari...@cleverdata.ru>
Subject Flink Job Deployment (Not enough resources)
Date Mon, 04 Sep 2017 11:29:38 GMT
Hi everyone, I’ve got the following problem, when I’m trying to submit new job and if cluster
has not enough resources, job submission fails with the following exception
But in YARN job hangs and wait’s for requested resources. When resources become available,
job successfully runs.

What can I do to be sure that job startup is completed successfully or completely failed 
?

Thx.

The program finished with the following exception:\n\njava.lang.RuntimeException: Unable to
tell application master to stop once the specified job has been finised\n\tat org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:177)\n\tat
org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:201)\n\tat org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)\n\tat
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)\n\tat
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)\n\tat org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838)\n\tat
org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)\n\tat org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)\n\tat
org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)\n\tat org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)\n\tat
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)\n\tat
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)\n\tat
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)\n\tat
org.apache.flink.client.CliFrontend.main(CliFrontend.java:1129)\nCaused by: org.apache.flink.util.FlinkException:
Could not connect to the leading JobManager. Please check that the JobManager is running.\n\tat
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:789)\n\tat
org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:171)\n\t... 15
more\nCaused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could
not retrieve the leader gateway.\n\tat org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)\n\tat
org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:784)\n\t...
16 more\nCaused by: java.util.concurrent.TimeoutException: Futures timed out after [10000
milliseconds]\n\tat scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)\n\tat
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)\n\tat scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)\n\tat
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)\n\tat scala.concurrent.Await$.result(package.scala:190)\n\tat
scala.concurrent.Await.result(package.scala)\n\tat org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:77)\n\t...
17 more", "stderr_lines": ["", "------------------------------------------------------------",
" The program finished with the following exception:", "", "java.lang.RuntimeException: Unable
to tell application master to stop once the specified job has been finised", "\tat org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:177)",
"\tat org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:201)", "\tat
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)", "\tat org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)",
"\tat org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)", "\tat org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838)",
"\tat org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)", "\tat org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)",
"\tat org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)", "\tat org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)",
"\tat org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)",
"\tat java.security.AccessController.doPrivileged(Native Method)", "\tat javax.security.auth.Subject.doAs(Subject.java:422)",
"\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)",
"\tat org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)",
"\tat org.apache.flink.client.CliFrontend.main(CliFrontend.java:1129)", "Caused by: org.apache.flink.util.FlinkException:
Could not connect to the leading JobManager. Please check that the JobManager is running.",
"\tat org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:789)",
"\tat org.apache.flink.yarn.YarnClusterClient.stopAfterJob(YarnClusterClient.java:171)", "\t...
15 more", "Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could
not retrieve the leader gateway.", "\tat org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)",
"\tat org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:784)",
"\t... 16 more", "Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[10000 milliseconds]", "\tat scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)",
"\tat scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)", "\tat scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)",
"\tat scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)"
Mime
View raw message