flink-user-zh mailing list archives

From "Jimmy.Shao" <sjms...@gmail.com>
Subject Flink savepoint timeout
Date Fri, 06 Sep 2019 06:38:58 GMT
Has anyone run into a timeout exception when manually triggering a Flink savepoint from the CLI?
Cancelling the job with a savepoint ("cancel with savepoint") fails with the same timeout error.
The savepoint is configured to be written to HDFS,
and Flink itself runs on YARN.
The official docs mention a parameter, "akka.client.timeout", which may be what applies here,
but it only takes effect when set in flink-conf.yaml,
and there is no way to pass it in via the CLI.
So the job cannot be cancelled and the Flink cluster cannot be restarted — we are stuck in a loop.
Thanks!
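For reference, raising the client-side timeout would be done in flink-conf.yaml on the machine running the CLI. A sketch with an illustrative value (not a verified fix for this particular failure):

```yaml
# flink-conf.yaml — illustrative value; the default is "60 s".
# Timeout the CLI client uses while waiting for cluster responses,
# e.g. when triggering a savepoint.
akka.client.timeout: 600 s
```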

Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-1.6.0-hdp/lib/phoenix-4.7.0.2.6.3.0-235-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/flink-1.6.0-hdp/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 2019-09-05 10:45:41,807 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-hive.
> 2019-09-05 10:45:41,807 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-hive.
> 2019-09-05 10:45:42,056 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 1
> 2019-09-05 10:45:42,056 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 1
> YARN properties set default parallelism to 1
> 2019-09-05 10:45:42,269 INFO  org.apache.hadoop.yarn.client.AHSProxy                        - Connecting to Application History server at ac13ghdpt2m01.lab-rot.saas.sap.corp/10.116.201.103:10200
> 2019-09-05 10:45:42,276 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2019-09-05 10:45:42,276 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2019-09-05 10:45:42,282 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2019-09-05 10:45:42,284 INFO  org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider  - Looking for the active RM in [rm1, rm2]...
> 2019-09-05 10:45:42,341 INFO  org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider  - Found active RM [rm1]
> 2019-09-05 10:45:42,345 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'ac13ghdpt2dn01.lab-rot.saas.sap.corp' and port '40192' from supplied application id 'application_1559153472177_52202'
> 2019-09-05 10:45:42,689 WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory       - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> Triggering savepoint for job 6399ec2e8fdf4cb7d8481890019554f6.
> Waiting for response...
> ------------------------------------------------------------
>  The program finished with the following exception:
> org.apache.flink.util.FlinkException: Triggering a savepoint for the job 6399ec2e8fdf4cb7d8481890019554f6 failed.
>         at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:714)
>         at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:692)
>         at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
>         at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:689)
>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1059)
>         at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>         at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         ... 10 more
> Caused by: java.util.concurrent.TimeoutException
>         ... 8 more
>
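One way to sidestep the CLI-side timeout entirely is to trigger the savepoint through the JobManager's REST API: the trigger request returns a trigger id immediately and you poll for completion afterwards, so no long-lived client call is needed. A minimal sketch using only the Python standard library — the host, port, and job id are taken from the log above and stand in for your own values, and the request-body fields should be double-checked against the REST docs for your Flink version:

```python
# Sketch: trigger a savepoint via Flink's REST API instead of the CLI.
# Assumptions: Flink 1.5+ REST endpoints POST /jobs/:jobid/savepoints and
# GET /jobs/:jobid/savepoints/:triggerid; target directory is illustrative.
import json
import urllib.request

def savepoint_trigger_url(base, job_id):
    # POST here starts an asynchronous savepoint and returns a request id.
    return f"{base}/jobs/{job_id}/savepoints"

def savepoint_status_url(base, job_id, trigger_id):
    # GET here reports the savepoint status (IN_PROGRESS / COMPLETED).
    return f"{base}/jobs/{job_id}/savepoints/{trigger_id}"

def trigger_savepoint(base, job_id, target_dir, cancel_job=False):
    # Set cancel_job=True for the "cancel with savepoint" behaviour.
    body = json.dumps({"target-directory": target_dir,
                       "cancel-job": cancel_job}).encode()
    req = urllib.request.Request(savepoint_trigger_url(base, job_id),
                                 data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["request-id"]

# JobManager host:port and job id from the log; replace with your own.
base = "http://ac13ghdpt2dn01.lab-rot.saas.sap.corp:40192"
job = "6399ec2e8fdf4cb7d8481890019554f6"
print(savepoint_trigger_url(base, job))
```

After calling `trigger_savepoint(base, job, "hdfs:///flink/savepoints", cancel_job=True)`, poll `savepoint_status_url(...)` until the status is COMPLETED; the response there also reports the final savepoint path or the failure cause.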
