flink-user mailing list archives

From Bruno Aranda <brunoara...@gmail.com>
Subject Re: Can't run flink on yarn on version 1.2.0
Date Thu, 23 Feb 2017 09:43:25 GMT
Hi,

Good you found a solution, but are you sure it is the JDK version?

We are running Flink 1.2.0 on Yarn on an AWS EMR Cluster with no issues,
using JDK 8 (1.8.0_121).

Cheers,

Bruno

On Thu, 23 Feb 2017 at 09:26 Howard,Li(vip.com) <howard.li@vipshop.com>
wrote:

> Hi All:
>
>          We finally found the problem.
>
>          Flink on Yarn only works on JDK 7, not on JDK 8. If you use
> JDK 8, you may hit the problem discussed earlier.
>
>          Environment details: OS: CentOS 6.6; JDK 7 version: 1.7.0u75; JDK 8
> version: 1.8.0u111.
>
>
>
>          This problem may be related to Akka.
>
>
>
> *From:* Till Rohrmann [mailto:trohrmann@apache.org]
> *Sent:* 17 February 2017 18:33
> *To:* user@flink.apache.org
> *Subject:* Re: Can't run flink on yarn on version 1.2.0
>
>
>
> Hi Howard,
>
>
>
> could you check whether the JobManager's actor system was bound to
> "vip-rc-vsubu.vclound.com:55926"? You should see that in the job manager
> logs. Furthermore, have you checked that your Yarn cluster nodes are
> actually reachable from the node where you start the Flink application? If
> so, the logs of the CLI client as well as the JobManager logs (both on
> debug level) would be tremendously helpful.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <
> howard.li@vipshop.com> wrote:
>
> Sorry for the confusion I caused. I did copy the wrong log, but we do hit
> this problem on 1.2.0.
>
> For version 1.1.4, however, we see this in one cluster but not in another.
> We are still trying to figure out what happened.
>
>
>
> The following is the log for version 1.2.0:
>
>
>
> 2017-02-17 15:51:37,775 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:51:37,775 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:51:37,803 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Using
> values:
>
> 2017-02-17 15:51:37,804 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager count = 2
>
> 2017-02-17 15:51:37,804 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> JobManager memory = 1024
>
> 2017-02-17 15:51:37,804 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager memory = 1024
>
> 2017-02-17 15:51:37,827 INFO
> org.apache.hadoop.yarn.client.RMProxy                         - Connecting
> to ResourceManager at /0.0.0.0:8032
>
> 2017-02-17 15:51:38,672 WARN
> org.apache.flink.yarn.YarnClusterDescriptor                   - The
> configuration directory ('/home/software/flink-1.2.0/conf') contains both
> LOG4J and Logback configuration files. Please delete or rename one of them.
>
> 2017-02-17 15:51:38,685 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar
>
> 2017-02-17 15:51:38,992 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties
>
> 2017-02-17 15:51:39,058 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml
>
> 2017-02-17 15:51:39,085 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.2.0/lib to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib
>
> 2017-02-17 15:51:39,695 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to
> hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar
>
> 2017-02-17 15:51:40,493 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml
>
> 2017-02-17 15:51:40,547 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting
> application master application_1487247313588_0016
>
> 2017-02-17 15:51:40,585 INFO
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted
> application application_1487247313588_0016
>
> 2017-02-17 15:51:40,585 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for
> the cluster to be allocated
>
> 2017-02-17 15:51:40,587 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying
> cluster, current state ACCEPTED
>
> 2017-02-17 15:51:45,879 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - YARN
> application has been deployed successfully.
>
> Cluster started: Yarn cluster with application id
> application_1487247313588_0016
>
> Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.
>
> JobManager web interface address
> http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/
>
> Using the parallelism provided by the remote cluster (8). To use another
> parallelism, set it at the ./bin/flink client.
>
> Starting execution of program
>
> 2017-02-17 15:51:46,704 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Starting
> program in interactive mode
>
> Executing WordCount example with default input data set.
>
> Use --input to specify file input.
>
> Printing result to stdout. Use --output to specify output path.
>
> 2017-02-17 15:51:47,029 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Waiting
> until all TaskManagers have connected
>
> Waiting until all TaskManagers have connected
>
> 2017-02-17 15:51:47,029 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Starting
> client actor system.
>
>
>
> ------------------------------------------------------------
>
> The program finished with the following exception:
>
>
>
> org.apache.flink.client.program.ProgramInvocationException: The main
> method caused an error.
>
>          at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
>
>          at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
>
>          at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
>
>          at
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
>
>          at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
>
>          at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
>
>          at
> org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
>
>          at
> org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
>
>          at
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>
>          at java.security.AccessController.doPrivileged(Native Method)
>
>          at javax.security.auth.Subject.doAs(Subject.java:422)
>
>          at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>
>          at
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>
>          at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
>
> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status
> from Application Client
>
>          at
> org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)
>
>          at
> org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)
>
>          at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)
>
>          at
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)
>
>          at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>
>          at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
>
>          at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>
>          at
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
>
>          at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>
>          at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>
>          at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)
>
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>          at java.lang.reflect.Method.invoke(Method.java:498)
>
>          at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>
>          ... 13 more
>
> Caused by:
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could
> not retrieve the leader gateway
>
>          at
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)
>
>          at
> org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)
>
>          at
> org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
>
>          ... 28 more
>
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
> [10000 milliseconds]
>
>          at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>
>          at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>
>          at
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>
>          at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
>          at scala.concurrent.Await$.result(package.scala:190)
>
>          at scala.concurrent.Await.result(package.scala)
>
>          at
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)
>
>          ... 30 more
>
> 2017-02-17 15:52:21,145 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Sending
> shutdown request to the Application Master
>
> 2017-02-17 15:52:21,145 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Start
> application client.
>
> 2017-02-17 15:52:21,151 WARN
> org.apache.flink.yarn.YarnClusterClient                       - YARN
> reported application state FAILED
>
> 2017-02-17 15:52:21,152 WARN
> org.apache.flink.yarn.YarnClusterClient                       -
> Diagnostics: Application application_1487247313588_0016 failed 1 times due
> to AM Container for appattempt_1487247313588_0016_000001 exited with
> exitCode: -103
>
> For more detailed output, check application tracking page:
> http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then,
> click on links to logs of each attempt.
>
> Diagnostics: Container
> [pid=18590,containerID=container_1487247313588_0016_01_000001] is running
> beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical
> memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
>
> Dump of the process-tree for container_1487247313588_0016_01_000001 :
>
>          |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>
>          |- 18598 18590 18590 18590 (java) 894 48 2294116352
> 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
>
>          |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c
> /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
> 1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out
> 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err
>
>
>
>
> Container killed on request. Exit code is 143
>
> Container exited with a non-zero exit code 143
>
> Failing this attempt. Failing the application.
>
> 2017-02-17 15:52:21,160 INFO
> org.apache.flink.yarn.ApplicationClient                       -
> Notification about new leader address akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.
>
> 2017-02-17 15:52:21,163 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:21,164 INFO
> org.apache.flink.yarn.ApplicationClient                       - Received
> address of new leader akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.
>
> 2017-02-17 15:52:21,165 INFO
> org.apache.flink.yarn.ApplicationClient                       - Disconnect
> from JobManager null.
>
> 2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient
>                      - Trying to register at JobManager akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>
> 2017-02-17 15:52:21,684 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>
> 2017-02-17 15:52:22,174 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:22,704 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>
> 2017-02-17 15:52:23,194 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:24,214 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:24,725 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>
> 2017-02-17 15:52:25,234 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:26,254 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:27,274 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:28,294 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:28,744 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>
> 2017-02-17 15:52:29,314 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:30,334 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:31,155 WARN
> org.apache.flink.yarn.YarnClusterClient                       - Error while
> stopping YARN cluster.
>
> java.util.concurrent.TimeoutException: Futures timed out after [10000
> milliseconds]
>
>          at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>
>          at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
>
>          at
> scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
>
>          at
> scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
>
>          at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
>          at scala.concurrent.Await$.ready(package.scala:169)
>
>          at scala.concurrent.Await.ready(package.scala)
>
>          at
> org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)
>
>          at
> org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)
>
>          at
> org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)
>
>          at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)
>
>          at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
>
>          at
> org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
>
>          at
> org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
>
>          at
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>
>          at java.security.AccessController.doPrivileged(Native Method)
>
>          at javax.security.auth.Subject.doAs(Subject.java:422)
>
>          at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>
>          at
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>
>          at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
>
> 2017-02-17 15:52:31,156 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Deleting
> files in hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016
>
> 2017-02-17 15:52:31,354 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:52:32,163 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Application
> application_1487247313588_0016 finished with state FAILED and final state
> FAILED at 1487317906227
>
> 2017-02-17 15:52:32,163 WARN
> org.apache.flink.yarn.YarnClusterClient                       - Application
> failed. Diagnostics Application application_1487247313588_0016 failed 1
> times due to AM Container for appattempt_1487247313588_0016_000001 exited
> with  exitCode: -103
>
> For more detailed output, check application tracking page:
> http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then,
> click on links to logs of each attempt.
>
> Diagnostics: Container
> [pid=18590,containerID=container_1487247313588_0016_01_000001] is running
> beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical
> memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
>
> Dump of the process-tree for container_1487247313588_0016_01_000001 :
>
>          |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>
>          |- 18598 18590 18590 18590 (java) 894 48 2294116352
> 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
>
>          |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c
> /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
> 1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out
> 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err
>
>
>
>
> Container killed on request. Exit code is 143
>
> Container exited with a non-zero exit code 143
>
> Failing this attempt. Failing the application.
>
> 2017-02-17 15:52:32,164 WARN
> org.apache.flink.yarn.YarnClusterClient                       - If log
> aggregation is activated in the Hadoop cluster, we recommend to retrieve
> the full application log using this command:
>
>          yarn logs -applicationId application_1487247313588_0016
>
> (It sometimes takes a few seconds until the logs are aggregated)
>
> 2017-02-17 15:52:32,164 INFO
> org.apache.flink.yarn.YarnClusterClient                       - YARN Client
> is shutting down
>
> 2017-02-17 15:52:32,267 INFO
> org.apache.flink.yarn.ApplicationClient                       - Stopped
> Application client.
>
> 2017-02-17 15:52:32,267 INFO
> org.apache.flink.yarn.ApplicationClient                       - Disconnect
> from JobManager null.
>
>
>
>
>
> *From:* Bruno Aranda [mailto:brunoaranda@gmail.com]
> *Sent:* 17 February 2017 17:02
> *To:* user@flink.apache.org
> *Subject:* Re: Can't run flink on yarn on version 1.2.0
>
>
>
> Hi Howard,
>
>
>
> We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific
> solution, but are you sure you don't have some sort of Flink mix? In your
> logs I can see:
>
>
>
> *The configuration directory ('/home/software/flink-1.1.4/conf') contains
> both LOG4J and Logback configuration files. Please delete or rename one of
> them.*
>
>
>
> It mentions 1.1.4 in the path of the conf dir instead of 1.2.
>
>
>
> Cheers,
>
>
>
> Bruno
>
>
>
> On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <howard.li@vipshop.com>
> wrote:
>
> Hi,
>
>          I’m trying to run Flink on Yarn using the command: bin/flink run
> -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar
>
>          But I got the following error:
>
>
>
> 2017-02-17 15:52:40,746 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:52:40,746 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Using
> values:
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager count = 2
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> JobManager memory = 1024
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager memory = 1024
>
> 2017-02-17 15:52:40,796 INFO
> org.apache.hadoop.yarn.client.RMProxy                         - Connecting
> to ResourceManager at /0.0.0.0:8032
>
> 2017-02-17 15:52:41,680 WARN
> org.apache.flink.yarn.YarnClusterDescriptor                   - The
> configuration directory ('/home/software/flink-1.1.4/conf') contains both
> LOG4J and Logback configuration files. Please delete or rename one of them.
>
> 2017-02-17 15:52:41,702 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml
>
> 2017-02-17 15:52:42,025 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/lib to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib
>
> 2017-02-17 15:52:42,695 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties
>
> 2017-02-17 15:52:42,722 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to
> hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar
>
> 2017-02-17 15:52:43,346 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml
>
> 2017-02-17 15:52:43,386 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting
> application master application_1487247313588_0017
>
> 2017-02-17 15:52:43,425 INFO
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted
> application application_1487247313588_0017
>
> 2017-02-17 15:52:43,425 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for
> the cluster to be allocated
>
> 2017-02-17 15:52:43,427 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying
> cluster, current state ACCEPTED
>
> 2017-02-17 15:52:48,471 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - YARN
> application has been deployed successfully.
>
> Cluster started: Yarn cluster with application id
> application_1487247313588_0017
>
> Using address 10.199.202.162:43809 to connect to JobManager.
>
> JobManager web interface address
> http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/
>
> Using the parallelism provided by the remote cluster (8). To use another
> parallelism, set it at the ./bin/flink client.
>
> Starting execution of program
>
> 2017-02-17 15:52:49,278 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Starting
> program in interactive mode
>
> Executing WordCount example with default input data set.
>
> Use --input to specify file input.
>
> Printing result to stdout. Use --output to specify output path.
>
> 2017-02-17 15:52:49,609 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Waiting
> until all TaskManagers have connected
>
> Waiting until all TaskManagers have connected
>
> 2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient
>                       - Starting client actor system.
>
>
>
> ------------------------------------------------------------
>
> The program finished with the following exception:
>
>
>
> org.apache.flink.client.program.ProgramInvocationException: The main
> method caused an error.
>
>      at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
>
>      at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)
>
>      at
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)
>
>      at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
>
>      at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)
>
>      at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)
>
> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status
> from Application Client
>
>      at
> org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
>
>      at
> org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)
>
>      at
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)
>
>      at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>
>      at
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)
>
>      at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>
>      at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>
>      at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)
>
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Meth
>
>
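
Note on the 1.2.0 log quoted above: the YARN diagnostics report that the ApplicationMaster container was killed for "running beyond virtual memory limits" (2.2 GB of 2.1 GB), which may also explain the leader-retrieval timeout seen by the client. The sketch below only redoes that arithmetic from the numbers in the log. It assumes the cluster runs with YARN's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1 and that the limit is checked against the summed virtual memory of the container's process tree; the function and variable names are illustrative and are not part of Flink or YARN.

```python
from typing import Sequence

MIB = 1024 * 1024


def vmem_limit_mib(container_pmem_mib: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory ceiling for a container, in MiB.

    Assumption: the limit is physical memory times the vmem/pmem ratio,
    with 2.1 taken as the YARN default.
    """
    return container_pmem_mib * vmem_pmem_ratio


def process_tree_vmem_mib(vmem_bytes_per_process: Sequence[int]) -> float:
    """Total virtual memory of all processes in the container, in MiB."""
    return sum(vmem_bytes_per_process) / MIB


# Values copied from the quoted log.
container_pmem_mib = 1024                      # "JobManager memory = 1024"
process_vmem_bytes = [2294116352, 108605440]   # VMEM_USAGE(BYTES): java, bash

limit_mib = vmem_limit_mib(container_pmem_mib)
used_mib = process_tree_vmem_mib(process_vmem_bytes)
over_limit = used_mib > limit_mib

print(
    f"limit: {limit_mib / 1024:.1f} GiB, "
    f"used: {used_mib / 1024:.1f} GiB, "
    f"over limit: {over_limit}"
)
```

Under these assumptions the numbers agree with the "2.2 GB of 2.1 GB" line in the diagnostics. If this check is what kills the container, `yarn.nodemanager.vmem-pmem-ratio` and `yarn.nodemanager.vmem-check-enabled` in `yarn-site.xml` are the settings to look at.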
