flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Can't run flink on yarn on version 1.2.0
Date Thu, 23 Feb 2017 20:33:47 GMT
Hi,

Were both JDKs from the same vendor (say, OpenJDK)? Were both installed
"vanilla" from the package manager?
Java is usually pretty good with backwards compatibility.
I think this issue is caused by some other effect that we are overlooking here.
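
One candidate worth ruling out is the virtual memory check in the YARN diagnostics quoted below ("2.2 GB of 2.1 GB virtual memory used"). The following is only a sketch of the arithmetic behind that message. The byte counts are copied from the process-tree dump in the quoted log; the ratio of 2.1 is an assumption, namely that yarn.nodemanager.vmem-pmem-ratio is at its default on this cluster. The function names are made up for illustration.

```python
# Values copied from the YARN process-tree dump quoted in this thread.
CONTAINER_PMEM_MB = 1024        # "1 GB physical memory" granted to the AM container
JAVA_VMEM_BYTES = 2294116352    # VMEM_USAGE(BYTES) of the java process (pid 18598)
BASH_VMEM_BYTES = 108605440     # VMEM_USAGE(BYTES) of the bash wrapper (pid 18590)

# Assumption: yarn.nodemanager.vmem-pmem-ratio is left at its default of 2.1.
VMEM_PMEM_RATIO = 2.1

BYTES_PER_MB = 1024 * 1024


def vmem_limit_mb(pmem_mb, ratio):
    """Virtual memory allowed for a container with the given physical memory."""
    return pmem_mb * ratio


def tree_vmem_mb(*process_vmem_bytes):
    """Total virtual memory of all processes in the container's process tree."""
    return sum(process_vmem_bytes) / BYTES_PER_MB


limit_mb = vmem_limit_mb(CONTAINER_PMEM_MB, VMEM_PMEM_RATIO)
used_mb = tree_vmem_mb(JAVA_VMEM_BYTES, BASH_VMEM_BYTES)

print(f"limit: {limit_mb:.1f} MB, used: {used_mb:.1f} MB")
print("over the limit" if used_mb > limit_mb else "within the limit")
```

If the numbers line up like this on your cluster, the JDK version may matter only in that one JVM reserves more virtual address space than the other for the same -Xmx424M heap. The usual knobs for this check are yarn.nodemanager.vmem-pmem-ratio and yarn.nodemanager.vmem-check-enabled in yarn-site.xml; whether changing them is appropriate depends on the cluster.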

On Thu, Feb 23, 2017 at 10:43 AM, Bruno Aranda <brunoaranda@gmail.com>
wrote:

> Hi,
>
> Good you found a solution, but are you sure it is the JDK version?
>
> We are running Flink 1.2.0 on Yarn on an AWS EMR Cluster with no issues,
> using JDK 8 (1.8.0_121).
>
> Cheers,
>
> Bruno
>
> On Thu, 23 Feb 2017 at 09:26 Howard,Li(vip.com) <howard.li@vipshop.com>
> wrote:
>
>> Hi All:
>>
>>          We finally found the problem.
>>
>>          Flink on Yarn only works on JDK7, not on JDK8. If you use
>> JDK8, you may hit the problem discussed earlier.
>>
>>          For more information: OS: CentOS 6.6. JDK7 version: 1.7.0u75;
>> JDK8 version: 1.8.0u111.
>>
>>
>>
>>          This problem may be related to akka.
>>
>>
>>
>> *From:* Till Rohrmann [mailto:trohrmann@apache.org]
>> *Sent:* Feb 17, 2017 18:33
>>
>> *To:* user@flink.apache.org
>> *Subject:* Re: Can't run flink on yarn on version 1.2.0
>>
>>
>>
>> Hi Howard,
>>
>>
>>
>> could you check whether the JobManager's actor system was bound to
>> "vip-rc-vsubu.vclound.com:55926"? You should see that in the JobManager
>> logs. Furthermore, have you checked that your Yarn cluster nodes are
>> actually reachable from the node where you start the Flink application? If
>> so, the logs of the CLI client as well as the JobManager logs (both at
>> debug level) would be tremendously helpful.
>>
>>
>>
>> Cheers,
>>
>> Till
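
The reachability question above can be checked with a plain TCP connect from the host where the Flink client is started. This is a minimal sketch, not something taken from Flink or YARN: the helper name `can_connect` is hypothetical, and the check only shows that a port accepts connections, not that the actor system behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the connect;
        # resolution failures, refusals, and timeouts all surface as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, pass the host and port that the client prints in its "Using address ... to connect to JobManager" line while the application is still running, for example `can_connect("vip-rc-vsubu.vclound.com", 55926)`. A False result while the container is alive would point at name resolution or firewalling between the client and the cluster nodes.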
>>
>>
>>
>> On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <
>> howard.li@vipshop.com> wrote:
>>
>> Sorry for the confusion I caused. I did copy the wrong log, but we do hit
>> this problem on 1.2.0.
>>
>> With version 1.1.4, however, we hit this in one cluster but not in
>> another. We are still trying to figure out what happened.
>>
>>
>>
>> The following is the log for 1.2.0 version:
>>
>>
>>
>> 2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.
>> FlinkYarnSessionCli                 - No path for the flink jar passed.
>> Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
>> to locate the jar
>>
>> 2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.
>> FlinkYarnSessionCli                 - No path for the flink jar passed.
>> Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
>> to locate the jar
>>
>> 2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor
>>                 - Using values:
>>
>> 2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -    TaskManager count = 2
>>
>> 2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -    JobManager memory = 1024
>>
>> 2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -    TaskManager memory = 1024
>>
>> 2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.
>> RMProxy                         - Connecting to ResourceManager at /
>> 0.0.0.0:8032
>>
>> 2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - The configuration directory
>> ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback
>> configuration files. Please delete or rename one of them.
>>
>> 2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0016/WordCount.jar
>>
>> 2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0016/log4j.properties
>>
>> 2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0016/logback.xml
>>
>> 2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/
>> user/root/.flink/application_1487247313588_0016/lib
>>
>> 2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0016/flink-dist_2.11-1.2.0.jar
>>
>> 2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0016/flink-conf.yaml
>>
>> 2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Submitting application master
>> application_1487247313588_0016
>>
>> 2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.
>> api.impl.YarnClientImpl         - Submitted application
>> application_1487247313588_0016
>>
>> 2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Waiting for the cluster to be
>> allocated
>>
>> 2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Deploying cluster, current
>> state ACCEPTED
>>
>> 2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - YARN application has been
>> deployed successfully.
>>
>> Cluster started: Yarn cluster with application id
>> application_1487247313588_0016
>>
>> Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.
>>
>> JobManager web interface address http://vip-rc-ucsww.vclound.
>> com:8088/proxy/application_1487247313588_0016/
>>
>> Using the parallelism provided by the remote cluster (8). To use another
>> parallelism, set it at the ./bin/flink client.
>>
>> Starting execution of program
>>
>> 2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Starting program in
>> interactive mode
>>
>> Executing WordCount example with default input data set.
>>
>> Use --input to specify file input.
>>
>> Printing result to stdout. Use --output to specify output path.
>>
>> 2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Waiting until all TaskManagers
>> have connected
>>
>> Waiting until all TaskManagers have connected
>>
>> 2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Starting client actor system.
>>
>>
>>
>> ------------------------------------------------------------
>>
>> The program finished with the following exception:
>>
>>
>>
>> org.apache.flink.client.program.ProgramInvocationException: The main
>> method caused an error.
>>
>>          at org.apache.flink.client.program.PackagedProgram.
>> callMainMethod(PackagedProgram.java:545)
>>
>>          at org.apache.flink.client.program.PackagedProgram.
>> invokeInteractiveModeForExecution(PackagedProgram.java:419)
>>
>>          at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:339)
>>
>>          at org.apache.flink.client.CliFrontend.executeProgram(
>> CliFrontend.java:831)
>>
>>          at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
>>
>>          at org.apache.flink.client.CliFrontend.parseParameters(
>> CliFrontend.java:1073)
>>
>>          at org.apache.flink.client.CliFrontend$2.call(
>> CliFrontend.java:1120)
>>
>>          at org.apache.flink.client.CliFrontend$2.call(
>> CliFrontend.java:1117)
>>
>>          at org.apache.flink.runtime.security.
>> HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>>
>>          at java.security.AccessController.doPrivileged(Native Method)
>>
>>          at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>>          at org.apache.hadoop.security.UserGroupInformation.doAs(
>> UserGroupInformation.java:1657)
>>
>>          at org.apache.flink.runtime.security.HadoopSecurityContext.
>> runSecured(HadoopSecurityContext.java:40)
>>
>>          at org.apache.flink.client.CliFrontend.main(CliFrontend.
>> java:1116)
>>
>> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status
>> from Application Client
>>
>>          at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(
>> YarnClusterClient.java:248)
>>
>>          at org.apache.flink.yarn.YarnClusterClient.
>> waitForClusterToBeReady(YarnClusterClient.java:520)
>>
>>          at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:412)
>>
>>          at org.apache.flink.yarn.YarnClusterClient.submitJob(
>> YarnClusterClient.java:210)
>>
>>          at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:400)
>>
>>          at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:387)
>>
>>          at org.apache.flink.client.program.ContextEnvironment.
>> execute(ContextEnvironment.java:62)
>>
>>          at org.apache.flink.api.java.ExecutionEnvironment.execute(
>> ExecutionEnvironment.java:926)
>>
>>          at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>>
>>          at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>>
>>          at org.apache.flink.examples.java.wordcount.WordCount.main(
>> WordCount.java:92)
>>
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(
>> NativeMethodAccessorImpl.java:62)
>>
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>> DelegatingMethodAccessorImpl.java:43)
>>
>>          at java.lang.reflect.Method.invoke(Method.java:498)
>>
>>          at org.apache.flink.client.program.PackagedProgram.
>> callMainMethod(PackagedProgram.java:528)
>>
>>          ... 13 more
>>
>> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException:
>> Could not retrieve the leader gateway
>>
>>          at org.apache.flink.runtime.util.LeaderRetrievalUtils.
>> retrieveLeaderGateway(LeaderRetrievalUtils.java:141)
>>
>>          at org.apache.flink.client.program.ClusterClient.
>> getJobManagerGateway(ClusterClient.java:691)
>>
>>          at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(
>> YarnClusterClient.java:242)
>>
>>          ... 28 more
>>
>> Caused by: java.util.concurrent.TimeoutException: Futures timed out
>> after [10000 milliseconds]
>>
>>          at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.
>> scala:219)
>>
>>          at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.
>> scala:223)
>>
>>          at scala.concurrent.Await$$anonfun$result$1.apply(
>> package.scala:190)
>>
>>          at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(
>> BlockContext.scala:53)
>>
>>          at scala.concurrent.Await$.result(package.scala:190)
>>
>>          at scala.concurrent.Await.result(package.scala)
>>
>>          at org.apache.flink.runtime.util.LeaderRetrievalUtils.
>> retrieveLeaderGateway(LeaderRetrievalUtils.java:139)
>>
>>          ... 30 more
>>
>> 2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Sending shutdown request to
>> the Application Master
>>
>> 2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Start application client.
>>
>> 2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.
>> YarnClusterClient                       - YARN reported application
>> state FAILED
>>
>> 2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.
>> YarnClusterClient                       - Diagnostics: Application
>> application_1487247313588_0016 failed 1 times due to AM Container for
>> appattempt_1487247313588_0016_000001 exited with  exitCode: -103
>>
>> For more detailed output, check application tracking page:
>> http://vip-rc-ucsww.vclound.com:8088/cluster/app/
>> application_1487247313588_0016Then, click on links to logs of each
>> attempt.
>>
>> Diagnostics: Container [pid=18590,containerID=
>> container_1487247313588_0016_01_000001] is running beyond virtual memory
>> limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1
>> GB virtual memory used. Killing container.
>>
>> Dump of the process-tree for container_1487247313588_0016_01_000001 :
>>
>>          |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>
>>          |- 18598 18590 18590 18590 (java) 894 48 2294116352
>> 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M
>> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/
>> application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
>> -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnApplicationMasterRunner
>>
>>          |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c
>> /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/
>> hadoop-2.7.3/logs/userlogs/application_1487247313588_
>> 0016/container_1487247313588_0016_01_000001/jobmanager.log
>> -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnApplicationMasterRunner
>> 1>/home/software/hadoop-2.7.3/logs/userlogs/application_
>> 1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out
>> 2>/home/software/hadoop-2.7.3/logs/userlogs/application_
>> 1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err
>>
>>
>>
>> Container killed on request. Exit code is 143
>>
>> Container exited with a non-zero exit code 143
>>
>> Failing this attempt. Failing the application.
>>
>> 2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Notification about new leader
>> address akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager
>> with session ID null.
>>
>> 2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Received address of new leader
>> akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with
>> session ID null.
>>
>> 2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Disconnect from JobManager
>> null.
>>
>> 2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient
>>                      - Trying to register at JobManager akka.tcp://
>> flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.
>>
>> 2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Trying to register at
>> JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/
>> jobmanager.
>>
>> 2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Trying to register at
>> JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/
>> jobmanager.
>>
>> 2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Trying to register at
>> JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/
>> jobmanager.
>>
>> 2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Trying to register at
>> JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/
>> jobmanager.
>>
>> 2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.
>> YarnClusterClient                       - Error while stopping YARN
>> cluster.
>>
>> java.util.concurrent.TimeoutException: Futures timed out after [10000
>> milliseconds]
>>
>>          at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.
>> scala:219)
>>
>>          at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.
>> scala:153)
>>
>>          at scala.concurrent.Await$$anonfun$ready$1.apply(package.
>> scala:169)
>>
>>          at scala.concurrent.Await$$anonfun$ready$1.apply(package.
>> scala:169)
>>
>>          at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(
>> BlockContext.scala:53)
>>
>>          at scala.concurrent.Await$.ready(package.scala:169)
>>
>>          at scala.concurrent.Await.ready(package.scala)
>>
>>          at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(
>> YarnClusterClient.java:372)
>>
>>          at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(
>> YarnClusterClient.java:342)
>>
>>          at org.apache.flink.client.program.ClusterClient.
>> shutdown(ClusterClient.java:208)
>>
>>          at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)
>>
>>          at org.apache.flink.client.CliFrontend.parseParameters(
>> CliFrontend.java:1073)
>>
>>          at org.apache.flink.client.CliFrontend$2.call(
>> CliFrontend.java:1120)
>>
>>          at org.apache.flink.client.CliFrontend$2.call(
>> CliFrontend.java:1117)
>>
>>          at org.apache.flink.runtime.security.
>> HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>>
>>          at java.security.AccessController.doPrivileged(Native Method)
>>
>>          at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>>          at org.apache.hadoop.security.UserGroupInformation.doAs(
>> UserGroupInformation.java:1657)
>>
>>          at org.apache.flink.runtime.security.HadoopSecurityContext.
>> runSecured(HadoopSecurityContext.java:40)
>>
>>          at org.apache.flink.client.CliFrontend.main(CliFrontend.
>> java:1116)
>>
>> 2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Deleting files in hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0016
>>
>> 2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Sending StopCluster request to
>> JobManager.
>>
>> 2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Application
>> application_1487247313588_0016 finished with state FAILED and final state
>> FAILED at 1487317906227
>>
>> 2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.
>> YarnClusterClient                       - Application failed.
>> Diagnostics Application application_1487247313588_0016 failed 1 times due
>> to AM Container for appattempt_1487247313588_0016_000001 exited with
>> exitCode: -103
>>
>> For more detailed output, check application tracking page:
>> http://vip-rc-ucsww.vclound.com:8088/cluster/app/
>> application_1487247313588_0016Then, click on links to logs of each
>> attempt.
>>
>> Diagnostics: Container [pid=18590,containerID=
>> container_1487247313588_0016_01_000001] is running beyond virtual memory
>> limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1
>> GB virtual memory used. Killing container.
>>
>> Dump of the process-tree for container_1487247313588_0016_01_000001 :
>>
>>          |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>
>>          |- 18598 18590 18590 18590 (java) 894 48 2294116352
>> 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M
>> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/
>> application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log
>> -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnApplicationMasterRunner
>>
>>          |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c
>> /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/
>> hadoop-2.7.3/logs/userlogs/application_1487247313588_
>> 0016/container_1487247313588_0016_01_000001/jobmanager.log
>> -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties
>> org.apache.flink.yarn.YarnApplicationMasterRunner
>> 1>/home/software/hadoop-2.7.3/logs/userlogs/application_
>> 1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out
>> 2>/home/software/hadoop-2.7.3/logs/userlogs/application_
>> 1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err
>>
>>
>>
>> Container killed on request. Exit code is 143
>>
>> Container exited with a non-zero exit code 143
>>
>> Failing this attempt. Failing the application.
>>
>> 2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.
>> YarnClusterClient                       - If log aggregation is
>> activated in the Hadoop cluster, we recommend to retrieve the full
>> application log using this command:
>>
>>          yarn logs -applicationId application_1487247313588_0016
>>
>> (It sometimes takes a few seconds until the logs are aggregated)
>>
>> 2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - YARN Client is shutting down
>>
>> 2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Stopped Application client.
>>
>> 2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.
>> ApplicationClient                       - Disconnect from JobManager
>> null.
>>
>>
>>
>>
>>
>> *From:* Bruno Aranda [mailto:brunoaranda@gmail.com]
>> *Sent:* Feb 17, 2017 17:02
>> *To:* user@flink.apache.org
>> *Subject:* Re: Can't run flink on yarn on version 1.2.0
>>
>>
>>
>> Hi Howard,
>>
>>
>>
>> We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific
>> solution, but are you sure you don't have some sort of Flink mix? In your
>> logs I can see:
>>
>>
>>
>> *The configuration directory ('/home/software/flink-1.1.4/conf') contains
>> both LOG4J and Logback configuration files. Please delete or rename one of
>> them.*
>>
>>
>>
>> Where it mentions 1.1.4 in the folder for the conf dir instead of 1.2.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Bruno
>>
>>
>>
>> On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <howard.li@vipshop.com>
>> wrote:
>>
>> Hi,
>>
>>          I’m trying to run Flink on Yarn using the command: bin/flink run
>> -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar
>>
>>          But I get the following error:
>>
>>
>>
>> 2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.
>> FlinkYarnSessionCli                 - No path for the flink jar passed.
>> Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
>> to locate the jar
>>
>> 2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.
>> FlinkYarnSessionCli                 - No path for the flink jar passed.
>> Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
>> to locate the jar
>>
>> 2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Using values:
>>
>> 2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -         TaskManager count = 2
>>
>> 2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -         JobManager memory =
>> 1024
>>
>> 2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   -         TaskManager memory =
>> 1024
>>
>> 2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.
>> RMProxy                         - Connecting to ResourceManager at /
>> 0.0.0.0:8032
>>
>> 2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - The configuration directory
>> ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback
>> configuration files. Please delete or rename one of them.
>>
>> 2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0017/logback.xml
>>
>> 2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/
>> user/root/.flink/application_1487247313588_0017/lib
>>
>> 2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0017/log4j.properties
>>
>> 2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0017/flink-dist_2.10-1.1.4.jar
>>
>> 2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils
>>                                 - Copying from
>> /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://
>> 10.199.202.161:9000/user/root/.flink/application_
>> 1487247313588_0017/flink-conf.yaml
>>
>> 2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Submitting application master
>> application_1487247313588_0017
>>
>> 2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.
>> api.impl.YarnClientImpl         - Submitted application
>> application_1487247313588_0017
>>
>> 2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Waiting for the cluster to be
>> allocated
>>
>> 2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - Deploying cluster, current
>> state ACCEPTED
>>
>> 2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.
>> YarnClusterDescriptor                   - YARN application has been
>> deployed successfully.
>>
>> Cluster started: Yarn cluster with application id
>> application_1487247313588_0017
>>
>> Using address 10.199.202.162:43809 to connect to JobManager.
>>
>> JobManager web interface address http://vip-rc-ucsww.vclound.
>> com:8088/proxy/application_1487247313588_0017/
>>
>> Using the parallelism provided by the remote cluster (8). To use another
>> parallelism, set it at the ./bin/flink client.
>>
>> Starting execution of program
>>
>> 2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Starting program in
>> interactive mode
>>
>> Executing WordCount example with default input data set.
>>
>> Use --input to specify file input.
>>
>> Printing result to stdout. Use --output to specify output path.
>>
>> 2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.
>> YarnClusterClient                       - Waiting until all TaskManagers
>> have connected
>>
>> Waiting until all TaskManagers have connected
>>
>> 2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient
>>                       - Starting client actor system.
>>
>>
>>
>> ------------------------------------------------------------
>>
>> The program finished with the following exception:
>>
>>
>>
>> org.apache.flink.client.program.ProgramInvocationException: The main
>> method caused an error.
>>
>>      at org.apache.flink.client.program.PackagedProgram.callMainMethod(
>> PackagedProgram.java:525)
>>
>>      at org.apache.flink.client.program.PackagedProgram.
>> invokeInteractiveModeForExecution(PackagedProgram.java:404)
>>
>>      at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:321)
>>
>>      at org.apache.flink.client.CliFrontend.executeProgram(
>> CliFrontend.java:777)
>>
>>      at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
>>
>>      at org.apache.flink.client.CliFrontend.parseParameters(
>> CliFrontend.java:1005)
>>
>>      at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)
>>
>> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status
>> from Application Client
>>
>>      at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(
>> YarnClusterClient.java:242)
>>
>>      at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(
>> YarnClusterClient.java:514)
>>
>>      at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:395)
>>
>>      at org.apache.flink.yarn.YarnClusterClient.submitJob(
>> YarnClusterClient.java:204)
>>
>>      at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:383)
>>
>>      at org.apache.flink.client.program.ClusterClient.run(
>> ClusterClient.java:370)
>>
>>      at org.apache.flink.client.program.ContextEnvironment.
>> execute(ContextEnvironment.java:62)
>>
>>      at org.apache.flink.api.java.ExecutionEnvironment.execute(
>> ExecutionEnvironment.java:896)
>>
>>      at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>>
>>      at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>>
>>      at org.apache.flink.examples.java.wordcount.WordCount.main(
>> WordCount.java:92)
>>
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Meth
>>
>>
