flink-user-zh mailing list archives

From air23 <wangfei23_...@163.com>
Subject Re:Re: flink1.10 on yarn problem
Date Fri, 29 May 2020 06:33:17 GMT



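Hi, the error is reported on the cluster side. The code runs locally without errors. I have pasted the error message below.
It worked fine on flink1.7.
Later I added the Flink environment variables: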
#flink 
export FLINK_HOME=/opt/module/flink-1.10.1
export PATH=${FLINK_HOME}/bin:$PATH
After that, the example that had been failing ran normally.
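
Since adding these variables changed the outcome, it may help to confirm which `flink` launcher the shell actually resolves before submitting. The following is only a minimal sketch: the function name `check_flink_env` is hypothetical and not part of Flink, and it checks nothing beyond `FLINK_HOME` and `PATH`.

```shell
# Hypothetical helper (not part of Flink): verify that the `flink` launcher
# found on PATH is the one under FLINK_HOME.
check_flink_env() {
    if [ -z "${FLINK_HOME:-}" ]; then
        echo "FLINK_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "${FLINK_HOME}/bin/flink" ]; then
        echo "no executable at ${FLINK_HOME}/bin/flink" >&2
        return 1
    fi
    # command -v prints the path the shell would execute for `flink`.
    resolved="$(command -v flink || true)"
    if [ "${resolved}" != "${FLINK_HOME}/bin/flink" ]; then
        echo "PATH resolves flink to '${resolved}', expected ${FLINK_HOME}/bin/flink" >&2
        return 1
    fi
    echo "flink resolves to ${resolved}"
}
```

Running `check_flink_env` in the same shell that submits the job prints the resolved launcher, or returns a non-zero status with a message on stderr if an older installation comes first on `PATH`.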


But then I switched to another job, which works both on 1.7 and locally. It fails with the following error:
------------------------------------------------------------
 The program finished with the following exception:


org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
        at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
        at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
        at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
        at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
        at com.zongteng.ztstream.etl.MongoToKafka.main(MongoToKafka.java:77)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
        ... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1590715263014_0033 failed 1 times due to AM Container for appattempt_1590715263014_0033_000001 exited with  exitCode: 2
For more detailed output, check application tracking page:http://zongteng72:8088/proxy/application_1590715263014_0033/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1590715263014_0033_01_000001
Exit code: 2
Stack trace: ExitCodeException exitCode=2: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
        at org.apache.hadoop.util.Shell.run(Shell.java:507)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)




Container exited with a non-zero exit code 2
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1590715263014_0033
        at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
        at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
        at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
        ... 22 more
2020-05-29 14:18:25,529 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cancelling deployment from Deployment Failure Hook
2020-05-29 14:18:25,530 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at zongteng72/192.168.109.72:8032
2020-05-29 14:18:25,532 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Killing YARN application
2020-05-29 14:18:25,540 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Killed application application_1590715263014_0033
2020-05-29 14:18:25,641 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deleting files in hdfs://ZONGTENGSERIVCE/user/root/.flink/application_1590715263014_0033.
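
The client output above only shows that the AM container exited with code 2. The reason would have to come from the container logs that the YARN diagnostics point to. As a small sketch, assuming the client output was saved to a file or is piped in, the hypothetical helper `yarn_logs_command` below extracts the first application id and prints the matching `yarn logs` command. It does not run `yarn` itself.

```shell
# Hypothetical helper: read Flink client output on stdin and print the
# `yarn logs` command for the first YARN application id found in it.
yarn_logs_command() {
    # Application ids look like application_<cluster timestamp>_<sequence>.
    app_id="$(grep -o 'application_[0-9]\{1,\}_[0-9]\{1,\}' | head -n 1)"
    if [ -z "${app_id}" ]; then
        echo "no YARN application id found" >&2
        return 1
    fi
    echo "yarn logs -applicationId ${app_id}"
}
```

For example, `yarn_logs_command < client-output.txt` (the file name is an assumption) prints the command to run on a node with the YARN client installed, provided log aggregation is enabled.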

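At 2020-05-29 14:21:39, "tison" <wander4096@gmail.com> wrote:
>This problem is really strange. Normally the job graph compilation is intercepted at env.execute,
>so the job should not actually get scheduled. Could you describe in detail how you submit the job and where this error is reported (client? cluster?)?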
>
>Best,
>tison.
>
>
>air23 <wangfei23_job@163.com> wrote on Fri, May 29, 2020 at 1:38 PM:
>
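>> Running flink1.10 on CDH YARN fails with the error below. Version 1.7.2 has no such problem.
>> flink-shaded-hadoop-2-uber-2.6.5-10.0.jar has also been added.
>> Hadoop environment variable: export HADOOP_CONF_DIR=/etc/hadoop/conf
>> Any help would be appreciated.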
>>
>> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: e358699c1be6be1472078771e1fd027f)
>>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
>>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
>>         at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
>>         at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
>>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
>>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
>>         at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
>> Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: e358699c1be6be1472078771e1fd027f)
>>         at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>>         at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:83)
>>         at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
>>         at tt.WordCountStreamingByJava.main(WordCountStreamingByJava.java:36)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
>>         ... 11 more
>> Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: e358699c1be6be1472078771e1fd027f)
>>         at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:112)
>>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>>         at org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$21(RestClusterClient.java:565)
>>         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>>         at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:291)
>>         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>         at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>>         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>>         at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>>         at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
>>         at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:110)
>>         ... 19 more
>> Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>>         at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:110)
>>         at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:76)
>>         at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
>>         at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:186)
>>         at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:180)
>>         at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:496)
>>         at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:380)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
>>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
>>         at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>         at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>         at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>         at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>         at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>         at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>         at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: java.net.ConnectException: Connection refused (Connection refused)
>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>         at java.net.Socket.connect(Socket.java:606)
>>         at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:97)
>>         at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>>         at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>>         at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:200)