From: Emily McMahon
Date: Mon, 25 Sep 2017 10:58:05 -0700
Subject: Re: Cannot deploy Flink on YARN
To: Sridhar Chellappa
Cc: user

What's in the container log for the container that failed?

On Sep 11, 2017 2:17 AM, "Sridhar Chellappa" wrote:

I am trying to start Flink (version 1.3.0) on YARN (Hadoop 2.8.1) by issuing the following command:

~/flink-1.3.0/bin/yarn-session.sh -s 4 -n 10 -jm 4096 -tm 4096 -d

I am seeing a flurry of these errors:

2017-09-11 08:17:11,410 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:11,661 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:11,912 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2017-09-11 08:17:12,163 INFO  org.apache.flink.yarn.YarnClusterDescriptor  - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

And then my deployment fails with the following exception:

Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:439)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:630)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:486)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:483)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:483)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1504851547322_0003 failed 2 times due to AM Container for appattempt_1504851547322_0003_000002 exited with exitCode: 31
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1504851547322_0003_02_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Further debugging in the JobManager logs shows:

Resetting connection and trying again with a new connection.
2017-09-11 08:17:11,820 INFO  org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=high-availability.zookeeper.quorum: 10.200.0.6:2181,10.200.0.7:2181,10.200.0.9:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.org.apache.curator.ConnectionState@57bd802b
2017-09-11 08:17:11,927 ERROR org.apache.flink.yarn.YarnApplicationMasterRunner  - YARN Application Master initialization failed
java.net.UnknownHostException: high-availability.zookeeper.quorum: 10.200.0.6: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

Any help in figuring this out will be appreciated.
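
One possible reading of the JobManager log quoted in this message: the connectString handed to the ZooKeeper client begins with the literal text "high-availability.zookeeper.quorum: ", so that prefix ends up inside the first host name, which would explain the UnknownHostException. The sketch below only illustrates the idea. It is a simplified stand-in and not ZooKeeper's or Flink's actual parsing code; the class QuorumCheck and its methods hosts and looksLikeHost are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink or ZooKeeper.
class QuorumCheck {

    // Splits "host:port,host:port" on commas and treats the text
    // before the last ':' of each entry as the host name.
    static List<String> hosts(String connectString) {
        List<String> result = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            int colon = entry.lastIndexOf(':');
            result.add(colon >= 0 ? entry.substring(0, colon) : entry);
        }
        return result;
    }

    // Assumed sanity rule for this sketch: a usable host name is
    // non-empty and contains neither spaces nor embedded colons.
    static boolean looksLikeHost(String host) {
        return !host.isEmpty() && host.indexOf(' ') < 0 && host.indexOf(':') < 0;
    }

    public static void main(String[] args) {
        // connectString value copied from the log in this thread
        String fromLog = "high-availability.zookeeper.quorum: "
                + "10.200.0.6:2181,10.200.0.7:2181,10.200.0.9:2181";
        for (String h : hosts(fromLog)) {
            System.out.println((looksLikeHost(h) ? "ok      " : "suspect ") + "[" + h + "]");
        }
    }
}
```

With the string from the log, only the first entry is flagged, and its host text matches the name in the exception. If this reading is right, the value configured for high-availability.zookeeper.quorum may contain the key name a second time, and it would be worth checking that the value is just the comma-separated host:port list.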