Date: Sun, 22 Jul 2018 20:21:00 +0000 (UTC)
From: "Till Rohrmann (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Updated] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped

     [ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-9891:
---------------------------------
    Priority: Major  (was: Blocker)

> Flink cluster is not shutdown in YARN mode when Flink client is stopped
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9891
>                 URL: https://issues.apache.org/jira/browse/FLINK-9891
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Sergey Krasovskiy
>            Assignee: Shuyi Chen
>            Priority: Major
>
> We are using neither session mode nor detached mode. The command to run the Flink job on YARN is:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814
> 2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814
> 2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
> 2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> 2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
> 2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT)
> ...
> {code}
> Issues:
> # The Flink job is running as a Flink session
> # Ctrl+C or 'stop' doesn't stop the job or the YARN cluster
> # Cancelling the job via the Job Manager web UI doesn't stop the Flink cluster. To kill the cluster we need to run: yarn application -kill
> We also tried to run a Flink job with 'mode: legacy' and we have the same issues:
> # Add property 'mode: legacy' to ./conf/flink-conf.yaml
> # Execute the following command:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 16:07:13,820 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 16:07:14,165 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.LegacyYarnClusterDescriptor to locate the jar
> 2018-07-18 16:07:14,165 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.LegacyYarnClusterDescriptor to locate the jar
> 2018-07-18 16:07:14,182 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-07-18 16:07:14,356 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-07-18 16:07:14,703 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 16:07:14,708 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/home/skrasovs/flink-conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-07-18 16:07:17,678 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10843
> 2018-07-18 16:07:17,717 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10843
> 2018-07-18 16:07:17,717 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
> 2018-07-18 16:07:17,720 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> 2018-07-18 16:07:23,527 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
> Using the parallelism provided by the remote cluster (1). To use another parallelism, set it at the ./bin/flink client.
> Starting execution of program
> 2018-07-18 16:07:23,551 INFO org.apache.flink.yarn.YarnClusterClient - Starting program in interactive mode (detached: false)
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 16:07:19,831 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - --------------------------------------------------------------------------------
> 2018-07-18 16:07:19,833 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Starting YARN ApplicationMaster / ResourceManager / JobManager (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)