flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
Date Sun, 22 Jul 2018 20:21:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-9891:
---------------------------------
    Priority: Major  (was: Blocker)

> Flink cluster is not shutdown in YARN mode when Flink client is stopped
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9891
>                 URL: https://issues.apache.org/jira/browse/FLINK-9891
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Sergey Krasovskiy
>            Assignee: Shuyi Chen
>            Priority: Major
>
> We are not using session mode or detached mode. The command to run the Flink job on YARN is:
> {code:java}
> <flink-1.5.1>/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl
- Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path
for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
to locate the jar
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path
for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor
to locate the jar
> 2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither
the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client
needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster
specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1,
slotsPerTaskManager=1}
> 2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory
- The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The
configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf')
contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting
application master application_1531474158783_10814
> 2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl -
Submitted application application_1531474158783_10814
> 2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting
for the cluster to be allocated
> 2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying
cluster, current state ACCEPTED
> 2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN
application has been deployed successfully.
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
--------------------------------------------------------------------------------
> 2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27
GMT)
> ...
> {code}
> Issues:
>  # The Flink job runs as a Flink session (the Job Manager logs show YarnSessionClusterEntrypoint).
>  # Ctrl+C or 'stop' stops neither the job nor the YARN cluster.
>  # Cancelling the job via the Job Manager web UI doesn't stop the Flink cluster. To kill the cluster we need to run: yarn application -kill <id>
> We also tried to run a Flink job with 'mode: legacy' and saw the same issues:
>  # Add property 'mode: legacy' to ./conf/flink-conf.yaml
>  # Execute the following command:
> {code:java}
> <flink-1.5.1>/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 16:07:13,820 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl
- Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 16:07:14,165 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path
for the flink jar passed. Using the location of class org.apache.flink.yarn.LegacyYarnClusterDescriptor
to locate the jar
> 2018-07-18 16:07:14,165 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path
for the flink jar passed. Using the location of class org.apache.flink.yarn.LegacyYarnClusterDescriptor
to locate the jar
> 2018-07-18 16:07:14,182 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither
the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client
needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-07-18 16:07:14,356 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster
specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1,
slotsPerTaskManager=1}
> 2018-07-18 16:07:14,703 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory
- The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 16:07:14,708 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The
configuration directory ('/home/skrasovs/flink-conf') contains both LOG4J and Logback configuration
files. Please delete or rename one of them.
> 2018-07-18 16:07:17,678 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting
application master application_1531474158783_10843
> 2018-07-18 16:07:17,717 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl -
Submitted application application_1531474158783_10843
> 2018-07-18 16:07:17,717 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting
for the cluster to be allocated
> 2018-07-18 16:07:17,720 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying
cluster, current state ACCEPTED
> 2018-07-18 16:07:23,527 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN
application has been deployed successfully.
> Using the parallelism provided by the remote cluster (1). To use another parallelism,
set it at the ./bin/flink client.
> Starting execution of program
> 2018-07-18 16:07:23,551 INFO org.apache.flink.yarn.YarnClusterClient - Starting program
in interactive mode (detached: false)
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 16:07:19,831 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - --------------------------------------------------------------------------------
> 2018-07-18 16:07:19,833 INFO org.apache.flink.yarn.YarnApplicationMasterRunner - Starting
YARN ApplicationMaster / ResourceManager / JobManager (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018
@ 11:51:27 GMT)
> {code}
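
The workaround named in the description (yarn application -kill <id>) needs the YARN application ID, which the Flink CLI logs print in the "Submitted application application_..." line. The following is a minimal sketch of pulling that ID out of captured CLI output and assembling the kill command. It assumes the CLI output is available as text; the helper names find_application_id and build_kill_command are hypothetical and not part of Flink or YARN.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
# The pattern matches the "Submitted application ..." line logged by YarnClientImpl.
APP_ID_PATTERN = re.compile(r"Submitted application (application_\d+_\d+)")


def find_application_id(cli_log):
    """Return the first submitted YARN application ID found in the log, or None."""
    match = APP_ID_PATTERN.search(cli_log)
    return match.group(1) if match else None


def build_kill_command(app_id):
    """Build the argument list for `yarn application -kill <id>`."""
    return ["yarn", "application", "-kill", app_id]


# Sample taken from the Flink CLI logs quoted in the issue description.
sample = (
    "2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - "
    "Submitted application application_1531474158783_10814"
)

app_id = find_application_id(sample)
if app_id is not None:
    # Only prints the command; running it is left to the operator.
    print(" ".join(build_kill_command(app_id)))
```

The sketch only prints the command rather than executing it, so it can be reviewed before the cluster is actually killed.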



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
