flink-user mailing list archives

From Encho Mishinev <encho.mishi...@gmail.com>
Subject Re: JobGraphs not cleaned up in HA mode
Date Wed, 29 Aug 2018 13:31:25 GMT
Hi Till,

Those are actually the full logs, except for the two parts I shortened
(pipeline construction and execution). As I said, accessing the UI for
Jobmanager 2 redirects to Jobmanager 1, so it seems to be aware that it is
not the leader. Jobmanager 2 has no logs other than what I sent. Here is the
full end-to-end log of Jobmanager 2 after repeating the experiment:

Starting Job Manager
sed: cannot rename /opt/flink/conf/sediVa6XS: Device or resource busy
config file:
jobmanager.rpc.address: flink-jobmanager-2
jobmanager.rpc.port: 6123
jobmanager.heap.size: 8192
taskmanager.heap.size: 8192
taskmanager.numberOfTaskSlots: 4
high-availability: zookeeper
high-availability.storageDir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
high-availability.zookeeper.quorum: zk-cs:2181
high-availability.zookeeper.path.root: /flink
high-availability.jobmanager.port: 50010
state.backend: filesystem
state.checkpoints.dir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
state.savepoints.dir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
state.backend.incremental: false
fs.default-scheme:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
rest.port: 8081
web.upload.dir: /opt/flink/upload
query.server.port: 6125
taskmanager.numberOfTaskSlots: 4
classloader.parent-first-patterns.additional: org.apache.xerces.
blob.storage.directory: /opt/flink/blob-server
blob.server.port: 6124
blob.server.port: 6124
query.server.port: 6125
Starting standalonesession as a console application on host
flink-jobmanager-2-7844b78c9-zwdqv.
2018-08-29 13:19:24,047 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
--------------------------------------------------------------------------------
2018-08-29 13:19:24,049 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting
StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216,
Date:16.08.2018 @ 06:39:50 GMT)
2018-08-29 13:19:24,049 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current
user: flink
2018-08-29 13:19:24,367 WARN  org.apache.hadoop.util.NativeCodeLoader
                 - Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
2018-08-29 13:19:24,431 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current
Hadoop/Kerberos user: flink
2018-08-29 13:19:24,431 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM:
OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-08-29 13:19:24,431 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum
heap size: 6702 MiBytes
2018-08-29 13:19:24,431 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME:
/docker-java-home/jre
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Hadoop
version: 2.7.5
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM
Options:
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
 -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
 -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program
Arguments:
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
 --configDir
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
 /opt/flink/conf
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
 --executionMode
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --host
2018-08-29 13:19:24,434 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
2018-08-29 13:19:24,435 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath:
/opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
2018-08-29 13:19:24,435 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
--------------------------------------------------------------------------------
2018-08-29 13:19:24,436 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered
UNIX signal handlers for [TERM, HUP, INT]
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.address, flink-jobmanager-2
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.rpc.port, 6123
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: jobmanager.heap.size, 8192
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.heap.size, 8192
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 4
2018-08-29 13:19:24,442 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: high-availability, zookeeper
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: high-availability.storageDir,
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: high-availability.zookeeper.quorum, zk-cs:2181
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: high-availability.zookeeper.path.root, /flink
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: high-availability.jobmanager.port, 50010
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend, filesystem
2018-08-29 13:19:24,443 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.checkpoints.dir,
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.savepoints.dir,
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: state.backend.incremental, false
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: fs.default-scheme,
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: rest.port, 8081
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: web.upload.dir, /opt/flink/upload
2018-08-29 13:19:24,444 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: query.server.port, 6125
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: taskmanager.numberOfTaskSlots, 4
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: classloader.parent-first-patterns.additional,
org.apache.xerces.
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: blob.storage.directory, /opt/flink/blob-server
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: blob.server.port, 6124
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: blob.server.port, 6124
2018-08-29 13:19:24,445 INFO
org.apache.flink.configuration.GlobalConfiguration            - Loading
configuration property: query.server.port, 6125
2018-08-29 13:19:24,461 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting
StandaloneSessionClusterEntrypoint.
2018-08-29 13:19:24,461 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
default filesystem.
2018-08-29 13:19:24,472 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
security context.
2018-08-29 13:19:24,506 INFO
org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
set to flink (auth:SIMPLE)
2018-08-29 13:19:24,522 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
Initializing cluster services.
2018-08-29 13:19:24,532 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to
start actor system at flink-jobmanager-2:50010
2018-08-29 13:19:24,996 INFO  akka.event.slf4j.Slf4jLogger
                - Slf4jLogger started
2018-08-29 13:19:25,050 INFO  akka.remote.Remoting
                - Starting remoting
2018-08-29 13:19:25,209 INFO  akka.remote.Remoting
                - Remoting started; listening on addresses
:[akka.tcp://flink@flink-jobmanager-2:50010]
2018-08-29 13:19:25,216 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor
system started at akka.tcp://flink@flink-jobmanager-2:50010
2018-08-29 13:19:25,648 INFO
org.apache.flink.runtime.blob.FileSystemBlobStore             - Creating
highly available BLOB storage directory at
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability//default/blob
2018-08-29 13:19:25,702 INFO  org.apache.flink.runtime.util.ZooKeeperUtils
                - Enforcing default ACL for ZK connections
2018-08-29 13:19:25,703 INFO  org.apache.flink.runtime.util.ZooKeeperUtils
                - Using '/flink/default' as Zookeeper namespace.
2018-08-29 13:19:25,750 INFO
org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl
- Starting
2018-08-29 13:19:25,756 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f,
built on 03/23/2017 10:13 GMT
2018-08-29 13:19:25,756 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:host.name=flink-jobmanager-2-7844b78c9-zwdqv
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.version=1.8.0_181
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.vendor=Oracle Corporation
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.class.path=/opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.io.tmpdir=/tmp
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:java.compiler=<NA>
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:os.name=Linux
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:os.arch=amd64
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:os.version=4.4.0-1027-gke
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:user.name=flink
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:user.home=/opt/flink
2018-08-29 13:19:25,757 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
environment:user.dir=/opt/flink
2018-08-29 13:19:25,758 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  -
Initiating client connection, connectString=zk-cs:2181 sessionTimeout=60000
watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@17ae7628
2018-08-29 13:19:25,775 WARN
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL
configuration failed: javax.security.auth.login.LoginException: No JAAS
configuration section named 'Client' was found in specified JAAS
configuration file: '/tmp/jaas-5000339768628554676.conf'. Will continue
connection to Zookeeper server without SASL authentication, if Zookeeper
server allows it.
2018-08-29 13:19:25,776 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
Opening socket connection to server zk-cs.default.svc.cluster.local/
10.27.248.104:2181
2018-08-29 13:19:25,777 ERROR
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  -
Authentication failed
2018-08-29 13:19:25,777 INFO  org.apache.flink.runtime.blob.BlobServer
                - Created BLOB server storage directory
/opt/flink/blob-server/blobStore-40cefeee-e8d1-4522-aea3-957d9f7fbeee
2018-08-29 13:19:25,777 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Socket
connection established to zk-cs.default.svc.cluster.local/10.27.248.104:2181,
initiating session
2018-08-29 13:19:25,778 INFO  org.apache.flink.runtime.blob.BlobServer
                - Started BLOB server at 0.0.0.0:6124 - max concurrent
requests: 50 - max backlog: 1000
2018-08-29 13:19:25,788 INFO
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
Session establishment complete on server zk-cs.default.svc.cluster.local/
10.27.248.104:2181, sessionid = 0x26584fd55690009, negotiated timeout =
40000
2018-08-29 13:19:25,789 INFO
org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager
- State change: CONNECTED
2018-08-29 13:19:25,793 INFO
org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics
reporter configured, no metrics will be exposed/reported.
2018-08-29 13:19:25,798 INFO
org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  -
Initializing FileArchivedExecutionGraphStore: Storage directory
/tmp/executionGraphStore-76cce4e7-84ea-4624-a847-bbd7fdc4f109, expiration
time 3600000, maximum cache size 52428800 bytes.
2018-08-29 13:19:25,824 INFO
org.apache.flink.runtime.blob.TransientBlobCache              - Created
BLOB cache storage directory
/opt/flink/blob-server/blobStore-7c6c2db0-f7ab-4cb6-909d-6c9cbfd78215
2018-08-29 13:19:25,838 WARN  org.apache.flink.configuration.Configuration
                - Config uses deprecated configuration key
'jobmanager.rpc.address' instead of proper key 'rest.address'
2018-08-29 13:19:25,839 WARN
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload
directory /opt/flink/upload/flink-web-upload does not exist, or has been
deleted externally. Previously uploaded files are no longer available.
2018-08-29 13:19:25,840 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created
directory /opt/flink/upload/flink-web-upload for file uploads.
2018-08-29 13:19:25,843 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting
rest endpoint.
2018-08-29 13:19:26,143 WARN
org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file
environment variable 'log.file' is not set.
2018-08-29 13:19:26,143 WARN
org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager
log files are unavailable in the web dashboard. Log file location not found
in environment variable 'log.file' or configuration key 'Key:
'web.log.path' , default: null (deprecated keys:
[jobmanager.web.log.path])'.
2018-08-29 13:19:26,216 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest
endpoint listening at flink-jobmanager-2:8081
2018-08-29 13:19:26,217 INFO
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
Starting ZooKeeperLeaderElectionService
ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
2018-08-29 13:19:26,236 INFO
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web
frontend listening at http://flink-jobmanager-2:8081.
2018-08-29 13:19:26,248 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
RPC endpoint for
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
akka://flink/user/resourcemanager .
2018-08-29 13:19:26,323 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
at akka://flink/user/dispatcher .
2018-08-29 13:19:26,335 INFO
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
Starting ZooKeeperLeaderElectionService
ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2018-08-29 13:19:26,336 INFO
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-08-29 13:19:26,338 INFO
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
Starting ZooKeeperLeaderElectionService
ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2018-08-29 13:19:26,339 INFO
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-08-29 13:23:21,513 INFO
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
Recovered SubmittedJobGraph(836b29f7c66bbeb6ed8bae41cb9b316c, null).
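
A quick way to check a JobManager log like this one for any statement about
gaining leadership is sketched below. This is only a minimal sketch: it
assumes each log entry starts with a timestamp of the form shown above, and
it assumes the statement of interest contains the word "leadership" (the
exact wording may differ between Flink versions). The helper names
join_wrapped and find_leadership are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: a new log entry starts with "YYYY-MM-DD HH:MM:SS,mmm ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Assumption: the statement of interest mentions "leadership" somewhere.
LEADERSHIP = re.compile(r"leadership", re.IGNORECASE)


def join_wrapped(lines: Iterable[str]) -> List[str]:
    """Re-join log entries that were wrapped across several lines."""
    entries: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP.match(raw) or not entries:
            # A timestamp (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Anything else is a continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def find_leadership(lines: Iterable[str]) -> List[str]:
    """Return the re-joined entries that mention leadership."""
    return [entry for entry in join_wrapped(lines) if LEADERSHIP.search(entry)]
```

It can be called on an open log file, for example
find_leadership(open("jobmanager-2.log")), where the file name is just a
placeholder. An empty result means no entry mentions leadership under the
assumptions above.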

Thanks,
Encho

On Wed, Aug 29, 2018 at 3:59 PM Till Rohrmann <trohrmann@apache.org> wrote:

> Hi Encho,
>
> Thanks for sending the first part of the logs. What I am actually
> interested in are the complete logs, because somewhere in the jobmanager-2
> logs there must be a log statement saying that the respective dispatcher
> gained leadership. I would like to see why this happens, but debugging it
> requires the complete logs. It would be awesome if you could send them to
> me. Thanks a lot!
>
> Cheers,
> Till
>
> On Wed, Aug 29, 2018 at 2:00 PM Encho Mishinev <encho.mishinev@gmail.com>
> wrote:
>
>> Hi Till,
>>
>> I will use the approach with a k8s deployment and HA mode with a single
>> job manager. Nonetheless, here are the logs I just produced by repeating
>> the aforementioned experiment. I hope they help with debugging:
>>
>> *- Starting Jobmanager-1:*
>>
>> Starting Job Manager
>> sed: cannot rename /opt/flink/conf/sedR98XPn: Device or resource busy
>> config file:
>> jobmanager.rpc.address: flink-jobmanager-1
>> jobmanager.rpc.port: 6123
>> jobmanager.heap.size: 8192
>> taskmanager.heap.size: 8192
>> taskmanager.numberOfTaskSlots: 4
>> high-availability: zookeeper
>> high-availability.storageDir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
>> high-availability.zookeeper.quorum: zk-cs:2181
>> high-availability.zookeeper.path.root: /flink
>> high-availability.jobmanager.port: 50010
>> state.backend: filesystem
>> state.checkpoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
>> state.savepoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
>> state.backend.incremental: false
>> fs.default-scheme:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
>> rest.port: 8081
>> web.upload.dir: /opt/flink/upload
>> query.server.port: 6125
>> taskmanager.numberOfTaskSlots: 4
>> classloader.parent-first-patterns.additional: org.apache.xerces.
>> blob.storage.directory: /opt/flink/blob-server
>> blob.server.port: 6124
>> blob.server.port: 6124
>> query.server.port: 6125
>> Starting standalonesession as a console application on host
>> flink-jobmanager-1-f76fd4df8-ftwt9.
>> 2018-08-29 11:41:48,806 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> --------------------------------------------------------------------------------
>> 2018-08-29 11:41:48,807 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting
>> StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216,
>> Date:16.08.2018 @ 06:39:50 GMT)
>> 2018-08-29 11:41:48,807 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current
>> user: flink
>> 2018-08-29 11:41:49,134 WARN  org.apache.hadoop.util.NativeCodeLoader
>>                    - Unable to load native-hadoop library for your
>> platform... using builtin-java classes where applicable
>> 2018-08-29 11:41:49,210 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current
>> Hadoop/Kerberos user: flink
>> 2018-08-29 11:41:49,210 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM:
>> OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
>> 2018-08-29 11:41:49,210 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum
>> heap size: 6702 MiBytes
>> 2018-08-29 11:41:49,210 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME:
>> /docker-java-home/jre
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Hadoop
>> version: 2.7.5
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM
>> Options:
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program
>> Arguments:
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  --configDir
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  /opt/flink/conf
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  --executionMode
>> 2018-08-29 11:41:49,213 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
>> 2018-08-29 11:41:49,214 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --host
>> 2018-08-29 11:41:49,214 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
>> 2018-08-29 11:41:49,214 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath:
>> /opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
>> 2018-08-29 11:41:49,214 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> --------------------------------------------------------------------------------
>> 2018-08-29 11:41:49,215 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered
>> UNIX signal handlers for [TERM, HUP, INT]
>> 2018-08-29 11:41:49,221 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.rpc.address, flink-jobmanager-1
>> 2018-08-29 11:41:49,221 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.rpc.port, 6123
>> 2018-08-29 11:41:49,221 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.heap.size, 8192
>> 2018-08-29 11:41:49,221 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.heap.size, 8192
>> 2018-08-29 11:41:49,221 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.numberOfTaskSlots, 4
>> 2018-08-29 11:41:49,222 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability, zookeeper
>> 2018-08-29 11:41:49,222 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.storageDir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
>> 2018-08-29 11:41:49,222 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.zookeeper.quorum, zk-cs:2181
>> 2018-08-29 11:41:49,222 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.zookeeper.path.root, /flink
>> 2018-08-29 11:41:49,223 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.jobmanager.port, 50010
>> 2018-08-29 11:41:49,223 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.backend, filesystem
>> 2018-08-29 11:41:49,223 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.checkpoints.dir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
>> 2018-08-29 11:41:49,223 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.savepoints.dir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
>> 2018-08-29 11:41:49,223 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.backend.incremental, false
>> 2018-08-29 11:41:49,224 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: fs.default-scheme,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
>> 2018-08-29 11:41:49,224 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: rest.port, 8081
>> 2018-08-29 11:41:49,224 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: web.upload.dir, /opt/flink/upload
>> 2018-08-29 11:41:49,224 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: query.server.port, 6125
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.numberOfTaskSlots, 4
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: classloader.parent-first-patterns.additional,
>> org.apache.xerces.
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.storage.directory, /opt/flink/blob-server
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.server.port, 6124
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.server.port, 6124
>> 2018-08-29 11:41:49,225 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: query.server.port, 6125
>> 2018-08-29 11:41:49,239 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting
>> StandaloneSessionClusterEntrypoint.
>> 2018-08-29 11:41:49,239 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
>> default filesystem.
>> 2018-08-29 11:41:49,250 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
>> security context.
>> 2018-08-29 11:41:49,282 INFO
>> org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
>> set to flink (auth:SIMPLE)
>> 2018-08-29 11:41:49,298 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> Initializing cluster services.
>> 2018-08-29 11:41:49,309 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to
>> start actor system at flink-jobmanager-1:50010
>> 2018-08-29 11:41:49,768 INFO  akka.event.slf4j.Slf4jLogger
>>                   - Slf4jLogger started
>> 2018-08-29 11:41:49,823 INFO  akka.remote.Remoting
>>                   - Starting remoting
>> 2018-08-29 11:41:49,974 INFO  akka.remote.Remoting
>>                   - Remoting started; listening on addresses
>> :[akka.tcp://flink@flink-jobmanager-1:50010]
>> 2018-08-29 11:41:49,981 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor
>> system started at akka.tcp://flink@flink-jobmanager-1:50010
>> 2018-08-29 11:41:50,444 INFO
>> org.apache.flink.runtime.blob.FileSystemBlobStore             - Creating
>> highly available BLOB storage directory at
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability//default/blob
>> 2018-08-29 11:41:50,509 INFO
>> org.apache.flink.runtime.util.ZooKeeperUtils                  - Enforcing
>> default ACL for ZK connections
>> 2018-08-29 11:41:50,509 INFO
>> org.apache.flink.runtime.util.ZooKeeperUtils                  - Using
>> '/flink/default' as Zookeeper namespace.
>> 2018-08-29 11:41:50,568 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl
>> - Starting
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f,
>> built on 03/23/2017 10:13 GMT
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:host.name=flink-jobmanager-1-f76fd4df8-ftwt9
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.version=1.8.0_181
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.vendor=Oracle Corporation
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.class.path=/opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.io.tmpdir=/tmp
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.compiler=<NA>
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.name=Linux
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.arch=amd64
>> 2018-08-29 11:41:50,577 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.version=4.4.0-1027-gke
>> 2018-08-29 11:41:50,578 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.name=flink
>> 2018-08-29 11:41:50,578 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.home=/opt/flink
>> 2018-08-29 11:41:50,578 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.dir=/opt/flink
>> 2018-08-29 11:41:50,578 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  -
>> Initiating client connection, connectString=zk-cs:2181 sessionTimeout=60000
>> watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@17ae7628
>> 2018-08-29 11:41:50,605 INFO  org.apache.flink.runtime.blob.BlobServer
>>                   - Created BLOB server storage directory
>> /opt/flink/blob-server/blobStore-d408cea8-2ed0-461a-a30a-a62b70fd332a
>> 2018-08-29 11:41:50,605 WARN
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL
>> configuration failed: javax.security.auth.login.LoginException: No JAAS
>> configuration section named 'Client' was found in specified JAAS
>> configuration file: '/tmp/jaas-5372401662150571998.conf'. Will continue
>> connection to Zookeeper server without SASL authentication, if Zookeeper
>> server allows it.
>> 2018-08-29 11:41:50,607 INFO  org.apache.flink.runtime.blob.BlobServer
>>                   - Started BLOB server at 0.0.0.0:6124 - max concurrent
>> requests: 50 - max backlog: 1000
>> 2018-08-29 11:41:50,607 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
>> Opening socket connection to server zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181
>> 2018-08-29 11:41:50,608 ERROR
>> org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  -
>> Authentication failed
>> 2018-08-29 11:41:50,609 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Socket
>> connection established to zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181, initiating session
>> 2018-08-29 11:41:50,618 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
>> Session establishment complete on server zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181, sessionid = 0x26584fd55690005, negotiated timeout =
>> 40000
>> 2018-08-29 11:41:50,619 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager
>> - State change: CONNECTED
>> 2018-08-29 11:41:50,627 INFO
>> org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics
>> reporter configured, no metrics will be exposed/reported.
>> 2018-08-29 11:41:50,633 INFO
>> org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  -
>> Initializing FileArchivedExecutionGraphStore: Storage directory
>> /tmp/executionGraphStore-c5df0b39-86f3-4fba-bdda-aacca4f86086, expiration
>> time 3600000, maximum cache size 52428800 bytes.
>> 2018-08-29 11:41:50,659 INFO
>> org.apache.flink.runtime.blob.TransientBlobCache              - Created
>> BLOB cache storage directory
>> /opt/flink/blob-server/blobStore-c12d55af-3c2d-4fc2-8ee8-6de642522184
>> 2018-08-29 11:41:50,674 WARN
>> org.apache.flink.configuration.Configuration                  - Config uses
>> deprecated configuration key 'jobmanager.rpc.address' instead of proper key
>> 'rest.address'
>> 2018-08-29 11:41:50,675 WARN
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload
>> directory /opt/flink/upload/flink-web-upload does not exist, or has been
>> deleted externally. Previously uploaded files are no longer available.
>> 2018-08-29 11:41:50,676 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created
>> directory /opt/flink/upload/flink-web-upload for file uploads.
>> 2018-08-29 11:41:50,679 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting
>> rest endpoint.
>> 2018-08-29 11:41:50,995 WARN
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file
>> environment variable 'log.file' is not set.
>> 2018-08-29 11:41:50,995 WARN
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager
>> log files are unavailable in the web dashboard. Log file location not found
>> in environment variable 'log.file' or configuration key 'Key:
>> 'web.log.path' , default: null (deprecated keys:
>> [jobmanager.web.log.path])'.
>> 2018-08-29 11:41:51,071 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest
>> endpoint listening at flink-jobmanager-1:8081
>> 2018-08-29 11:41:51,071 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
>> 2018-08-29 11:41:51,091 WARN
>> org.apache.flink.shaded.curator.org.apache.curator.utils.ZKPaths  - The
>> version of ZooKeeper being used doesn't support Container nodes.
>> CreateMode.PERSISTENT will be used instead.
>> 2018-08-29 11:41:51,101 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web
>> frontend listening at http://flink-jobmanager-1:8081.
>> 2018-08-29 11:41:51,114 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>> akka://flink/user/resourcemanager .
>> 2018-08-29 11:41:51,141 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    -
>> http://flink-jobmanager-1:8081 was granted leadership with
>> leaderSessionID=bb0d4dfd-c2c4-480b-bc86-62e231a606dd
>> 2018-08-29 11:41:51,214 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>> at akka://flink/user/dispatcher .
>> 2018-08-29 11:41:51,230 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
>> 2018-08-29 11:41:51,232 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2018-08-29 11:41:51,234 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
>> 2018-08-29 11:41:51,235 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>> 2018-08-29 11:41:51,253 INFO
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>> ResourceManager akka.tcp://flink@flink-jobmanager-1:50010/user/resourcemanager
>> was granted leadership with fencing token ba47ed8daa8ff16bea6fc355c13f4d49
>> 2018-08-29 11:41:51,254 INFO
>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
>> Starting the SlotManager.
>> 2018-08-29 11:41:51,263 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
>> akka.tcp://flink@flink-jobmanager-1:50010/user/dispatcher was granted
>> leadership with fencing token 703301bf-85e7-4464-990f-ad39128a7b4d
>> 2018-08-29 11:41:51,263 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering
>> all persisted jobs.
>> 2018-08-29 11:41:51,468 INFO
>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
>> Registering TaskManager c8a3201d58d87dbbe16f8eb352b5c5b6 under
>> 1c5bf0bc3848bd384b6f032ff7213754 at the SlotManager.
>> 2018-08-29 11:41:51,471 INFO
>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
>> Registering TaskManager 104d18b72fed054620e58e120a1ea083 under
>> e9d3e8ad3b477dd2e58bcb88a2c0d061 at the SlotManager.
>>
>> *Starting Jobmanager-2:*
>>
>> Starting Job Manager
>> sed: cannot rename /opt/flink/conf/sedH2ZiSu: Device or resource busy
>> config file:
>> jobmanager.rpc.address: flink-jobmanager-2
>> jobmanager.rpc.port: 6123
>> jobmanager.heap.size: 8192
>> taskmanager.heap.size: 8192
>> taskmanager.numberOfTaskSlots: 4
>> high-availability: zookeeper
>> high-availability.storageDir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
>> high-availability.zookeeper.quorum: zk-cs:2181
>> high-availability.zookeeper.path.root: /flink
>> high-availability.jobmanager.port: 50010
>> state.backend: filesystem
>> state.checkpoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
>> state.savepoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
>> state.backend.incremental: false
>> fs.default-scheme:
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
>> rest.port: 8081
>> web.upload.dir: /opt/flink/upload
>> query.server.port: 6125
>> taskmanager.numberOfTaskSlots: 4
>> classloader.parent-first-patterns.additional: org.apache.xerces.
>> blob.storage.directory: /opt/flink/blob-server
>> blob.server.port: 6124
>> blob.server.port: 6124
>> query.server.port: 6125
>> Starting standalonesession as a console application on host
>> flink-jobmanager-2-7844b78c9-kmvw9.
>> 2018-08-29 11:41:51,688 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> --------------------------------------------------------------------------------
>> 2018-08-29 11:41:51,690 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting
>> StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216,
>> Date:16.08.2018 @ 06:39:50 GMT)
>> 2018-08-29 11:41:51,690 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current
>> user: flink
>> 2018-08-29 11:41:52,018 WARN  org.apache.hadoop.util.NativeCodeLoader
>>                    - Unable to load native-hadoop library for your
>> platform... using builtin-java classes where applicable
>> 2018-08-29 11:41:52,088 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current
>> Hadoop/Kerberos user: flink
>> 2018-08-29 11:41:52,088 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM:
>> OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
>> 2018-08-29 11:41:52,088 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum
>> heap size: 6702 MiBytes
>> 2018-08-29 11:41:52,088 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME:
>> /docker-java-home/jre
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Hadoop
>> version: 2.7.5
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM
>> Options:
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program
>> Arguments:
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  --configDir
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  /opt/flink/conf
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>>  --executionMode
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --host
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath:
>> /opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
>> 2018-08-29 11:41:52,091 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> --------------------------------------------------------------------------------
>> 2018-08-29 11:41:52,092 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered
>> UNIX signal handlers for [TERM, HUP, INT]
>> 2018-08-29 11:41:52,103 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.rpc.address, flink-jobmanager-2
>> 2018-08-29 11:41:52,103 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.rpc.port, 6123
>> 2018-08-29 11:41:52,103 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: jobmanager.heap.size, 8192
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.heap.size, 8192
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.numberOfTaskSlots, 4
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability, zookeeper
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.storageDir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.zookeeper.quorum, zk-cs:2181
>> 2018-08-29 11:41:52,104 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.zookeeper.path.root, /flink
>> 2018-08-29 11:41:52,105 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: high-availability.jobmanager.port, 50010
>> 2018-08-29 11:41:52,105 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.backend, filesystem
>> 2018-08-29 11:41:52,105 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.checkpoints.dir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
>> 2018-08-29 11:41:52,105 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.savepoints.dir,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
>> 2018-08-29 11:41:52,105 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: state.backend.incremental, false
>> 2018-08-29 11:41:52,106 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: fs.default-scheme,
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
>> 2018-08-29 11:41:52,106 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: rest.port, 8081
>> 2018-08-29 11:41:52,106 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: web.upload.dir, /opt/flink/upload
>> 2018-08-29 11:41:52,106 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: query.server.port, 6125
>> 2018-08-29 11:41:52,106 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: taskmanager.numberOfTaskSlots, 4
>> 2018-08-29 11:41:52,107 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: classloader.parent-first-patterns.additional,
>> org.apache.xerces.
>> 2018-08-29 11:41:52,107 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.storage.directory, /opt/flink/blob-server
>> 2018-08-29 11:41:52,107 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.server.port, 6124
>> 2018-08-29 11:41:52,107 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: blob.server.port, 6124
>> 2018-08-29 11:41:52,107 INFO
>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>> configuration property: query.server.port, 6125
>> 2018-08-29 11:41:52,122 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting
>> StandaloneSessionClusterEntrypoint.
>> 2018-08-29 11:41:52,123 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
>> default filesystem.
>> 2018-08-29 11:41:52,133 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install
>> security context.
>> 2018-08-29 11:41:52,173 INFO
>> org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
>> set to flink (auth:SIMPLE)
>> 2018-08-29 11:41:52,188 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
>> Initializing cluster services.
>> 2018-08-29 11:41:52,198 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to
>> start actor system at flink-jobmanager-2:50010
>> 2018-08-29 11:41:52,753 INFO  akka.event.slf4j.Slf4jLogger
>>                   - Slf4jLogger started
>> 2018-08-29 11:41:52,822 INFO  akka.remote.Remoting
>>                   - Starting remoting
>> 2018-08-29 11:41:53,038 INFO  akka.remote.Remoting
>>                   - Remoting started; listening on addresses
>> :[akka.tcp://flink@flink-jobmanager-2:50010]
>> 2018-08-29 11:41:53,046 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor
>> system started at akka.tcp://flink@flink-jobmanager-2:50010
>> 2018-08-29 11:41:53,500 INFO
>> org.apache.flink.runtime.blob.FileSystemBlobStore             - Creating
>> highly available BLOB storage directory at
>> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability//default/blob
>> 2018-08-29 11:41:53,558 INFO
>> org.apache.flink.runtime.util.ZooKeeperUtils                  - Enforcing
>> default ACL for ZK connections
>> 2018-08-29 11:41:53,559 INFO
>> org.apache.flink.runtime.util.ZooKeeperUtils                  - Using
>> '/flink/default' as Zookeeper namespace.
>> 2018-08-29 11:41:53,616 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl
>> - Starting
>> 2018-08-29 11:41:53,624 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f,
>> built on 03/23/2017 10:13 GMT
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:host.name=flink-jobmanager-2-7844b78c9-kmvw9
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.version=1.8.0_181
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.vendor=Oracle Corporation
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.class.path=/opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.io.tmpdir=/tmp
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:java.compiler=<NA>
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.name=Linux
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.arch=amd64
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:os.version=4.4.0-1027-gke
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.name=flink
>> 2018-08-29 11:41:53,625 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.home=/opt/flink
>> 2018-08-29 11:41:53,626 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Client
>> environment:user.dir=/opt/flink
>> 2018-08-29 11:41:53,626 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  -
>> Initiating client connection, connectString=zk-cs:2181 sessionTimeout=60000
>> watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@17ae7628
>> 2018-08-29 11:41:53,644 WARN
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL
>> configuration failed: javax.security.auth.login.LoginException: No JAAS
>> configuration section named 'Client' was found in specified JAAS
>> configuration file: '/tmp/jaas-8238466329925822361.conf'. Will continue
>> connection to Zookeeper server without SASL authentication, if Zookeeper
>> server allows it.
>> 2018-08-29 11:41:53,646 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
>> Opening socket connection to server zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181
>> 2018-08-29 11:41:53,646 INFO  org.apache.flink.runtime.blob.BlobServer
>>                   - Created BLOB server storage directory
>> /opt/flink/blob-server/blobStore-61cdb645-5d0c-47fd-bcf6-84ad16fadade
>> 2018-08-29 11:41:53,646 ERROR
>> org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  -
>> Authentication failed
>> 2018-08-29 11:41:53,647 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Socket
>> connection established to zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181, initiating session
>> 2018-08-29 11:41:53,649 INFO  org.apache.flink.runtime.blob.BlobServer
>>                   - Started BLOB server at 0.0.0.0:6124 - max concurrent
>> requests: 50 - max backlog: 1000
>> 2018-08-29 11:41:53,655 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
>> Session establishment complete on server zk-cs.default.svc.cluster.local/
>> 10.27.248.104:2181, sessionid = 0x26584fd55690006, negotiated timeout =
>> 40000
>> 2018-08-29 11:41:53,656 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager
>> - State change: CONNECTED
>> 2018-08-29 11:41:53,667 INFO
>> org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics
>> reporter configured, no metrics will be exposed/reported.
>> 2018-08-29 11:41:53,673 INFO
>> org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  -
>> Initializing FileArchivedExecutionGraphStore: Storage directory
>> /tmp/executionGraphStore-8b236c14-79ee-4a84-b23f-437408c4661a, expiration
>> time 3600000, maximum cache size 52428800 bytes.
>> 2018-08-29 11:41:53,699 INFO
>> org.apache.flink.runtime.blob.TransientBlobCache              - Created
>> BLOB cache storage directory
>> /opt/flink/blob-server/blobStore-80c519df-cc6f-4e9c-9cd5-da4077c826f0
>> 2018-08-29 11:41:53,717 WARN
>> org.apache.flink.configuration.Configuration                  - Config uses
>> deprecated configuration key 'jobmanager.rpc.address' instead of proper key
>> 'rest.address'
>> 2018-08-29 11:41:53,718 WARN
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload
>> directory /opt/flink/upload/flink-web-upload does not exist, or has been
>> deleted externally. Previously uploaded files are no longer available.
>> 2018-08-29 11:41:53,719 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created
>> directory /opt/flink/upload/flink-web-upload for file uploads.
>> 2018-08-29 11:41:53,722 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting
>> rest endpoint.
>> 2018-08-29 11:41:54,084 WARN
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file
>> environment variable 'log.file' is not set.
>> 2018-08-29 11:41:54,084 WARN
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager
>> log files are unavailable in the web dashboard. Log file location not found
>> in environment variable 'log.file' or configuration key 'Key:
>> 'web.log.path' , default: null (deprecated keys:
>> [jobmanager.web.log.path])'.
>> 2018-08-29 11:41:54,160 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest
>> endpoint listening at flink-jobmanager-2:8081
>> 2018-08-29 11:41:54,160 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
>> 2018-08-29 11:41:54,180 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web
>> frontend listening at http://flink-jobmanager-2:8081.
>> 2018-08-29 11:41:54,192 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>> akka://flink/user/resourcemanager .
>> 2018-08-29 11:41:54,273 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>> at akka://flink/user/dispatcher .
>> 2018-08-29 11:41:54,286 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
>> 2018-08-29 11:41:54,287 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2018-08-29 11:41:54,289 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
>> 2018-08-29 11:41:54,289 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>>
>> *Upon submitting a batch job on Jobmanager-1, we immediately get this log
>> on Jobmanager-2*
>> 2018-08-29 11:47:06,249 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Recovered SubmittedJobGraph(d69b67e4d28a2d244b06d3f6d661bca1, null).
>>
>> *Meanwhile Jobmanager-1 gets:*
>> *-FlinkBatchPipelineTranslator pipeline logs- (we use Apache Beam)*
>>
>> 2018-08-29 11:47:06,006 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Submitting
>> job d69b67e4d28a2d244b06d3f6d661bca1
>> (sicassandrawriterbeam-flink-0829114703-7d95fabd).
>> 2018-08-29 11:47:06,090 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Added SubmittedJobGraph(d69b67e4d28a2d244b06d3f6d661bca1, null) to
>> ZooKeeper.
>>
>> *-loads of job execution info-*
>>
>> 2018-08-29 11:49:20,272 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Job
>> d69b67e4d28a2d244b06d3f6d661bca1 reached globally terminal state FINISHED.
>> 2018-08-29 11:49:20,286 INFO
>> org.apache.flink.runtime.jobmaster.JobMaster                  - Stopping
>> the JobMaster for job
>> sicassandrawriterbeam-flink-0829114703-7d95fabd(d69b67e4d28a2d244b06d3f6d661bca1).
>> 2018-08-29 11:49:20,290 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2018-08-29 11:49:20,292 INFO
>> org.apache.flink.runtime.jobmaster.JobMaster                  - Close
>> ResourceManager connection 827b94881bf7c94d8516907e04e3a564: JobManager is
>> shutting down..
>> 2018-08-29 11:49:20,292 INFO
>> org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Suspending
>> SlotPool.
>> 2018-08-29 11:49:20,293 INFO
>> org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Stopping
>> SlotPool.
>> 2018-08-29 11:49:20,293 INFO
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>> Disconnect job manager a3dab0a0883c5f0f37943358d9104d79
>> @akka.tcp://flink@flink-jobmanager-1:50010/user/jobmanager_0 for job
>> d69b67e4d28a2d244b06d3f6d661bca1 from the resource manager.
>> 2018-08-29 11:49:20,293 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Stopping ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/d69b67e4d28a2d244b06d3f6d661bca1/job_manager_lock'}.
>> 2018-08-29 11:49:20,304 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Removed job graph d69b67e4d28a2d244b06d3f6d661bca1 from ZooKeeper.
>>
>>
>> -------------------
>>
>> The result is:
>> HDFS has only a jobgraph and an empty default folder; everything else is
>> cleared.
>> ZooKeeper still contains the jobgraph that Jobmanager-1 claims to have
>> removed in its last log line.
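
To make the sequence in the two logs easier to compare, here is a minimal Python sketch that collects the "Added", "Recovered" and "Removed" job-graph messages quoted above and reports job IDs that the leader added and the standby also recovered. The function names (job_graph_events, suspicious_recoveries) are hypothetical, and the regular expressions assume the exact message wording shown in this thread (Flink 1.5.3). It only correlates log text; it does not talk to ZooKeeper or HDFS and does not by itself prove that a job graph was left behind.

```python
import re
from collections import defaultdict

# Message fragments copied from the ZooKeeperSubmittedJobGraphStore lines
# quoted in this thread; other Flink versions may word them differently.
PATTERNS = {
    "added": re.compile(r"Added SubmittedJobGraph\(([0-9a-f]{32}),"),
    "recovered": re.compile(r"Recovered SubmittedJobGraph\(([0-9a-f]{32}),"),
    "removed": re.compile(r"Removed job graph ([0-9a-f]{32}) from"),
}


def job_graph_events(log_text):
    """Map each job ID to the set of job-graph events seen in one log."""
    # Mail clients wrap long log lines and add '>' quote markers,
    # so strip the markers and join everything into one line first.
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    events = defaultdict(set)
    for name, pattern in PATTERNS.items():
        for job_id in pattern.findall(flat):
            events[job_id].add(name)
    return dict(events)


def suspicious_recoveries(leader_log, standby_log):
    """Return job IDs the leader added that the standby also recovered."""
    leader = job_graph_events(leader_log)
    standby = job_graph_events(standby_log)
    return sorted(
        job_id
        for job_id, seen in standby.items()
        if "recovered" in seen and "added" in leader.get(job_id, set())
    )
```

Calling suspicious_recoveries with the Jobmanager-1 log as the first argument and the Jobmanager-2 log as the second returns the IDs worth checking by hand in ZooKeeper and in the HA storage directory.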
>>
>> On Wed, Aug 29, 2018 at 12:14 PM Till Rohrmann <trohrmann@apache.org>
>> wrote:
>>
>>> Hi Encho,
>>>
>>> it sounds strange that the standby JobManager tries to recover a
>>> submitted job graph. This should only happen if it has been granted
>>> leadership. Thus, it seems as if the standby JobManager thinks that it is
>>> also the leader. Could you maybe share the logs of the two
>>> JobManagers/ClusterEntrypoints with us?
>>>
>>> Running only a single JobManager/ClusterEntrypoint in HA mode via a
>>> Kubernetes Deployment should do the trick and there is nothing wrong with
>>> it.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Aug 29, 2018 at 11:05 AM Encho Mishinev <
>>> encho.mishinev@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Since two job managers don't seem to be working for me I was thinking
>>>> of just using a single job manager in Kubernetes in HA mode with a
>>>> deployment ensuring its restart whenever it fails. Is this approach viable?
>>>> The High-Availability page mentions that you use only one job manager in a
>>>> YARN cluster but does not specify such an option for Kubernetes. Is there
>>>> anything that can go wrong with this approach?
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Aug 29, 2018 at 11:10 AM Encho Mishinev <
>>>> encho.mishinev@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Unfortunately the thing I described does indeed happen every time. As
>>>>> mentioned in the first email, I am running on Kubernetes so certain things
>>>>> could be different compared to just a standalone cluster.
>>>>>
>>>>> Any ideas for workarounds are welcome, as this problem basically
>>>>> prevents me from using HA.
>>>>>
>>>>> Thanks,
>>>>> Encho
>>>>>
>>>>> On Wed, Aug 29, 2018 at 5:15 AM vino yang <yanghua1127@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Encho,
>>>>>>
>>>>>> From your description, I feel that there are extra bugs.
>>>>>>
>>>>>> About your description:
>>>>>>
>>>>>> *- Start both job managers*
>>>>>> *- Start a batch job in JobManager 1 and let it finish*
>>>>>> *The jobgraphs in both Zookeeper and HDFS remained.*
>>>>>>
>>>>>> Does it happen every time?
>>>>>>
>>>>>> In the Standalone cluster, the problems we encountered were sporadic.
>>>>>>
>>>>>> Thanks, vino.
>>>>>>
>>>>>> On Tue, Aug 28, 2018 at 8:07 PM Encho Mishinev <encho.mishinev@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Till,
>>>>>>>
>>>>>>> I spent a few more hours testing and looking at the logs and it
>>>>>>> seems like there's a more general problem here. While the two job managers
>>>>>>> are active neither of them can properly delete jobgraphs. The problem I
>>>>>>> described above comes from the fact that Kubernetes brings JobManager 1
>>>>>>> back up quickly after I manually kill it, so when I stop the job on
>>>>>>> JobManager 2 both are alive.
>>>>>>>
>>>>>>> I did a very simple test:
>>>>>>>
>>>>>>> - Start both job managers
>>>>>>> - Start a batch job in JobManager 1 and let it finish
>>>>>>> The jobgraphs in both Zookeeper and HDFS remained.
>>>>>>>
>>>>>>> On the other hand if we do:
>>>>>>>
>>>>>>> - Start only JobManager 1 (again in HA mode)
>>>>>>> - Start a batch job and let it finish
>>>>>>> The jobgraphs in both Zookeeper and HDFS are deleted fine.
>>>>>>>
>>>>>>> It seems like the standby manager still leaves some kind of lock on
>>>>>>> the jobgraphs. Do you think that's possible? Have you seen a similar
>>>>>>> problem?
>>>>>>> The only logs that appear on the standby manager while waiting are
>>>>>>> of the type:
>>>>>>>
>>>>>>> 2018-08-28 11:54:10,789 INFO
>>>>>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>>>>>> Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).
>>>>>>>
>>>>>>> Note that this log appears on the standby jobmanager immediately
>>>>>>> when a new job is submitted to the active jobmanager.
>>>>>>> Also note that the blobs and checkpoints are cleared fine. The
>>>>>>> problem is only for jobgraphs both in ZooKeeper and HDFS.
>>>>>>>
>>>>>>> Trying to access the UI of the standby manager redirects to the
>>>>>>> active one, so it is not a problem of them not knowing who the leader is.
>>>>>>> Do you have any ideas?
>>>>>>>
>>>>>>> Thanks a lot,
>>>>>>> Encho
>>>>>>>
>>>>>>> On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann <trohrmann@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Encho,
>>>>>>>>
>>>>>>>> thanks a lot for reporting this issue. The problem arises whenever
>>>>>>>> the old leader maintains the connection to ZooKeeper. If this is the case,
>>>>>>>> then ephemeral nodes which we create to protect against faulty delete
>>>>>>>> operations are not removed and consequently the new leader is not able to
>>>>>>>> delete the persisted job graph. So one thing to check is whether the old JM
>>>>>>>> still has an open connection to ZooKeeper. The next thing to check is the
>>>>>>>> session timeout of your ZooKeeper cluster. If you stop the job within the
>>>>>>>> session timeout, then it is also not guaranteed that ZooKeeper has detected
>>>>>>>> that the ephemeral nodes of the old JM must be deleted. In order to
>>>>>>>> understand this better it would be helpful if you could tell us the timing
>>>>>>>> of the different actions.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
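To make Till's explanation concrete, here is a minimal sketch of the mechanism he describes. It is a toy in-memory model written for illustration only, not the ZooKeeper or Flink API: the class ToyZooKeeper, the helper release_and_delete and the "lock-<session>" node names are all hypothetical. It assumes each JobManager holds an ephemeral child node under the job graph path, and that a node with children cannot be deleted.

```python
class ToyZooKeeper:
    """Toy model: maps each node path to the session that owns it.

    Persistent nodes have owner None; ephemeral nodes are owned by a session
    and disappear only when that session expires.
    """

    def __init__(self):
        self.nodes = {}

    def create(self, path, session=None):
        if path in self.nodes:
            raise KeyError("NodeExists: " + path)
        self.nodes[path] = session

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return [p for p in self.nodes if p.startswith(prefix)]

    def delete(self, path):
        # Mirrors the "Directory not empty" error seen in the ZooKeeper logs.
        if self.children(path):
            raise RuntimeError("Directory not empty: " + path)
        del self.nodes[path]

    def expire_session(self, session):
        # All ephemeral nodes of an expired session are removed.
        owned = [p for p, s in self.nodes.items() if s is not None and s == session]
        for p in owned:
            del self.nodes[p]


def release_and_delete(zk, job_path, session):
    """Release our own (hypothetical) lock node, then try to delete the job graph.

    Returns True if the job graph node was deleted, False if another
    session's lock node still blocks the delete.
    """
    zk.nodes.pop(job_path + "/lock-" + session, None)
    try:
        zk.delete(job_path)
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    zk = ToyZooKeeper()
    job = "/flink/default/jobgraphs/example-job"
    zk.create(job)
    zk.create(job + "/lock-jm1", session="jm1")  # old leader, session still open
    zk.create(job + "/lock-jm2", session="jm2")  # new leader
    print(release_and_delete(zk, job, "jm2"))    # blocked by jm1's lock node
    zk.expire_session("jm1")                     # old session times out
    print(release_and_delete(zk, job, "jm2"))    # delete now succeeds
```

In this model the delete by the new leader only succeeds once the old leader's session has expired, which is why the timing of the actions relative to the session timeout matters.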
>>>>>>>> On Tue, Aug 28, 2018 at 8:17 AM vino yang <yanghua1127@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Encho,
>>>>>>>>>
>>>>>>>>> As a temporary workaround, you can determine whether the job graph
>>>>>>>>> has been cleaned up by monitoring the specific JobID under Zookeeper's
>>>>>>>>> "/jobgraph" path.
>>>>>>>>> Another option is to modify the source code and crudely switch the
>>>>>>>>> cleanup to a synchronous form. However, Flink needs to obtain the
>>>>>>>>> corresponding lock to operate on the Zookeeper path, so this is
>>>>>>>>> dangerous and not recommended.
>>>>>>>>> I think this problem may be solved in the next version. It depends
>>>>>>>>> on Till.
>>>>>>>>>
>>>>>>>>> Thanks, vino.
>>>>>>>>>
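The monitoring workaround vino mentions could be sketched roughly as below. This is only an illustration under assumptions: the function name wait_for_jobgraph_cleanup and its parameters are hypothetical, the default root path is taken from the ZooKeeper log lines in the original report (/flink/default/jobgraphs), and "exists" is any callable that returns a falsy value when the node is gone. With the kazoo Python client, the exists method of a started KazooClient could be passed in.

```python
import time


def wait_for_jobgraph_cleanup(exists, job_id, root="/flink/default/jobgraphs",
                              timeout_s=60.0, poll_s=1.0, sleep=time.sleep):
    """Poll until the job graph node for job_id is gone or the timeout passes.

    exists: callable taking a node path; truthy if the node is present.
    Returns True if the node disappeared, False if it was still there
    when the timeout elapsed.
    """
    path = root.rstrip("/") + "/" + job_id
    deadline = time.monotonic() + timeout_s
    while True:
        if not exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A False result after a job has finished would indicate a leftover job graph like the one described in this thread, which then has to be investigated or removed by hand.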
>>>>>>>>> On Tue, Aug 28, 2018 at 1:17 PM Encho Mishinev <encho.mishinev@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you very much for the info! Will keep track of the
>>>>>>>>>> progress.
>>>>>>>>>>
>>>>>>>>>> In the meantime is there any viable workaround? It seems like HA
>>>>>>>>>> doesn't really work due to this bug.
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 28, 2018 at 4:52 AM vino yang <yanghua1127@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Some background on the implementation:
>>>>>>>>>>> Flink uses Zookeeper to store the JobGraph (the job's description
>>>>>>>>>>> and metadata) as a basis for job recovery.
>>>>>>>>>>> However, previous implementations may fail to clean up this
>>>>>>>>>>> information properly because it is deleted asynchronously by a
>>>>>>>>>>> background thread.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, vino.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 28, 2018 at 9:49 AM vino yang <yanghua1127@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Encho,
>>>>>>>>>>>>
>>>>>>>>>>>> This problem is already known to the Flink community. You can
>>>>>>>>>>>> track its progress through FLINK-10011[1]; Till is currently
>>>>>>>>>>>> fixing this issue.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://issues.apache.org/jira/browse/FLINK-10011
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, vino.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 27, 2018 at 10:13 PM Encho Mishinev
>>>>>>>>>>>> <encho.mishinev@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am running Flink 1.5.3 with two job managers and two task
>>>>>>>>>>>>> managers in Kubernetes along with HDFS and Zookeeper in high-availability
>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My problem occurs after the following actions:
>>>>>>>>>>>>> - Upload a .jar file to jobmanager-1
>>>>>>>>>>>>> - Run a streaming job from the jar on jobmanager-1
>>>>>>>>>>>>> - Wait for 1 or 2 checkpoints to succeed
>>>>>>>>>>>>> - Kill pod of jobmanager-1
>>>>>>>>>>>>> After a short delay, jobmanager-2 takes leadership and
>>>>>>>>>>>>> correctly restores the job and continues it
>>>>>>>>>>>>> - Stop job from jobmanager-2
>>>>>>>>>>>>>
>>>>>>>>>>>>> At this point all seems well, but the problem is that
>>>>>>>>>>>>> jobmanager-2 does not clean up anything that was left from jobmanager-1.
>>>>>>>>>>>>> This means that job graphs remain in both HDFS and Zookeeper. They
>>>>>>>>>>>>> later obstruct any work of both managers, because after any reset the
>>>>>>>>>>>>> managers unsuccessfully try to restore a non-existent job and fail over
>>>>>>>>>>>>> and over again.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am quite certain that jobmanager-2 does not know about any
>>>>>>>>>>>>> of jobmanager-1’s files since the Zookeeper logs reveal that it tries to
>>>>>>>>>>>>> duplicate job folders:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-08-27 13:11:00,038 [myid:] - INFO  [ProcessThread(sid:0
>>>>>>>>>>>>> cport:2181)::PrepRequestProcessor@648] - Got user-level
>>>>>>>>>>>>> KeeperException when processing sessionid:0x1657aa15e480033 type:create
>>>>>>>>>>>>> cxid:0x46 zxid:0x1ab txntype:-1 reqpath:n/a Error
>>>>>>>>>>>>> Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77
>>>>>>>>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>>>>>>>>> /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-08-27 13:11:02,296 [myid:] - INFO  [ProcessThread(sid:0
>>>>>>>>>>>>> cport:2181)::PrepRequestProcessor@648] - Got user-level
>>>>>>>>>>>>> KeeperException when processing sessionid:0x1657aa15e480033 type:create
>>>>>>>>>>>>> cxid:0x5c zxid:0x1ac txntype:-1 reqpath:n/a Error
>>>>>>>>>>>>> Path:/flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15
>>>>>>>>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>>>>>>>>> /flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also jobmanager-2 attempts to delete the jobgraphs folder in
>>>>>>>>>>>>> Zookeeper when the job is stopped, but fails since there are leftover files
>>>>>>>>>>>>> in it from jobmanager-1:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-08-27 13:12:13,406 [myid:] - INFO  [ProcessThread(sid:0
>>>>>>>>>>>>> cport:2181)::PrepRequestProcessor@648] - Got user-level
>>>>>>>>>>>>> KeeperException when processing sessionid:0x1657aa15e480033 type:delete
>>>>>>>>>>>>> cxid:0xa8 zxid:0x1bd txntype:-1 reqpath:n/a Error
>>>>>>>>>>>>> Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15
>>>>>>>>>>>>> Error:KeeperErrorCode = Directory not empty for
>>>>>>>>>>>>> /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15
>>>>>>>>>>>>>
>>>>>>>>>>>>> I’ve noticed that when restoring the job, it seems like
>>>>>>>>>>>>> jobmanager-2 does not get anything more than jobID, while it perhaps needs
>>>>>>>>>>>>> some metadata? Here is the log that seems suspicious to me:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2018-08-27 13:09:18,113 INFO
>>>>>>>>>>>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>>>>>>>>>>>> Recovered SubmittedJobGraph(83bfa359ca59ce1d4635e18e16651e15, null).
>>>>>>>>>>>>>
>>>>>>>>>>>>> All other logs seem fine in jobmanager-2; it doesn’t seem to
>>>>>>>>>>>>> be aware that it’s overwriting anything or not deleting properly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My question is - what is the intended way for the job managers
>>>>>>>>>>>>> to correctly exchange metadata in HA mode and why is it not working for me?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>
>>>>>>>>>>>>
