flink-user mailing list archives

From Stefano Baghino <stefano.bagh...@radicalbit.io>
Subject Issues testing Flink HA w/ ZooKeeper
Date Mon, 15 Feb 2016 11:35:46 GMT
Hello everyone,

Last week I ran some tests with Apache ZooKeeper to get a grip on Flink's
HA features. My tests have failed so far and I can't figure out why.

My latest tests involved Flink 0.10.2, run as a standalone cluster with 3
masters and 4 slaves. The 3 masters also form the ZooKeeper (3.4.6)
ensemble. I started ZooKeeper on each machine, tested its availability
and then started the Flink cluster. Since there's no reliable distributed
filesystem on the cluster, I had to use the local file system as the state
backend.

I then submitted a very simple streaming job that writes the timestamp to a
text file on the local file system each second, and then killed the
process running the job manager to verify that another job manager takes
over. However, the job just stopped. I still have to perform some checks on
the handover to the new job manager, but before digging deeper I wanted to
ask whether my expectation that the job keeps running despite the job manager
failure is unreasonable.
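
For reference, the setup described above would correspond roughly to
flink-conf.yaml entries like the following. This is only a sketch: the key
names follow the Flink 0.10 high-availability documentation, while the host
names (zk1, zk2, zk3) and the directories are placeholders, not values taken
from the actual cluster.

```yaml
# flink-conf.yaml (sketch; host names and paths are placeholders)

# Enable JobManager high availability through ZooKeeper
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181

# State backend on the local file system, since no distributed
# file system is available on the cluster
state.backend: filesystem
state.backend.fs.checkpointdir: file:///tmp/flink/checkpoints

# Directory where JobManager recovery metadata is stored
recovery.zookeeper.storageDir: file:///tmp/flink/recovery
```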

Thanks in advance.

Stefano Baghino

Software Engineer @ Radicalbit
