flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Minimal HA Setup for Apache Flink
Date Tue, 24 Oct 2017 14:42:32 GMT
Hi,

Your assumptions are mostly correct.

1. This is correct, but you can also run a non-YARN setup with only one JobManager,
provided some system restarts this JobManager or keeps it alive. This could be a process
supervisor, Kubernetes, or Mesos. You probably also need to factor in the distributed
filesystem (or something similar) that you need for state snapshots.

2. You can run Flink without HA, but then a failure will bring down the complete cluster, and
any state checkpoints/snapshots will be lost. You can get around this by enabling externalised
checkpoints [1]. With these, you can restore from a checkpoint even after the cluster has failed.

3. To recover from failures you always need state snapshots. HA only makes the JobManager
resilient to failures. That being said, restarting the cluster after a failure and recovering
from an externalised checkpoint should probably take a couple of minutes if you don't have too
many nodes.
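
To make points 2 and 3 a bit more concrete, here is a rough sketch of the flink-conf.yaml
entries involved in a non-HA setup with externalised checkpoints. It assumes the 1.4-era
configuration keys covered in [1]; the HDFS paths are hypothetical placeholders, not values
from this thread. The job itself still has to enable externalised checkpoints in its
checkpoint configuration, as described in [1].

```yaml
# Sketch only: key names assume Flink 1.4; all paths are hypothetical placeholders.

# Keep checkpoint data on a distributed filesystem so it survives a cluster failure.
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints

# Target directory for the metadata of externalised checkpoints.
state.checkpoints.dir: hdfs:///flink/ext-checkpoints
```

After a cluster failure, the job would then be resubmitted with the path of the retained
checkpoint metadata, in the same way as resuming from a savepoint (for example
`bin/flink run -s <checkpointMetaDataPath> ...`).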

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#externalized-checkpoints

Best,
Aljoscha

> On 17. Oct 2017, at 11:53, Srinath Perera <hemapani@gmail.com> wrote:
> 
> Hi All,
> 
> I am trying to write an article comparing minimal HA (highly available)
> deployments of different stream processing systems.
> 
> Basically, the question is if an organization has a limited workload, such
> as 10k events per second, which might grow in the future, what is the
> minimal setup they can use to run a highly available Stream Processor?
> 
> Could someone help answer the following questions?
> 
>   1. How many nodes does a minimal Apache Flink HA setup need? As I understand
>   from [2], it is the ZooKeeper nodes + 2 JobManagers without YARN, or 1
>   JobManager with YARN, + worker nodes. Is this correct?
>   2. As per [1], ZooKeeper needs a minimum of 3 nodes to provide HA. Is there a
>   way to run Apache Flink without HA?
>   3. If someone runs Apache Flink without HA but uses state snapshots, how
>   fast can it recover after a failure? (ballpark figure)
> 
> Thanks
> Srinath
> 
> 
>   [1] https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Deploying_ZooKeeper
>   [2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#standalone-cluster-high-availability
> 
> 
> -- 
> ============================
> Srinath Perera, Ph.D.
>   http://people.apache.org/~hemapani/
>   http://srinathsview.blogspot.com/

