From: Aljoscha Krettek
To: dev@flink.apache.org
Subject: Re: Minimal HA Setup for Apache Flink
Date: Tue, 24 Oct 2017 16:42:32 +0200

Hi,

Your assumptions are mostly correct.

1.
This is correct, but you can also run a non-YARN setup with only one JobManager if you have a system that makes sure to restart/keep alive this JobManager. This could be some supervisor, Kubernetes, or Mesos. You probably also need to factor in the distributed filesystem (or similar) that you need for state snapshots.

2. You can run Flink without HA, but then a failure will bring the complete cluster down, meaning any state checkpoints/snapshots will be lost. You can get around this by enabling externalised checkpoints [1]. With this, you can restore from a checkpoint even after the cluster failed.

3. In order to recover from failures you always need state snapshots. HA only makes the JobManager resilient to failure. That being said, restarting the cluster after a failure and recovering from an externalised checkpoint should probably take a couple of minutes if you don't have too many nodes.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#externalized-checkpoints

Best,
Aljoscha

> On 17. Oct 2017, at 11:53, Srinath Perera wrote:
>
> Hi All,
>
> I am trying to write an article comparing minimal HA(Highly available)
> deployments of different streaming processing systems.
>
> Basically, the question is if an organization has a limited workload, such
> as 10k events per second, which might grow in the future, what is the
> minimal setup they can use to run a highly available Stream Processor?
>
> Could someone help answer following questions?
>
> 1. How many nodes minimal Apache Flink HA setup needs? As I understood
> from [2], it is zookeeper nodes + 2 job managers without YARN and 1 job
> manager with YARN + worker nodes? Is this correct?
> 2. As per [1], Zookeeper needs minimal 3 nodes to provide HA. Is there a
> way to run Apache Flink without HA?
> 3.
> If someone runs Apache Flink without HA, but use state snapshots, how
> fast it can recover after a failure? ( ballpark figure)
>
> Thanks
> Srinath
>
>
> 1.
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Deploying_ZooKeeper
> 2.
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#standalone-cluster-high-availability
>
>
> --
> ============================
> Srinath Perera, Ph.D.
> http://people.apache.org/~hemapani/
> http://srinathsview.blogspot.com/
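
As a rough illustration of the node arithmetic discussed in question 1 and the reply to it, here is a minimal sketch. It assumes a standalone (non-YARN) HA setup where every process gets its own machine, with the 3 ZooKeeper nodes and 2 JobManagers mentioned in the question as defaults. The function name and the worker count in the usage line are hypothetical, and machines for the distributed filesystem used for state snapshots are not counted.

```python
def standalone_ha_node_count(task_managers: int,
                             zookeeper_nodes: int = 3,
                             job_managers: int = 2) -> int:
    """Machines needed if every process runs on its own node.

    Assumption: no co-location of ZooKeeper, JobManagers and workers,
    and no nodes counted for the distributed filesystem.
    """
    if min(task_managers, zookeeper_nodes, job_managers) < 1:
        raise ValueError("each role needs at least one node")
    return zookeeper_nodes + job_managers + task_managers


# Example: 2 worker nodes (an arbitrary illustrative number).
print(standalone_ha_node_count(task_managers=2))  # 3 + 2 + 2 = 7
```

In practice the count can be lower if processes share machines, or if a single JobManager is kept alive by a supervisor, Kubernetes, or Mesos as described in the reply.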