nifi-dev mailing list archives

From Mark Payne <>
Subject Re: Zookeeper issues mentioned in a talk about storm / heron
Date Thu, 07 Apr 2016 19:13:23 GMT

Certainly some good points here. I can definitely agree that there is some concern
about resource contention between heart-beating and state management.

In testing PR 323 for heart-beating to ZooKeeper, we did notice that if we lose
the quorum, the nodes are no longer able to know which nodes are sending heartbeats,
which results in the state of the cluster really being unknown. For example, if we have a
3-node NiFi cluster, all running an embedded ZooKeeper, and we lose 2 nodes, we are now
reporting that we have 2/3 nodes instead of 1/3 nodes, because we can't read from ZooKeeper
to determine that the second node is missing.

While I guess I did realize that this was the way it would work, it wasn't clear how poor
usability would be here, when losing a quorum means that the state of the cluster really is
unknown.
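To make the quorum behavior concrete, here is a small illustration of the majority math only (not NiFi or ZooKeeper code): a ZooKeeper ensemble stays writable and readable only while a strict majority of its members is up, which is why losing 2 of 3 embedded ZooKeeper nodes makes the cluster state unreadable.

```python
def has_quorum(ensemble_size, live_members):
    """True while a strict majority of the ensemble is still running."""
    return live_members > ensemble_size // 2

# 3-node ensemble: tolerates exactly one failure.
print(has_quorum(3, 2))  # True  -> one node down, ZooKeeper still serving
print(has_quorum(3, 1))  # False -> quorum lost; state cannot be read
```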

Also, as the heart-beating was refactored, we dramatically trimmed the size of heartbeat messages
to a very small size (probably around 1-2 KB). Additionally, with no NCM running the show anymore,
all nodes within a cluster will be required to be able to communicate with one another to send
Cluster Protocol messages.
To this end, I think the better solution regarding heart-beating is to simply have nodes send
their heartbeat to all nodes in the cluster. This allows all nodes to know the current state.
Since the heartbeats are now very small, the network chatter will be pretty limited.
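A minimal sketch, assuming a generic transport, of the scheme described above: every node periodically sends its small heartbeat to every other node, so each node can track cluster state locally without going through ZooKeeper. The node addresses and the `send` callback are hypothetical, not NiFi APIs.

```python
import json
import time

CLUSTER_NODES = ["node1:9443", "node2:9443", "node3:9443"]  # hypothetical addresses

def build_heartbeat(node_id):
    # Keep the payload small -- the thread estimates roughly 1-2 KB.
    payload = {"node": node_id, "timestamp": time.time(), "status": "CONNECTED"}
    return json.dumps(payload).encode("utf-8")

def broadcast_heartbeat(node_id, send):
    """Send this node's heartbeat to every peer via the `send` transport."""
    heartbeat = build_heartbeat(node_id)
    for peer in CLUSTER_NODES:
        if not peer.startswith(node_id + ":"):  # skip ourselves
            send(peer, heartbeat)
```

With payloads this small, all-to-all broadcast stays cheap at the cluster sizes NiFi typically runs (several nodes to dozens).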

This means that ZooKeeper will be required only for two things: Leader Election and State
Management (with the ability to provide a different storage mechanism for State Management
later, if we see the need). Leader Election still would be used to elect a 'Cluster
Coordinator' who is responsible for (among other things) monitoring heartbeats. If that node
does not receive a heartbeat from Node X in some amount of time, it would notify all nodes in
the cluster that Node X is now disconnected. This way, even if the Leader is unable to
communicate with Node X, all other nodes in the cluster know it is disconnected and will issue
Node X a Disconnection Request the next time that Node X heartbeats.
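A hedged sketch (not NiFi's actual code) of the coordinator-side logic described above: the elected Cluster Coordinator records the last heartbeat time for each node and reports any node that has been silent longer than a timeout as disconnected. All names here, including the timeout value, are illustrative.

```python
import time

class HeartbeatMonitor:
    """Tracks last-heard-from times and flags nodes that have gone quiet."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout = timeout_seconds
        self.last_seen = {}  # node_id -> timestamp of last heartbeat

    def record_heartbeat(self, node_id, now=None):
        self.last_seen[node_id] = time.time() if now is None else now

    def disconnected_nodes(self, now=None):
        """Return nodes whose last heartbeat is older than the timeout."""
        current = time.time() if now is None else now
        return [node for node, seen in self.last_seen.items()
                if current - seen > self.timeout]
```

In the proposed design, the coordinator would then broadcast the disconnection to all nodes, so Node X receives a Disconnection Request on its next heartbeat even if the coordinator itself cannot reach it.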

I have created a JIRA [1] where we can track any further concerns that may develop in the
meantime.


[1] <>

> On Apr 4, 2016, at 12:40 PM, Tony Kurc <> wrote:
> Mark,
> Fair points!
> Something an Apache Accumulo committer pointed out at the meetup is that
> the scale issues may come sooner than a couple hundred nodes due to the
> size of writes and potential frequency of writes (Joe Percivall's
> demonstration on the windowing seemed like it could write much more
> frequently than a couple times a minute).
> Another point someone brought up is that if heartbeating and state
> management are competing for resources, bad things can happen.
> Tony
> On Fri, Apr 1, 2016 at 9:31 AM, Mark Payne <> wrote:
>> Guys,
>> Certainly some great points here and important concepts to keep in mind!
>> One thing to remember, though, that very much differentiates NiFi from
>> Storm
>> or HBase or Accumulo: those systems are typically expected to scale to
>> hundreds
>> or thousands of nodes (you mention a couple hundred node HBase cluster
>> being
>> "modest"). With NiFi, we are typically operating clusters on the order of
>> several
>> nodes to dozens of nodes. A couple hundred nodes would be a pretty massive
>> NiFi
>> cluster.
>> In terms of storing state, we could potentially get into more of a sticky
>> situation
>> if not done carefully. However, we generally expect the "State Management"
>> feature to be used for
>> occasionally storing small amounts of state, such as ListHDFS storing
>> 2 timestamps, and we don't expect ListHDFS to be continually hammering HDFS
>> asking for a new listing. It may be scheduled to run once every few minutes,
>> for example.
>> That said, we certainly have been designing everything using appropriate
>> interfaces
>> in such a way that if we do later decide that we want to use some other
>> mechanism
>> for storing state and/or heartbeats, it should be a very reasonable path to
>> take advantage of some other service for one or both of these tasks.
>> Thanks
>> -Mark
>>> On Mar 31, 2016, at 5:35 PM, Sean Busbey <> wrote:
>>> HBase has also had issues with ZK at modest (~couple hundred node)
>>> scale when using it to act as an intermediary for heartbeats and to do
>>> work assignment.
>>> On Thu, Mar 31, 2016 at 4:33 PM, Tony Kurc <> wrote:
>>>> My commentary that didn't accompany this - it appears Storm was using
>>>> zookeeper in a similar way as the road we're heading down, and it was such
>>>> a major bottleneck that they moved key value storage and heartbeating out
>>>> into separate services, and then re-engineered (i.e. built Heron). Before
>>>> we get too dependent on zookeeper, it may be worth learning some lessons
>>>> from the crew that built Heron or from a team that learned zookeeper
>>>> lessons at scale, like Accumulo.
>>>> On Thu, Mar 24, 2016 at 6:22 PM, Tony Kurc <> wrote:
>>>>> I mentioned slides I saw at the meetup about zookeeper perils at scale in
>>>>> storm. Here are the slides; I couldn't find a video after some limited
>>>>> searching.
