nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Shutdown of one Node in Cluster
Date Wed, 12 Apr 2017 15:32:32 GMT
Excellent! Glad it's all working now. And thanks for the follow-up to let us know!

-Mark

> On Apr 12, 2017, at 11:30 AM, Mark Bean <mark.o.bean@gmail.com> wrote:
> 
> Mark,
> 
> I believe you're right. Yesterday, I corrected a typo in the
> nifi.properties file related to the FQDN name. I thought it was only the
> site-to-site property (nifi.remote.input.host). However, when I
> intentionally introduced a typo to one of the three ZK servers in the
> nifi.zookeeper.connect.string today, I was able to reproduce the symptoms.
> I'm sure that must have been it. Without the typo, all is working well.
> 
> Thanks,
> Mark
> 
> On Wed, Apr 12, 2017 at 10:36 AM, Mark Payne <markap14@hotmail.com> wrote:
> 
>> Mark,
>> 
>> I haven't seen this behavior personally, so I can't be sure why exactly it
>> would change state to SUSPENDED and not then re-connect. In your
>> nifi.properties, do you have the "nifi.zookeeper.connect.string" property
>> set up to point to all 3 of the nodes, also? If so, it should be able to
>> connect to one of the other two nodes listed.
>> 
>> Thanks
>> -Mark
>> 
>>> On Apr 11, 2017, at 2:37 PM, Mark Bean <mark.o.bean@gmail.com> wrote:
>>> 
>>> Ok, will keep the standalone ZooKeeper in mind.
>>> 
>>> Back to the original issue, any idea why ZooKeeper went to a PENDING state
>>> making the cluster unavailable?
>>> 
>>> 
>>> On Tue, Apr 11, 2017 at 2:10 PM, Mark Payne <markap14@hotmail.com> wrote:
>>> 
>>>> Mark,
>>>> 
>>>> Yes, 2 out of 3 should be sufficient. For testing purposes, a single
>>>> ZooKeeper instance is fine, as well. For production, I would not actually
>>>> recommend using an embedded ZooKeeper at all and instead use a standalone
>>>> ZooKeeper. ZooKeeper tends not to be very happy when running on a box on
>>>> which there is already heavy resource load, so if your cluster starts
>>>> getting busy, you'll see far more stable performance from a standalone
>>>> ZooKeeper.
>>>> 
>>>> 
>>>>> On Apr 11, 2017, at 2:06 PM, Mark Bean <mark.o.bean@gmail.com> wrote:
>>>>> 
>>>>> All 3 nodes are running embedded ZooKeeper. And, the Admin Guide states
>>>>> "ZooKeeper requires a majority of nodes be active in order to function".
>>>>> So, I assumed 2/3 being active was ok. Perhaps not.
>>>>> 
>>>>> Related: can a Cluster be set up with only 1 ZooKeeper node? Clearly, in
>>>>> production, one would not want to do this. But when testing, this should
>>>>> be acceptable, yes?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Apr 11, 2017 at 1:56 PM, Mark Payne <markap14@hotmail.com> wrote:
>>>>> 
>>>>>> Mark,
>>>>>> 
>>>>>> Are all of your nodes running an embedded ZooKeeper, or only 1 or 2 of
>>>>>> them?
>>>>>> 
>>>>>> Thanks
>>>>>> -Mark
>>>>>> 
>>>>>>> On Apr 11, 2017, at 1:19 PM, Mark Bean <mark.o.bean@gmail.com> wrote:
>>>>>>> 
>>>>>>> I have a 3-node Cluster with each Node hosting the embedded ZooKeeper.
>>>>>>> When one Node is shut down (and the Node is not the Cluster Coordinator),
>>>>>>> the Cluster becomes unavailable. The UI indicates "Action cannot be
>>>>>>> performed because there is currently no Cluster Coordinator elected. The
>>>>>>> request should be tried again after a moment, after a Cluster Coordinator
>>>>>>> has been automatically elected."
>>>>>>> 
>>>>>>> The app.log indicates "ConnectionStateManager State change: SUSPENDED".
>>>>>>> And, there are an endless number of "CuratorFrameworkImpl Background
>>>>>>> retry gave up" messages; the surviving Nodes are not able to allow the
>>>>>>> Cluster to survive.
>>>>>>> 
>>>>>>> I would have thought since 2/3 Nodes are surviving, there wouldn't be a
>>>>>>> problem. In addition, since the Node that was shut down was not the
>>>>>>> Cluster Coordinator nor Primary node, no Cluster state changes were
>>>>>>> required.
>>>>>>> 
>>>>>>> nifi.cluster.flow.election.max.wait.time=2 mins
>>>>>>> nifi.cluster.flow.election.max.candidates=
>>>>>>> 
>>>>>>> The same behavior was observed when max.candidates was set to 2.
>>>>>>> 
>>>>>>> NiFi 1.1.2
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

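The cause reported in this thread was a typo in one of the three servers listed in nifi.zookeeper.connect.string. A minimal sketch of a sanity check that could be run before starting a node follows. It assumes a plain comma-separated host:port list with no chroot suffix and no IPv6 literals; the helper names are hypothetical and are not part of NiFi or ZooKeeper. It only flags names that fail DNS resolution, so a misspelling that happens to resolve, or a wrong port, would not be caught.

```python
import socket

# ZooKeeper's conventional client port, used when an entry omits one.
DEFAULT_ZK_PORT = 2181


def parse_connect_string(connect_string):
    """Split 'host1:2181,host2:2181' into a list of (host, port) tuples."""
    servers = []
    for entry in connect_string.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No colon in the entry: treat the whole thing as the host.
            host, port = entry, str(DEFAULT_ZK_PORT)
        servers.append((host, int(port)))
    return servers


def unresolvable_servers(connect_string, resolve=socket.getaddrinfo):
    """Return the (host, port) entries whose host name does not resolve."""
    bad = []
    for host, port in parse_connect_string(connect_string):
        try:
            resolve(host, port)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append((host, port))
    return bad
```

For example, calling unresolvable_servers() with the value of nifi.zookeeper.connect.string copied from nifi.properties returns an empty list when every listed host name resolves from the machine running the check.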

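The "majority of nodes" rule quoted from the Admin Guide can be written out as simple arithmetic. This is an illustrative sketch only, with hypothetical function names. It shows why 2 of 3 servers is enough, and why a single-server ensemble works for testing but tolerates no failures.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, live_servers):
    """True if the live servers are a majority of the ensemble."""
    return live_servers >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)
```

With three servers the quorum is two, so one node can be shut down; with one server the quorum is one, so losing it makes ZooKeeper unavailable.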