nifi-dev mailing list archives

From Michal Klempa <michal.kle...@gmail.com>
Subject Re: NiFi Clustering
Date Tue, 10 Jan 2017 08:50:18 GMT
Hi,
we have been doing some tests with a NiFi cluster and similar questions arose.
Our configuration is as follows:
NiFi ClusterA:
172.31.12.232 nifi-cluster-04 (sample configuration
nifi-cluster-04.properties in attachment)
172.31.5.194 nifi-cluster-05
172.31.15.84 nifi-cluster-06
Standalone ZooKeeper, 3 instances, sample configuration
nifi-cluster-04.zoo.cfg in attachment.

NiFi ClusterB:
172.31.9.147 nifi-cluster-01 (sample configuration
nifi-cluster-01.properties in attachment)
172.31.24.77 nifi-cluster-02
172.31.8.152 nifi-cluster-03
Standalone ZooKeeper, 3 instances, sample configuration
nifi-cluster-01.zoo.cfg in attachment.
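
Since the attachments may not survive the archive, here is a rough
sketch of the cluster-relevant settings on both clusters - the values
below are illustrative (ports and paths are assumptions), not our
exact configuration:
```
# nifi.properties (NiFi 1.x clustering-related entries, illustrative)
nifi.web.http.port=8081
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-cluster-04
nifi.cluster.node.protocol.port=9088
nifi.zookeeper.connect.string=172.31.12.232:2181,172.31.5.194:2181,172.31.15.84:2181
# site-to-site input, used by the RemoteProcessGroup on the other cluster
nifi.remote.input.host=nifi-cluster-04
nifi.remote.input.socket.port=10000

# zoo.cfg (standalone ZooKeeper ensemble, illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=172.31.12.232:2888:3888
server.2=172.31.5.194:2888:3888
server.3=172.31.15.84:2888:3888
```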

We have tested the following:
ClusterA_flow (template in attachment):
GenerateFlowFile -> output_port "to_clusterB" (the port to be
consumed by the RemoteProcessGroup on ClusterB)
GenerateFlowFile -> PutFile ("/tmp/clusterA", create missing dirs: false)

ClusterB_flow (template in attachment):
RemoteProcessGroup (attached to 172.31.12.232:8081/nifi, remote ports:
"to_clusterB") -> PutFile ("/tmp/clusterB", create missing dirs: false)

Testing scenario is:
GenerateFlowFile in ClusterA sends the file to the output port
"to_clusterB" and also to PutFile ("/tmp/clusterA"). ClusterB receives
the FlowFile through its RemoteProcessGroup and saves it to
"/tmp/clusterB" on the ClusterB machines.

Now following situations were tested:
Situation1: all the nodes are up and running. Three FlowFiles are
generated in ClusterA, one on each node, and all three files are
transferred to ClusterB, although the distribution on ClusterB is not
even. When we rerun GenerateFlowFile (e.g. every 10 sec) 4 times, we
get 12 FlowFiles generated in ClusterA (4 on each node), but on
ClusterB they land as 6 flow files on node nifi-cluster-01, 2 on node
nifi-cluster-02 and 4 on node nifi-cluster-03. Although the
distribution is not even, the FlowFiles are properly transferred to
ClusterB, and that is what matters.
Conclusion: if everything is green, everything works as expected (and
the same as with separate NiFi instances).

Situation2: We ran GenerateFlowFile 2 times on ClusterA, and the
FlowFiles were successfully transferred to ClusterB. Then we removed
the target directory "/tmp/clusterB" on node nifi-cluster-01 and
executed GenerateFlowFile two more times. As the PutFile there is
configured NOT to create target directories, we expected errors. The
key point is how the NiFi cluster can help in resolution. Although
the failure relationship of PutFile is looped back to its own input,
the result is: 12 FlowFiles generated in ClusterA (4 on each node),
but after the directory removal on node nifi-cluster-01, 6 flow files
remained stuck on node nifi-cluster-01, circling around PutFile with
an error that the target directory does not exist.
Conclusion: From this we can see that, although we have a cluster
setup, the nodes balance somewhere inside the RemoteProcessGroup but
do not rebalance FlowFiles stuck on relationships once they have
entered the flow, even after they are penalized by the processor. Is
this the desired behavior? Are there any plans to improve on this?

Situation3: We ran GenerateFlowFile 2 times on ClusterA, and the
FlowFiles were successfully transferred to ClusterB. Then we shielded
node nifi-cluster-01 (ClusterB) using iptables, so that both NiFi and
ZooKeeper would become unreachable on this node.
Iptables commands used:
```
# keep SSH access to the box so we can still manage it
iptables -A INPUT -p tcp --sport 513:65535 --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 --dport 513:65535 -m state --state ESTABLISHED -j ACCEPT
# drop everything else, inbound and outbound
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP
```
This should simulate a HW failure from NiFi's and ZooKeeper's point of view.
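(To undo the shielding after the test, flushing the rules restores
connectivity - assuming, as here, that no other firewall rules need to
be preserved:)
```
# remove all rules from all chains, re-opening the node
iptables -F
```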
We executed GenerateFlowFile two more times. The result: 6 more
FlowFiles were generated in ClusterA (for a total of 4 on each node),
and despite the shielded node these 6 flow files were transferred to
ClusterB (distributed unevenly across nodes nifi-cluster-02 and
nifi-cluster-03).
Conclusion: From this we can see that the NiFi cluster setup does
help us transfer FlowFiles when one of the destination nodes becomes
unavailable. For separate NiFi instances, we are currently trying to
figure out how to arrange the flows to achieve this behavior. Any
ideas?

Situation4: We ran GenerateFlowFile 2 times on ClusterA, and the
FlowFiles were successfully transferred to ClusterB. Then we shielded
node nifi-cluster-04 (ClusterA) using iptables, so that both NiFi and
ZooKeeper would become unreachable on this node.
The iptables commands used were the same as in Situation3; again,
this should simulate a HW failure from NiFi's and ZooKeeper's point
of view.

GenerateFlowFile kept executing on its schedule and we were unable to
stop it, because the UI became unavailable on ClusterA. After
shielding node nifi-cluster-04, the remaining 2 nodes in ClusterA
kept generating flow files, and these were transferred to ClusterB,
so the flow was running. But it was unmanageable, as the UI was
unavailable.
Conclusion: From this we can see that the NiFi cluster setup does
help us transfer FlowFiles when one of the source nodes becomes
unavailable. Unfortunately, we experienced UI issues. For separate
NiFi instances, we are currently trying to figure out how to arrange
the flows to achieve this behavior. Any ideas?

* * *

Moreover, we tested the upgrade process for flow.xml.gz. Currently,
we are using separate NiFi instances managed by Ansible (+Jenkins).
The flow.xml.gz upgrade job basically consists of (sketched below):
1. service nifi stop
2. back up the old flow.xml.gz and place the new one into the NiFi
conf/ directory
3. service nifi start
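A minimal shell sketch of that per-node step - the /opt/nifi path and
the "nifi" service name are assumptions, and in practice this runs as
an Ansible task:
```
#!/bin/sh
# Swap flow.xml.gz on one node; paths and service name are assumed.
set -e
CONF=/opt/nifi/conf
NEW_FLOW=/tmp/new-flow.xml.gz   # delivered by the deployment job

service nifi stop
cp "$CONF/flow.xml.gz" "$CONF/flow.xml.gz.bak"   # keep the old flow for rollback
cp "$NEW_FLOW" "$CONF/flow.xml.gz"
service nifi start
```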
As our flows are pre-tested in a staging environment, we have never
experienced issues in production such as NiFi failing to start
because of a damaged flow.xml.gz. Everything works OK.
Even if something broke, we have other separate hot production NiFi
instances running with the old flow.xml.gz, so the overall flow keeps
running through the other nodes (with a performance hit, of course).
Since we upgrade one node at a time, we can always revert to the
original flow.xml.gz on the node being upgraded.

Now the question is: if we are going to use the NiFi cluster feature,
how can we achieve rolling upgrades of flow.xml.gz? Should we run a
separate NiFi cluster and switch between the two clusters?
We experienced this behavior: a NiFi instance does not join the NiFi
cluster if its flow.xml.gz differs. We had to turn off all NiFi
instances in the cluster for a while and start a single one with the
new flow.xml.gz to seed the cluster with the new version. Then we
were forced to deploy the new flow.xml.gz to the other 2 nodes as
well, as they refused to join the cluster :)
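Reconstructed as a script, the procedure we were forced into looks
roughly like this (hostnames, paths and service name are
illustrative, not our exact setup):
```
#!/bin/sh
# Whole-cluster flow swap - exactly the downtime we would like to avoid.
set -e

# stop the whole cluster
for n in nifi-cluster-01 nifi-cluster-02 nifi-cluster-03; do
  ssh "$n" service nifi stop
done

# seed one node with the new flow and start it first
scp new-flow.xml.gz nifi-cluster-01:/opt/nifi/conf/flow.xml.gz
ssh nifi-cluster-01 service nifi start

# the remaining nodes refuse to join with the old flow,
# so deploy the new one there as well
for n in nifi-cluster-02 nifi-cluster-03; do
  scp new-flow.xml.gz "$n":/opt/nifi/conf/flow.xml.gz
  ssh "$n" service nifi start
done
```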

* * *

For our use cases, for now, we find using separate NiFi instances
superior to using a NiFi cluster, mainly because of the flow.xml.gz
upgrade (unless somebody gives us advice on this - thank you!).
Regarding flow balancing and inter-cluster communication, we do not
know how to achieve those without a NiFi cluster setup. As our flow
is currently very simple and can basically run in parallel on
multiple single instances, the separate NiFi instances work well (our
source system even supports balancing across multiple IPs, so we do
not have to bother setting up a balanced IP on the routers).

Any comments are welcome. Thanks.
Michal Klempa

On Sat, Dec 10, 2016 at 9:03 AM, Caton, Nigel <nigel.caton@cgi.com> wrote:
> Thanks Bryan.
>
> On 2016-12-09 15:32 (-0000), Bryan Bende <b...@gmail.com> wrote:
>> Nigel,
>>
>> The advantage of using a cluster is that whenever you change something in
>> the UI, it will be changed on all nodes, and you also get a central view of
>> the metrics/stats across all nodes.  If you use standalone nodes you would
>> have to go to each node and make the same changes.
>>
>> It sounds like you are probably doing automatic deployments of a flow that
>> you setup else where and aren't planning to ever modify the production
>> nodes so maybe the above is a non-issue for you.
>>
>> The rolling deployment scenario depends on whether you are updating the
>> flow, or just code. For example, if you are just updating code then you
>> should be able to do a rolling deployment in a cluster, but if you are
>> updating the flow then I don't think it will work because the a node will
>> come up with the new flow and attempt to join the cluster, and the cluster
>> won't accept it because the flow is different.
>>
>> Hope that helps.
>>
>> -Bryan
>>
>>
>> On Fri, Dec 9, 2016 at 9:33 AM, Caton, Nigel <ni...@cgi.com> wrote:
>>
>> > Are there any views of the pros/cons of running a native NiFi cluster
>> > versus a cluster of standalone
