nifi-issues mailing list archives

From "Tamas Palfy (Jira)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-7081) Improve handling of Load Balanced Connections when one node is slow
Date Thu, 06 Feb 2020 18:10:00 GMT

    [ https://issues.apache.org/jira/browse/NIFI-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031835#comment-17031835 ]

Tamas Palfy commented on NIFI-7081:
-----------------------------------

[~markap14], [~joewitt]

I have tested a setup that is similar to my previous idea. The main differences are:
 # A new balancing strategy instead of changing round robin (doesn't really affect functionality)
 # When checking if a connection is full, the check takes the balancing strategy into
account

The result works. Everything is the same as before with the existing balancing strategies,
and the new one balances among the available nodes. Backpressure is also applied... except
the thresholds are probably not where we would want them with the new strategy.

Given
N=number of nodes
Q="Back Pressure Object Threshold" set on the connection
Consumer processor is not running

If the producer processor runs on all nodes, backpressure kicks in at N*N*Q.
If the producer processor runs on primary only, backpressure kicks in at (2N-1)*Q.

I guess it makes sense, as in the first case each of the N nodes has a Q-sized buffer for each
of the N nodes - itself (local partition) and the sibling nodes (remote partitions).
In the second case, if I understand correctly, the primary node can actually send Q flowfiles
to each sibling node (which will be stored in their local partitions, I presume)
- that's the (N-1)*Q - and it also has its own N*Q local buffers (the 1 local and N-1 remote
partitions).
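
To make the arithmetic above concrete, here is a minimal sketch that simply adds up the
buffers as described. The function names and the example value Q=10000 are illustrative
assumptions, not NiFi APIs or measured data; the formulas are the ones stated above.

```python
def threshold_all_nodes(n: int, q: int) -> int:
    """Producer runs on all nodes: each of the N nodes holds N partitions
    (1 local + N-1 remote), each of which can buffer Q flowfiles."""
    partitions_per_node = n
    return n * partitions_per_node * q


def threshold_primary_only(n: int, q: int) -> int:
    """Producer runs on the primary node only."""
    # Buffers on the primary itself: 1 local + (N-1) remote partitions.
    primary_buffers = n * q
    # Flowfiles already handed over, sitting in each sibling's local partition.
    sibling_buffers = (n - 1) * q
    return primary_buffers + sibling_buffers


if __name__ == "__main__":
    q = 10000  # assumed example value for "Back Pressure Object Threshold"
    for n in (2, 3, 5):
        print(n, threshold_all_nodes(n, q), threshold_primary_only(n, q))
```

For any N, the second function reduces to (2N-1)*Q, which stays below twice the N*Q
one might expect for a cluster-wide queue.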

(These are not just theoretical values btw, I did some measurements.)

Not sure if those increased thresholds could work for us.
As for me, I think running a processor on all nodes with a load-balanced connection hardly
makes sense (why not have each node handle its own load as usual), and (2N-1)*Q instead of
N*Q in the case of a primary-only processor doesn't sound that terrible, as the increase
is less than a constant factor of 2.

> Improve handling of Load Balanced Connections when one node is slow
> -------------------------------------------------------------------
>
>                 Key: NIFI-7081
>                 URL: https://issues.apache.org/jira/browse/NIFI-7081
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Tamas Palfy
>            Priority: Major
>
> When a connection is configured to use Round Robin load balancing, the FlowFile Queue
> works by queuing up one FlowFile to be processed locally, one to be sent to Node 2, one to
> be sent to Node 3, the next one to be locally processed, etc. (in this case, assuming a 3-node
> cluster).
> If one node in a cluster is slow, though, we can have a situation where the local partition
> is empty and the partition for Node 2 is empty. But Node 3's partition is full, because Node
> 3 is not processing the data quickly enough. As a result, on Node 1, the queue ends up applying
> backpressure, with all FlowFiles in the queue waiting to be pushed to Node 3.
> In such a situation, we end up preventing any data from being processed by Node 1 or
> Node 2. It would be advantageous to improve this so that Node 1 and Node 2 could still be
> busy processing data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
