nifi-dev mailing list archives

From Joe Witt <>
Subject Re: NiFi data HA in cluster mode
Date Tue, 09 Jan 2018 06:39:00 GMT
That is a fair point Brett - I wasn't thinking of that when I answered,
but that is a good point.  Then again we should create those
connections lazily, so if we don't I'd call that a bug :)


Yeah there is definitely intent to provide distributed data durability
across nodes.  This is especially important as it serves as a great
way to support elastic clustering behavior.

I'm not sure HDFS as the backing store is best and we all have to keep
in mind we must ensure distributed durability of flowfile, content,
and provenance.  That might mean application level replication similar
to what Apache Kafka does.  That might mean distributed durable block
storage and then deciding which node is responsible for processing a
given set of data at a time.  There are a lot of ways to slice this
and they all offer different tradeoffs.
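
As a rough illustration of the current behavior discussed in this thread
(data already queued on a node stays on that node until it returns, while
newly arriving data can go to the remaining nodes), here is a toy Python
model.  It is only a sketch under stated assumptions: the Node and Cluster
classes, the round-robin routing, and the flowfile names are all
hypothetical and do not correspond to any NiFi API.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a cluster node with its own local queue."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.queue = deque()   # flowfiles held locally on this node
        self.processed = []

    def process_all(self):
        # An offline node processes nothing; its queue is left untouched.
        if not self.online:
            return
        while self.queue:
            self.processed.append(self.queue.popleft())


class Cluster:
    """Hypothetical cluster that routes new data to online nodes only."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self._next = 0

    def ingest(self, flowfile):
        # Assumption: the source supports fail-over, so new data is
        # distributed round-robin across whichever nodes are online.
        live = [n for n in self.nodes if n.online]
        if not live:
            raise RuntimeError("no live nodes")
        node = live[self._next % len(live)]
        self._next += 1
        node.queue.append(flowfile)
        return node.name


def demo():
    cluster = Cluster(["node1", "node2"])
    node1, node2 = cluster.nodes
    node1.queue.extend(f"ff-{i}" for i in range(3))  # data mid-flow on node1
    node1.online = False                              # node1 goes down
    routed_to = [cluster.ingest(f"new-{i}") for i in range(2)]
    for n in cluster.nodes:
        n.process_all()
    stranded = list(node1.queue)                      # still waiting on node1
    node1.online = True                               # node1 comes back
    node1.process_all()
    return routed_to, stranded, node1.processed, node2.processed
```

In this model the three flowfiles queued on node1 are never visible to
node2; they are processed only after node1 is marked online again, while
the two new items are routed to node2 in the meantime.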

On Mon, Jan 8, 2018 at 11:37 PM, Brett Ryan <> wrote:
> I had someone from Hortonworks suggest to me that I should also set any PutSQL processors
> to only execute on primary. The reasoning was to avoid flooding the JDBC pool.
>> On 9 Jan 2018, at 17:25, Joe Witt <> wrote:
>> I'd avoid setting any processor to primary node only unless it is a
>> source processor (something that brings data into the system).
>> But, yes, I believe your description is accurate as of now.
>> Thanks
>>> On Mon, Jan 8, 2018 at 11:21 PM, 尹文才 <> wrote:
>>> Thanks Joe. So you mean, for example, that if I set one processor to run
>>> only on the primary node in the cluster, and there are 100 FlowFiles in
>>> the processor's incoming queue waiting to be processed, and the primary
>>> node suddenly goes down and another node is elected as the primary node,
>>> then those 100 FlowFiles will be kept locally on the node that went down
>>> and will continue to be processed by that node when it comes back online.
>>> These FlowFiles will not be available to the new primary node or to the
>>> other nodes.
>>> Am I correct?
>>> Regards,
>>> Ben
>>> 2018-01-09 14:08 GMT+08:00 Joe Witt <>:
>>>> Ben,
>>>> Data already mid-flow within a node will be kept on the node and
>>>> processed when the node is back online.  All other data coming into
>>>> the cluster can fail over to other nodes, provided you're sourcing data
>>>> with queuing semantics or automated load balancing or fail-over, as is
>>>> present in the Apache NiFi Site-to-Site protocol.
>>>> Thanks
>>>> Joe
>>>>> On Mon, Jan 8, 2018 at 11:05 PM, 尹文才 <> wrote:
>>>>> Hi guys, I have a question about data HA when NiFi is run in clustered
>>>>> mode. If one node goes down, will the flowfiles owned by this node be
>>>>> taken over and processed by another node?
>>>>> Or will the flowfiles be kept locally on that node and only be
>>>>> processed when that node is back online? Thanks.
>>>>> Regards,
>>>>> Ben
