nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: guaranteed delivery
Date Mon, 02 Jan 2017 20:52:56 GMT
Hello

NiFi's data durability model is described in detail at [1].  That
covers how data is handled once it is under NiFi's control.

Then there is the matter of the safety of data as it is coming 'into
NiFi' and being sent 'out of NiFi'.

When exchanging data between processes there are generally three
delivery guarantees to consider, and these are well discussed online:
1) At least once
2) At most once
3) Exactly once

So when bringing data 'into NiFi' and 'out of NiFi' we generally
provide 'at least once' by default.  That means duplicates can be
received and duplicates can be sent.  But it is also the case that,
generally speaking, you will not lose data (obviously NiFi cannot
control what external processes do to the data).
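One common way to get effective exactly-once processing on top of an
at-least-once delivery is to deduplicate on a stable identifier at the
receiving side.  Here is a minimal sketch of that idea (hypothetical
code, not NiFi internals; the message IDs and the retransmitted "m2"
are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AtLeastOnceDemo {

    // Keep only the first occurrence of each message ID, so each
    // message is processed once even if it was delivered twice.
    static List<String> dedupe(List<String> delivered) {
        Set<String> seen = new HashSet<>();
        List<String> processed = new ArrayList<>();
        for (String id : delivered) {
            if (seen.add(id)) {   // add() returns false for duplicates
                processed.add(id);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Simulated at-least-once stream: "m2" was retransmitted
        // because its acknowledgement was lost.
        List<String> delivered = List.of("m1", "m2", "m2", "m3");
        System.out.println(dedupe(delivered)); // [m1, m2, m3]
    }
}
```

This only works when the sender attaches an identifier that is stable
across retransmissions, and the receiver's "seen" state survives
restarts.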

Achieving exactly-once behavior requires the two processes to have
some agreed-upon protocol which offers this behavior.  It could be
something formal like 'two-phase commit' or it could be something
informal where you as an administrator have pinned down all the
variables as best you can to ensure there will be no issues.  So let's
look at your example.  You want to pull files via FTP.

There are two critical steps to this:
1) Copy bytes from remote system to nifi using FTP
2) Change the state of the remote system so we don't keep pulling the
file (either removal or rename).

If you do step 1 then step 2 you are basically implementing
at-least-once behavior.  You won't lose data with such a protocol, but
you could end up with duplicates.

If you do step 2 then step 1 you are implementing at most once (not
exactly once).  It is at most once because if NiFi hits an error while
committing the received data, the remote file has already been removed
or renamed and that data is lost.
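The two orderings above can be sketched with a small simulation
(hypothetical code, not NiFi's implementation) that injects a failure
between the two steps and shows the resulting outcome:

```java
public class PullOrderingDemo {

    // Simulate pulling one file with two attempts. "copyFirst"
    // chooses the step ordering; "crashBetween" injects a failure
    // between the two steps on the first attempt. Returns how many
    // local copies exist and whether the remote file survives.
    static String run(boolean copyFirst, boolean crashBetween) {
        boolean remoteHasFile = true;
        int localCopies = 0;

        for (int attempt = 0; attempt < 2 && remoteHasFile; attempt++) {
            if (copyFirst) {
                localCopies++;                              // step 1: copy bytes
                if (crashBetween && attempt == 0) continue; // crash before step 2
                remoteHasFile = false;                      // step 2: remove remote file
            } else {
                remoteHasFile = false;                      // step 2 first: remove remote file
                if (crashBetween && attempt == 0) break;    // crash before step 1
                localCopies++;                              // step 1: copy bytes
            }
        }
        return localCopies + " local copies, remote "
                + (remoteHasFile ? "kept" : "gone");
    }

    public static void main(String[] args) {
        // Copy-then-remove plus a crash: the retry pulls the file
        // again, so we get a duplicate (at least once).
        System.out.println(run(true, true));  // 2 local copies, remote gone
        // Remove-then-copy plus a crash: the file is gone before we
        // ever copied it, so data is lost (at most once).
        System.out.println(run(false, true)); // 0 local copies, remote gone
    }
}
```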

The other thing you mentioned was sequence of processing.  NiFi will
by default process data in the order it was received, and as it moves
between processors it is placed in a prioritized queue.  Obviously if
you're running multiple threads and such, ordering can vary.
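The queue-between-processors idea can be sketched like this
(hypothetical code, not NiFi internals): items carry the time they
were received, and a priority queue hands them to a single-threaded
consumer oldest-first.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

public class QueueOrderingDemo {

    record FlowFile(long receivedAt, String name) {}

    // Drain the files through a queue prioritized by receipt time;
    // a single consumer thread sees them in arrival order regardless
    // of the order they were enqueued.
    static String drainInOrder(List<FlowFile> files) {
        Queue<FlowFile> queue = new PriorityQueue<>(
                Comparator.comparingLong(FlowFile::receivedAt));
        queue.addAll(files);

        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll().name()); // oldest first
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainInOrder(List.of(
                new FlowFile(30, "c"),
                new FlowFile(10, "a"),
                new FlowFile(20, "b")))); // abc
    }
}
```

With multiple consumer threads polling the same queue, items are still
dequeued oldest-first but can finish processing out of order, which is
the ordering caveat above.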

There is a lot to discuss here and many cases to consider, so I'm
happy to keep the discussion going.  Hopefully this helps you see the
difference between 'NiFi the framework and its durability model'
versus 'NiFi processors and patterns of using them together'.

[1] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

Thanks
Joe

On Mon, Jan 2, 2017 at 3:19 PM,  <ywika@yahoo.com> wrote:
> I am reviewing NiFi as a possible replacement for an internally developed
> data transfer tool.
>
> NiFi appears to be of the "the data is guaranteed to be delivered at least
> once" variety. Where my needs are "the data is guaranteed to be delivered
> once"; to the point that I'm willing to manually review and resolve failures
> that occur "beyond the point of no return" to ensure 1x delivery and no data
> loss.
> Some of my data transfers are of the sequential transactional type, where
> they must be transferred and processed, in sequence.
>
> Take for instance GetFTP, I see holes in the commit model (from the
> perspective of what I'm trying to accomplish).
> Looking at GetFTP (via FetchFileTransfer.java), session.commit() occurs
> before deleting or renaming the source file. So, if that step fails the file
> will be retrieved and processed again and again.
> PutSQL appears to have similar issues as it relates to updating a database
> more than once should the transfer die before the db commit is recognized by
> NiFi, so the FlowFile(s) get rolled back.
>
> Are my needs outside of NiFi's objectives?
>
