nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: GetSFTP backpressure question
Date Fri, 28 Oct 2016 11:53:02 GMT
Great questions and discussion points here and I agree with your
statement about the importance of honoring back pressure targets the
user believes they set.

The way back pressure works is that before a processor is given a
thread to execute (each onTrigger cycle), the framework checks all
possible output relationships and ensures that, at that moment in
time, all of them have space available according to the limits set on
those connections (size or number of things).  Once the processor is
given the thread to execute its onTrigger cycle, it is up to that
processor to be a good steward.  The framework does offer a method for
the processor to check whether all destinations have space available,
which is important if, for efficiency reasons, it chooses to do more
than one thing at a time.  It is also important to understand that the
processor doesn't get to know how close to full the queues it writes
to are.  To the processor, the destinations are either full or have
space available.
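
To make the two checks concrete, here is a minimal sketch in Python.
It is a toy model, not NiFi code: Connection, framework_may_trigger,
on_trigger and run_cycle are hypothetical names, and it assumes a queue
counts as full once it reaches its limit.  It compares a processor that
never re-checks its destination with one that checks before each item.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy stand-in for a connection with a back pressure size limit."""
    max_bytes: int
    queued: list = field(default_factory=list)  # sizes of queued items

    def queued_bytes(self):
        return sum(self.queued)

    def is_full(self):
        # Assumption: the queue counts as full once it reaches the limit.
        return self.queued_bytes() >= self.max_bytes


def framework_may_trigger(outputs):
    """Framework-side check made once, before each execution cycle."""
    return all(not c.is_full() for c in outputs)


def on_trigger(out, pending, max_selects, check_each_item):
    """One execution cycle: move up to max_selects items into `out`."""
    emitted = 0
    while pending and emitted < max_selects:
        if check_each_item and out.is_full():
            break  # good steward: stop once the destination reports full
        out.queued.append(pending.pop(0))
        emitted += 1
    return emitted


def run_cycle(check_each_item):
    """Run a single cycle and return the queued size in whole GB."""
    gb = 1024 ** 3
    out = Connection(max_bytes=2 * gb)
    pending = [gb] * 100  # one hundred 1 GB files waiting upstream
    if framework_may_trigger([out]):
        on_trigger(out, pending, max_selects=100,
                   check_each_item=check_each_item)
    return out.queued_bytes() // gb


print(run_cycle(check_each_item=False))  # 100
print(run_cycle(check_each_item=True))   # 2
```

In this model the processor only ever sees a full / not full answer,
which matches the description above; it cannot size its batch to the
space that remains.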

This sort of back pressure is an optimistic approach.  It really means
the limits are enforced as soft limits and, as you point out, can be
exceeded in some cases.  Basically, the back pressure target can be
exceeded by however much data a processor could produce in a single
execution cycle once it is given a thread.
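
As a rough worked example, using the numbers from the quoted message
below (2GB limit, max selects = 100, files of nearly 1GB): the function
name worst_case_queued is hypothetical, and the bound assumes the
thread is granted only while the queue is strictly below its limit and
that the processor does not re-check the destination during the cycle.

```python
def worst_case_queued(threshold, max_items_per_cycle, max_item_size):
    """Exclusive upper bound on queued bytes after one unchecked cycle.

    Assumes the thread is granted only while the queue is strictly
    below `threshold`, and that the processor then emits up to
    `max_items_per_cycle` items without re-checking the destination.
    """
    return threshold + max_items_per_cycle * max_item_size


GB = 1024 ** 3
bound = worst_case_queued(2 * GB, 100, 1 * GB)
print(bound // GB)  # 102
```

The 71GB backlog reported below sits under this bound, so it is
consistent with a soft limit being overrun within a single cycle.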

I believe the user's expectation is well articulated via the current
mechanism of setting the max values on the connections.  It is then
important that processors get written or improved to better honor
that, or that they document for the user under what conditions they
could exceed the back pressure target.

Thanks
Joe

On Fri, Oct 28, 2016 at 7:30 AM, Joe Gresock <jgresock@gmail.com> wrote:
> I have a NiFilosophical question that came up when I had a GetSFTP
> processor running to a back-pressured connection.
>
> My GetSFTP is configured with max selects = 100, and the files in the
> remote directory are nearly 1GB each.  The queue has a backpressure of 2GB,
> and I assumed each run of GetSFTP would stop feeding files once it hit
> backpressure.
>
> I was initially puzzled when I started periodically seeing huge backlogs
> (71GB) on each worker in the cluster in this particular queue, until I
> looked at the queued count/bytes stats (very useful tool, btw):
>
> Queued bytes statistics <https://imagebin.ca/v/301KDHEa1lCk>
> Queued count statistics <https://imagebin.ca/v/301JqnUcGXLF>
>
> Now it's evident that GetSFTP continues to emit files until it hits the max
> selects, regardless of backpressure.  I think I understand why backpressure
> couldn't necessarily trump this behavior (e.g., what if a processor needed
> to emit a query result set in batches.. what would you do with the flow
> files it wanted to emit if you suddenly hit backpressure?)
>
> So my questions are:
> - Do you think it's the user's responsibility to be aware of cases when
> backpressure is overridden by a processor's implementation?  I think this
> is important to understand, because backpressure is usually in place to
> prevent a full disk, which is a fairly critical requirement.
> - Is there something we can do to document this so it's more universally
> understood?
> - Perhaps the GetSFTP Max Selects property can indicate that it will
> override backpressure?  In which case, are there other processors that
> would need similar documentation?
> - Or do we want a more universal approach, like putting this caveat in the
> general documentation?
>
> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*
