nifi-users mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Need help understanding backpressure
Date Tue, 19 Apr 2016 12:44:44 GMT
Chris,

I like the idea of providing a way to enforce backpressure based on how full the content or
FlowFile repository is. I would imagine that this would be something we would also configure
on a connection, just as the other backpressure settings are configured. That way we could
allow, for example, more "important" or more time-sensitive data to come into the flow even
if the repository is 90% full, whereas other data might not be allowed to enter once the
repo hits 60% full, so that we ensure we have room for the more important data.
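
A rough sketch of what I mean, in Python rather than anything NiFi-specific. This is only
an illustration of the proposal: the Connection class, the max_repo_usage setting, and the
accepts_data method are all made-up names, not existing NiFi configuration or API.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical connection with a repository-usage backpressure threshold."""
    name: str
    max_repo_usage: float  # fraction of the repository that may be in use, 0.0 to 1.0

    def accepts_data(self, repo_usage: float) -> bool:
        # Backpressure engages once repository usage reaches the threshold.
        return repo_usage < self.max_repo_usage


# Assumed thresholds taken from the 90% / 60% example in the text.
important = Connection("time-sensitive feed", max_repo_usage=0.90)
bulk = Connection("bulk feed", max_repo_usage=0.60)

for usage in (0.50, 0.75, 0.95):
    print(f"repo {usage:.0%}: important={important.accepts_data(usage)}, "
          f"bulk={bulk.accepts_data(usage)}")
```

With the repository 75% full, the bulk feed would be held back while the time-sensitive
feed would still be allowed in.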

Is this what you had in mind?

Thanks
-Mark


> On Apr 15, 2016, at 2:38 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) <chris.mcdermott@hpe.com> wrote:
> 
> Thanks for the clarification and explanation of the design philosophy. It does make sense.
> I think it comes down to me trying to use back pressure for a purpose for which it was not
> designed.
> 
> What if there were a way to configure a processor to be paused when available disk
> space drops below some threshold? That way ingress processors, as identified by the user,
> could be prevented from flooding the system with too much data. Thoughts?
> 
> Chris
> 
> 
> 
> 
> On 4/15/16, 1:12 PM, "Mark Payne" <markap14@hotmail.com> wrote:
> 
>> Chris,
>> 
>> When you apply backpressure to that connection, it will cause the processor that is
>> the source of the connection to stop being scheduled to run until the queue clears out.
>> However, as you noted, data will still queue up in that processor's incoming connections.
>> So to force backpressure to propagate all the way back to the source, you would
>> need to configure each of the connections in the flow to have backpressure applied.
>> 
>> The reason behind this is that we can have many different sources, each routing data to
>> many different destinations. So if the queue before a 'terminal processor' is filled,
>> we won't want to prevent data from coming in from some source if only some portion of
>> that data will go to that processor.
>> 
>> For example, consider the following flow:
>> 
>> A --> B --> C --> D
>>             ^
>> E --> F ----|
>>             v
>>             G
>> 
>> Where Processor A sends 100% of data to B and then C and D.
>> Maybe only 1% of data from Processor E makes its way to D, though,
>> and 99% of its data goes to G instead.
>> 
>> If the queue from C to D fills up, we may not want to stop the data flowing
>> in from E because most of its data is going to G. Or we may want to stop data
>> coming in from E only if the queue from F to C backs up to say 100,000 FlowFiles.
>> 
>> By ensuring that backpressure is applied only to that one connection, we can leverage
>> this to control which sources stop bringing in data when.
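
A simplified model of the rule described above, sketched in Python for illustration only.
It assumes a processor is scheduled only while none of its outgoing connections has reached
its backpressure threshold. The Connection class, the can_run helper, and the queue sizes
and thresholds are invented for the example and are not NiFi code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    source: str
    dest: str
    threshold: Optional[int]  # None means no backpressure configured
    queued: int = 0

    def is_full(self) -> bool:
        return self.threshold is not None and self.queued >= self.threshold


def can_run(processor: str, connections: List[Connection]) -> bool:
    """Assumed rule: a processor runs only while none of its outgoing connections is full."""
    return not any(c.is_full() for c in connections if c.source == processor)


# The example flow, with arbitrary illustrative numbers.
flow = [
    Connection("A", "B", threshold=None),
    Connection("B", "C", threshold=None),
    Connection("C", "D", threshold=10_000, queued=10_000),  # full
    Connection("E", "F", threshold=None),
    Connection("F", "C", threshold=100_000, queued=2_500),  # not yet full
    Connection("F", "G", threshold=None),
]

for name in "ABCEF":
    print(name, "can run" if can_run(name, flow) else "is paused")
```

In this model only C is paused while the queue from C to D is full. A, B, and E keep
bringing data in, which is why the upstream queues continue to grow unless they have
thresholds of their own.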
>> 
>> Hopefully this provided some clarification of how this works and why it was done this way
>> rather than confusing you more :)
>> 
>> However, I can see the benefit in setting a backpressure threshold only once. And I think
>> there are a couple of possible improvements here:
>> 
>> (1) We could allow the user to select multiple connections and then configure backpressure
>> and have that applied to all selected connections.
>> 
>> (2) We could allow the user to set the backpressure and indicate that it should be
>> propagated back to all upstream connections.  This feels a little more dangerous, though,
>> because it would be easy to change configurations inadvertently.
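
To make option (2) concrete, here is a minimal Python sketch of how the set of upstream
connections could be found for the example flow. Connections are modelled as plain
(source, destination) pairs, and upstream_connections is a hypothetical helper written for
this illustration, not an existing NiFi feature.

```python
from collections import deque


def upstream_connections(target, connections):
    """Collect `target` plus every connection whose data can reach it."""
    found = {target}
    pending = deque([target])
    while pending:
        source, _dest = pending.popleft()
        # Any connection that ends at this connection's source feeds into it.
        for conn in connections:
            if conn[1] == source and conn not in found:
                found.add(conn)
                pending.append(conn)
    return found


flow = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "C"), ("F", "G")]

# Apply one (arbitrary) threshold to C --> D and everything upstream of it.
thresholds = {conn: 10_000 for conn in upstream_connections(("C", "D"), flow)}
print(sorted(thresholds))
```

Note that the connection from E to F is included even though most of E's data goes to G,
which shows how propagating a setting this way could change more of the flow than intended.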
>> 
>> Hopefully this helps!
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) <chris.mcdermott@hpe.com> wrote:
>>> 
>>> Can anyone point me to some documentation, or just explain to me, how back pressure
>>> is supposed to work?
>>> 
>>> I am trying to limit the amount of storage used for queued files in my flow. To that
>>> end I have a connection near the end of the flow that I’ve put a limit on. When that
>>> limit is reached I assumed that back pressure would limit the output of the processors
>>> all the way back upstream. I find that that is not the case and large numbers of files
>>> are being queued in upstream connections.
>>> 
>>> Given this, can someone explain how back pressure can be employed to achieve my
>>> goal of limiting storage usage for in-flight files?
>>> 
>>> Thanks,
>>> Chris
>> 

