nifi-commits mailing list archives

From "Joseph Niemiec (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MINIFI-356) Create repository failure policy
Date Tue, 01 Aug 2017 12:47:00 GMT

    [ https://issues.apache.org/jira/browse/MINIFI-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108832#comment-16108832 ]

Joseph Niemiec commented on MINIFI-356:
---------------------------------------

1) While I have been using it, I think we are so early on that my experiences are not going
to be representative. I used a lot of shell scripts; in time, one would hope we use more native
processors. I agree we need to remain skeptical about how it will be used until we hear from
more users.
2) That makes sense. I guess my time in Hadoop makes me see my I/O wait question as more of a
'partial failure' that leads to a worse state than a total failure. But for now it makes sense
to limit the scope to a simple, containable idea, so I like coupling to the FS API to trigger
volatile storage.

3) I see you opened MINIFI-360, so we can talk about the out-of-memory concerns there. I guess
the question now is whether each connection gets a percentage of available memory in this
volatile mode. Or, more to your point, do we need to decide now, or can we wait for more C2
metrics on memory use to roll in?

My concern with watching MiNiFi devices is that we are no doubt going to venture into many
devices with many differing use cases, so will what we collect be of any use? The rabbit hole
starts to run deep when we consider that one sensor on a device may have more significance than
another; in my mind this leads to having to assign priorities, or more explicit per-connection
percentages, for the volatile storage to ensure that the high-value sensor's data is saved while
we ignore the other sensors. Now, if we just want to call it a limited, degraded mode of
operation, then perhaps we don't need to go down this hole.
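To make the "explicit connection %'s" idea concrete, here is a minimal sketch assuming a fixed volatile-memory budget is split across connections in proportion to a per-connection priority weight. `ConnectionShare` and `split_volatile_budget` are hypothetical names for illustration, not MiNiFi C++ APIs, and the weighting scheme is an assumption rather than anything decided in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical: one connection's claim on the volatile-storage budget.
struct ConnectionShare {
  std::string name;
  std::uint64_t weight;  // higher weight = higher-value sensor
};

// Split budget_bytes across connections in proportion to their weights.
// Integer division can leave a few bytes unassigned; those go to the
// highest-weight connection so the most valuable data gets the remainder.
std::vector<std::uint64_t> split_volatile_budget(
    const std::vector<ConnectionShare>& conns, std::uint64_t budget_bytes) {
  std::vector<std::uint64_t> shares(conns.size(), 0);
  std::uint64_t total_weight = 0;
  for (const auto& c : conns) total_weight += c.weight;
  if (total_weight == 0) return shares;  // no priorities set: allocate nothing

  // budget_bytes * weight must not overflow (each weight <= total_weight).
  assert(budget_bytes <= std::numeric_limits<std::uint64_t>::max() / total_weight);

  std::uint64_t assigned = 0;
  std::size_t top = 0;
  for (std::size_t i = 0; i < conns.size(); ++i) {
    shares[i] = budget_bytes * conns[i].weight / total_weight;
    assigned += shares[i];
    if (conns[i].weight > conns[top].weight) top = i;
  }
  shares[top] += budget_bytes - assigned;
  return shares;
}
```

For example, weights of 1 and 3 over a 1000-byte budget give 250 and 750 bytes. Giving every connection equal weight reduces this to the simpler "limited degraded mode" with no prioritization.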

It's probably time to move this one to an email thread :D Do you want to start it, since you opened the JIRA?
###
Not to change direction, but perhaps we should classify the failure modes, as I could see wanting
to do this for more than just failed storage. Just a quick brainstorm on possible ones:
* Complete Failure - No Notice
* Partial Unknown Failure - Notice but Stop Processing
* Partial Disk Failure - Notice and Continue with Limited Volatile Storage
* Partial Network Failure - What is this? Failure to send via the API a number of times.
  We could have a failure policy that tries to batch more per send, or compresses aggressively
  before sending? Or sends using Reed-Solomon encoding...
* Partial CPU Failure - Could reduce the concurrency of processors set above 1?
* Partial RAM Failure - Could resize connections, or use swap?
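One way to picture this classification is as a lookup from failure class to response. The sketch below is only illustrative: the `FailureMode` and `Response` enums and `policy_for` are hypothetical names, and each response simply paraphrases the corresponding brainstormed bullet rather than any agreed design.

```cpp
#include <cassert>

// Hypothetical failure classes, one per brainstormed bullet.
enum class FailureMode {
  Complete,        // no notice possible
  PartialUnknown,
  PartialDisk,
  PartialNetwork,
  PartialCpu,
  PartialRam,
};

// Hypothetical responses the agent might take.
enum class Response {
  None,                  // nothing can be done; the device is gone
  NotifyAndStop,
  NotifyAndUseVolatile,  // limited volatile storage
  BatchOrCompressSends,
  ReduceConcurrency,
  ShrinkConnections,
};

Response policy_for(FailureMode mode) {
  switch (mode) {
    case FailureMode::Complete:       return Response::None;
    case FailureMode::PartialUnknown: return Response::NotifyAndStop;
    case FailureMode::PartialDisk:    return Response::NotifyAndUseVolatile;
    case FailureMode::PartialNetwork: return Response::BatchOrCompressSends;
    case FailureMode::PartialCpu:     return Response::ReduceConcurrency;
    case FailureMode::PartialRam:     return Response::ShrinkConnections;
  }
  assert(false && "unknown FailureMode");
  return Response::NotifyAndStop;  // conservative fallback in release builds
}

// True if the chosen response lets the agent keep processing data.
bool keeps_processing(Response r) {
  return r != Response::None && r != Response::NotifyAndStop;
}
```

Keeping the mapping in one table-like function would make it straightforward to add new failure classes beyond storage without touching the repository code.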



> Create repository failure policy
> --------------------------------
>
>                 Key: MINIFI-356
>                 URL: https://issues.apache.org/jira/browse/MINIFI-356
>             Project: Apache NiFi MiNiFi
>          Issue Type: Improvement
>          Components: C++
>            Reporter: marco polo
>            Assignee: marco polo
>              Labels: Durability
>             Fix For: cpp-0.3.0
>
>
> Create a failure policy for continuing operations if a repo failure occurs.
> I.e., if writing to disk fails above a threshold (100%, for example), we can move to
> a volatile repo where we can continue operations and report that we have a failure.
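The threshold idea in the issue could look roughly like the sketch below. It assumes the failure rate is measured over a sliding window of the most recent disk writes and that failover is one-way; `RepoFailover`, `record_write`, and `use_volatile` are hypothetical names, not part of MiNiFi C++.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: watches recent disk-repository writes and decides
// when to fall back to a volatile (in-memory) repository.
class RepoFailover {
 public:
  // threshold: fraction of failed writes in the window that triggers
  // failover (1.0 matches the "100%" example in the issue).
  RepoFailover(std::size_t window, double threshold)
      : window_(window), threshold_(threshold) {
    assert(window_ > 0);
    assert(threshold_ > 0.0 && threshold_ <= 1.0);
  }

  // Record one disk write outcome; returns true once running on volatile storage.
  bool record_write(bool ok) {
    recent_.push_back(ok);
    if (!ok) ++failures_;
    if (recent_.size() > window_) {
      if (!recent_.front()) --failures_;
      recent_.pop_front();
    }
    // Only judge a full window, so a single early failure cannot trip it.
    if (!on_volatile_ && recent_.size() == window_ &&
        static_cast<double>(failures_) / window_ >= threshold_) {
      on_volatile_ = true;  // sticky: stays on volatile storage once switched
    }
    return on_volatile_;
  }

  // True once failover has happened; the agent would report this as a failure.
  bool use_volatile() const { return on_volatile_; }

 private:
  std::size_t window_;
  double threshold_;
  std::deque<bool> recent_;
  std::size_t failures_ = 0;
  bool on_volatile_ = false;
};
```

With a window of 3 and a threshold of 1.0, three consecutive failed writes trigger the switch, while two failures out of three do not.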



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
