Date: Tue, 1 Aug 2017 12:47:00 +0000 (UTC)
From: "Joseph Niemiec (JIRA)"
To: commits@nifi.apache.org
Reply-To: dev@nifi.apache.org
Subject: [jira] [Commented] (MINIFI-356) Create repository failure policy

    [ https://issues.apache.org/jira/browse/MINIFI-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108832#comment-16108832 ]

Joseph Niemiec commented on MINIFI-356:
---------------------------------------

1) While I have been using it, I do think we are so early on that my experiences are not going to be representative. I used a lot of shell scripts; in time one would hope we use more native processors. I agree we need to remain skeptical about how it will be used until we have input from many more users.

2) That makes sense. I guess my time in Hadoop leads my IO-wait question to be more about a 'partial failure' that leaves us in a worse state than a total failure. But for now it makes sense to limit the scope to a simple, containable idea, so I like the FS-API coupling to trigger volatile storage. A rough sketch of what I picture for that coupling is below.
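Just to make that idea concrete, here is a rough sketch of the kind of coupling I picture. Everything here is hypothetical: Repository, VolatileRepository, FailoverRepository, and the threshold knob are names made up for this comment, not classes from the MiNiFi C++ codebase. The point is only the shape of the policy: count consecutive failed disk writes and, past a threshold, start writing to a bounded in-memory store while flagging the degraded state.

{code:cpp}
// Hypothetical sketch only; these names are not taken from the MiNiFi C++ codebase.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for "persist this flow file content somewhere".
class Repository {
 public:
  virtual ~Repository() = default;
  virtual bool Put(const std::string& key, const std::vector<uint8_t>& value) = 0;
};

// Volatile (in-memory) fallback with a hard cap so it cannot grow without bound.
class VolatileRepository : public Repository {
 public:
  explicit VolatileRepository(std::size_t max_entries) : max_entries_(max_entries) {}
  bool Put(const std::string& key, const std::vector<uint8_t>& value) override {
    if (entries_.size() >= max_entries_) entries_.pop_front();  // drop the oldest entry
    entries_.emplace_back(key, value);
    return true;
  }
 private:
  std::size_t max_entries_;
  std::deque<std::pair<std::string, std::vector<uint8_t>>> entries_;
};

// Failure policy: after N consecutive failed disk writes, route new writes to the
// volatile repository and flag the degraded state so it can be reported (e.g. to C2).
class FailoverRepository : public Repository {
 public:
  FailoverRepository(Repository& disk, Repository& fallback, int failure_threshold)
      : disk_(disk), fallback_(fallback), threshold_(failure_threshold) {}

  bool Put(const std::string& key, const std::vector<uint8_t>& value) override {
    if (!degraded_) {
      if (disk_.Put(key, value)) {
        consecutive_failures_ = 0;
        return true;
      }
      if (++consecutive_failures_ >= threshold_) {
        degraded_ = true;
        std::cerr << "repository degraded: switching to volatile storage\n";
      }
    }
    // Degraded, or a disk write just failed: keep operating out of memory.
    return fallback_.Put(key, value);
  }

  bool degraded() const { return degraded_; }

 private:
  Repository& disk_;
  Repository& fallback_;
  int threshold_;
  int consecutive_failures_ = 0;
  bool degraded_ = false;
};
{code}

The cap on the volatile repository is the part that ties back to point 3: something has to bound how much memory the fallback is allowed to consume, whether that is a fixed entry count or a percentage of available memory per connection.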
3) I see you opened MINIFI-360, so we can talk about the OOM concerns there. I guess the question now is whether each connection gets a percentage of available memory in this volatile mode. Or, more to your point, do we need to decide now, or can we wait for more C2 metrics on memory use to roll in? My concern with watching MiNiFi devices is that we are no doubt going to end up with many devices across many differing use cases, and will what we collect be of any use? The rabbit hole runs deep once we consider that one sensor on a device may be more significant than another; in my mind that leads to assigning priorities, or more explicit per-connection percentages of the volatile storage, to ensure the high-value sensor keeps getting saved while we ignore the other sensors. Now, if we wanted to just call it a limited, degraded mode of operation, then perhaps we don't need to run down this hole. It's probably time to take this one to an email thread :D Do you want to start it, since you opened the Jira?

###

Not to change direction, but perhaps we should classify the failure modes, as I could see wanting to do this for more than just failed storage. Just a quick brainstorm on possible ones:

* Complete Failure - no notice.
* Partial Unknown Failure - notice, but stop processing.
* Partial Disk Failure - notice and continue with limited volatile storage.
* Partial Network Failure - what is this, exactly? Failure to send via the API a number of times? We could have a failure policy that batches more per send, or compresses more aggressively before sending, or even sends using Reed-Solomon encoding...
* Partial CPU Failure - could reduce the concurrency of processors above 1?
* Partial RAM Failure - could resize connections, use swap?

(To make the brainstorm a bit more concrete, a hypothetical sketch of how these modes might map to degraded-mode settings is tacked on at the very end of this message, after the quoted issue.)


> Create repository failure policy
> --------------------------------
>
>                 Key: MINIFI-356
>                 URL: https://issues.apache.org/jira/browse/MINIFI-356
>             Project: Apache NiFi MiNiFi
>          Issue Type: Improvement
>          Components: C++
>            Reporter: marco polo
>            Assignee: marco polo
>              Labels: Durability
>             Fix For: cpp-0.3.0
>
>
> Create a failure policy for continuing operations if a repo failure occurs.
> I.e. if writing to disk fails above a threshold (100%, for example), we can move to a volatile repo where we can continue operations and report that we have a failure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
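Tacking on the sketch promised above the quoted issue. It is purely hypothetical: nothing below exists in MiNiFi C++ today, the enum values just mirror the brainstorm list, and the policy knobs are made up for illustration. Complete failure is left out because with the whole agent gone there is no policy left to apply.

{code:cpp}
// Hypothetical only; nothing here is taken from the MiNiFi C++ codebase.
#include <cstddef>

// Partial-failure modes from the brainstorm list above.
enum class PartialFailureMode {
  kNone,
  kUnknown,   // notice, but stop processing
  kDisk,      // notice and continue with limited volatile storage
  kNetwork,   // repeated failures to send via the API
  kCpu,       // starved for CPU
  kMemory     // starved for RAM
};

// Made-up knobs describing how the agent behaves while degraded.
struct DegradedPolicy {
  bool stop_processing = false;
  bool use_volatile_storage = false;
  std::size_t max_flow_files_per_send = 1;     // batch more aggressively when > 1
  bool compress_before_send = false;
  int max_concurrent_tasks_per_processor = 0;  // 0 = leave scheduling unchanged
};

// One possible mapping from failure mode to degraded behavior.
inline DegradedPolicy PolicyFor(PartialFailureMode mode) {
  DegradedPolicy policy;
  switch (mode) {
    case PartialFailureMode::kUnknown:
      policy.stop_processing = true;
      break;
    case PartialFailureMode::kDisk:
      policy.use_volatile_storage = true;
      break;
    case PartialFailureMode::kNetwork:
      policy.max_flow_files_per_send = 100;
      policy.compress_before_send = true;
      break;
    case PartialFailureMode::kCpu:
      policy.max_concurrent_tasks_per_processor = 1;
      break;
    case PartialFailureMode::kMemory:
      policy.use_volatile_storage = true;  // plus shrinking connection queues
      break;
    case PartialFailureMode::kNone:
      break;
  }
  return policy;
}
{code}

Whether those knobs come from the config file or get pushed down over C2 is exactly the kind of thing the email thread could settle.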