Date: Tue, 1 Aug 2017 12:47:00 +0000 (UTC)
From: "Joseph Niemiec (JIRA)"
To: commits@nifi.apache.org
Reply-To: dev@nifi.apache.org
Subject: [jira] [Commented] (MINIFI-356) Create repository failure policy

    [ https://issues.apache.org/jira/browse/MINIFI-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108832#comment-16108832 ]

Joseph Niemiec commented on MINIFI-356:
---------------------------------------

1) While I have been using it, I do think we are so early on that my experiences are not going to be representative. I used a lot of shell scripts; in time one would hope we use more native processors. I agree we need to remain skeptical about how it will be used until we have input from many more users.

2) That makes sense. I guess my time in Hadoop leads my IO-wait question to be more about a 'partial failure' that leaves us in a worse state than a total failure. But for now it makes sense to limit the scope to a simple, containable idea, so I like the FS-API coupling to trigger volatile storage. A rough sketch of what I picture for that coupling is below.
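Just to make that idea concrete, here is a rough sketch of the kind of coupling I picture. Everything here is hypothetical: Repository, VolatileRepository, FailoverRepository, and the threshold knob are names made up for this comment, not classes from the MiNiFi C++ codebase. The point is only the shape of the policy: count consecutive failed disk writes and, past a threshold, start writing to a bounded in-memory store while flagging the degraded state.

{code:cpp}
// Hypothetical sketch only; these names are not taken from the MiNiFi C++ codebase.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for "persist this flow file content somewhere".
class Repository {
 public:
  virtual ~Repository() = default;
  virtual bool Put(const std::string& key, const std::vector<uint8_t>& value) = 0;
};

// Volatile (in-memory) fallback with a hard cap so it cannot grow without bound.
class VolatileRepository : public Repository {
 public:
  explicit VolatileRepository(std::size_t max_entries) : max_entries_(max_entries) {}
  bool Put(const std::string& key, const std::vector<uint8_t>& value) override {
    if (entries_.size() >= max_entries_) entries_.pop_front();  // drop the oldest entry
    entries_.emplace_back(key, value);
    return true;
  }
 private:
  std::size_t max_entries_;
  std::deque<std::pair<std::string, std::vector<uint8_t>>> entries_;
};

// Failure policy: after N consecutive failed disk writes, route new writes to the
// volatile repository and flag the degraded state so it can be reported (e.g. to C2).
class FailoverRepository : public Repository {
 public:
  FailoverRepository(Repository& disk, Repository& fallback, int failure_threshold)
      : disk_(disk), fallback_(fallback), threshold_(failure_threshold) {}

  bool Put(const std::string& key, const std::vector<uint8_t>& value) override {
    if (!degraded_) {
      if (disk_.Put(key, value)) {
        consecutive_failures_ = 0;
        return true;
      }
      if (++consecutive_failures_ >= threshold_) {
        degraded_ = true;
        std::cerr << "repository degraded: switching to volatile storage\n";
      }
    }
    // Degraded, or a disk write just failed: keep operating out of memory.
    return fallback_.Put(key, value);
  }

  bool degraded() const { return degraded_; }

 private:
  Repository& disk_;
  Repository& fallback_;
  int threshold_;
  int consecutive_failures_ = 0;
  bool degraded_ = false;
};
{code}

The cap on the volatile repository is the part that ties back to point 3: something has to bound how much memory the fallback is allowed to consume, whether that is a fixed entry count or a percentage of available memory per connection.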
3) I see you opened MINIFI-360, so we can talk about the OOM concerns there. I guess the question now is whether each connection gets a percentage of available memory in this volatile mode. Or, more to your point, do we need to decide now, or can we wait for more C2 metrics on memory use to roll in? My concern with watching MiNiFi devices is that we are no doubt going to end up with many devices across many differing use cases, and will what we collect be of any use? The rabbit hole runs deep once we consider that one sensor on a device may be more significant than another; in my mind that leads to assigning priorities, or more explicit per-connection percentages of the volatile storage, to ensure the high-value sensor keeps getting saved while we ignore the other sensors. Now, if we wanted to just call it a limited, degraded mode of operation, then perhaps we don't need to run down this hole. It's probably time to take this one to an email thread :D Do you want to start it, since you opened the Jira?

###

Not to change direction, but perhaps we should classify the failure modes, as I could see wanting to do this for more than just failed storage. Just a quick brainstorm on possible ones:

* Complete Failure - no notice.
* Partial Unknown Failure - notice, but stop processing.
* Partial Disk Failure - notice and continue with limited volatile storage.
* Partial Network Failure - what is this, exactly? Failure to send via the API a number of times? We could have a failure policy that batches more per send, or compresses more aggressively before sending, or even sends using Reed-Solomon encoding...
* Partial CPU Failure - could reduce the concurrency of processors above 1?
* Partial RAM Failure - could resize connections, use swap?

(To make the brainstorm a bit more concrete, a hypothetical sketch of how these modes might map to degraded-mode settings is tacked on at the very end of this message, after the quoted issue.)


> Create repository failure policy
> --------------------------------
>
>                 Key: MINIFI-356
>                 URL: https://issues.apache.org/jira/browse/MINIFI-356
>             Project: Apache NiFi MiNiFi
>          Issue Type: Improvement
>          Components: C++
>            Reporter: marco polo
>            Assignee: marco polo
>              Labels: Durability
>             Fix For: cpp-0.3.0
>
>
> Create a failure policy for continuing operations if a repo failure occurs.
> I.e. if writing to disk fails above a threshold (100%, for example), we can move to a volatile repo where we can continue operations and report that we have a failure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
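Tacking on the sketch promised above the quoted issue. It is purely hypothetical: nothing below exists in MiNiFi C++ today, the enum values just mirror the brainstorm list, and the policy knobs are made up for illustration. Complete failure is left out because with the whole agent gone there is no policy left to apply.

{code:cpp}
// Hypothetical only; nothing here is taken from the MiNiFi C++ codebase.
#include <cstddef>

// Partial-failure modes from the brainstorm list above.
enum class PartialFailureMode {
  kNone,
  kUnknown,   // notice, but stop processing
  kDisk,      // notice and continue with limited volatile storage
  kNetwork,   // repeated failures to send via the API
  kCpu,       // starved for CPU
  kMemory     // starved for RAM
};

// Made-up knobs describing how the agent behaves while degraded.
struct DegradedPolicy {
  bool stop_processing = false;
  bool use_volatile_storage = false;
  std::size_t max_flow_files_per_send = 1;     // batch more aggressively when > 1
  bool compress_before_send = false;
  int max_concurrent_tasks_per_processor = 0;  // 0 = leave scheduling unchanged
};

// One possible mapping from failure mode to degraded behavior.
inline DegradedPolicy PolicyFor(PartialFailureMode mode) {
  DegradedPolicy policy;
  switch (mode) {
    case PartialFailureMode::kUnknown:
      policy.stop_processing = true;
      break;
    case PartialFailureMode::kDisk:
      policy.use_volatile_storage = true;
      break;
    case PartialFailureMode::kNetwork:
      policy.max_flow_files_per_send = 100;
      policy.compress_before_send = true;
      break;
    case PartialFailureMode::kCpu:
      policy.max_concurrent_tasks_per_processor = 1;
      break;
    case PartialFailureMode::kMemory:
      policy.use_volatile_storage = true;  // plus shrinking connection queues
      break;
    case PartialFailureMode::kNone:
      break;
  }
  return policy;
}
{code}

Whether those knobs come from the config file or get pushed down over C2 is exactly the kind of thing the email thread could settle.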