nifi-dev mailing list archives

From Joseph Niemiec <josephx...@gmail.com>
Subject Re: [DISCUSS] Increasing durability in MiNiFi C++
Date Tue, 01 Aug 2017 15:02:23 GMT
I feel that MINIFI-356 is key to all of this. When I think of jagged-edge
use cases that lose connectivity for days but have large mass storage
devices, this feels really limiting. Of the devices I have tested with so
far, most have only a single storage media mount. RasPis currently seem to
be the de facto entry-level IoT device, and most use only that single media
mount. To me this represents an area where even operating in degraded mode
won't help us, as the OS will eventually fail on its own without its disk.

With that said, is it more valuable to use the storage media we have
initially than it is to find a way to run without it?


No doubt there are other scenarios where this is very useful, and I see more
of them initially in the 'non-jagged' space. For example, a factory line PC
within the enterprise network is always connected; it may never experience
backpressure solely because it can send data as fast as it collects it. If
we assume that the OS disks and repo disks are not the same, and the repo
disk did fail, there would be value in continuing to operate, collecting and
sending data. For all intents and purposes we don't care about backpressure
here, because we can still send the data as fast as it is collected.

~~Kevin's Responses

2. Logging and readme documentation will be important to assist
troubleshooting / debugging. If an agent is configured to use a persistent
repository, and it has degraded to a volatile repository, that could be
really confusing to a novice user/admin who is trying to figure out how the
agent is working. Therefore we need to make sure changes to agent behavior
that occur as part of continuing operations are logged at some level.

I would also expect that, initially, it is off by default and has to be
manually enabled.
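
To make the logging and opt-in points concrete, here is a minimal sketch of
how the fallback might look. Everything in it is an assumption for
illustration: the Repository interface, DegradingRepository, and the
allow_degraded flag are hypothetical names, not the actual MiNiFi C++ API.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface; NOT the real MiNiFi C++ repository API.
class Repository {
 public:
  virtual ~Repository() = default;
  virtual bool put(const std::string& key, const std::string& value) = 0;
  virtual std::string name() const = 0;
};

// In-memory (volatile) store: contents are lost on restart.
class VolatileRepository : public Repository {
 public:
  bool put(const std::string& key, const std::string& value) override {
    data_[key] = value;
    return true;
  }
  std::string name() const override { return "volatile"; }

 private:
  std::map<std::string, std::string> data_;
};

// Stand-in for a disk-backed store whose writes can start failing.
class FailingPersistentRepository : public Repository {
 public:
  bool put(const std::string&, const std::string&) override { return !failed; }
  std::string name() const override { return "persistent"; }
  bool failed = false;
};

// Wraps a primary repository and, only if explicitly allowed, swaps to a
// volatile one on the first failed write, logging the change of behavior.
class DegradingRepository : public Repository {
 public:
  DegradingRepository(std::unique_ptr<Repository> primary, bool allow_degraded)
      : active_(std::move(primary)), allow_degraded_(allow_degraded) {
    assert(active_ != nullptr);
  }

  bool put(const std::string& key, const std::string& value) override {
    if (active_->put(key, value)) return true;
    if (!allow_degraded_ || degraded_) return false;  // opt-in, one fallback
    std::cerr << "WARN: write to " << active_->name()
              << " repository failed; degrading to volatile repository."
              << " Data will not survive a restart.\n";
    active_.reset(new VolatileRepository());
    degraded_ = true;
    return active_->put(key, value);
  }

  std::string name() const override { return active_->name(); }
  bool degraded() const { return degraded_; }

 private:
  std::unique_ptr<Repository> active_;
  bool allow_degraded_;
  bool degraded_ = false;
};
```

With allow_degraded set to false (the default-off behavior), a failed write
is simply reported to the caller and the agent never changes repository type
behind the admin's back.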


3. Testing

Just thinking initially: I can re-use a RasPi but attach an eSATA drive. A
hard failure from removing the drive itself, or unmounting it at the OS
level, may do this, while leaving the OS drive (SD card) still plugged in.



On Tue, Aug 1, 2017 at 9:59 AM, Marc <phrocker@apache.org> wrote:

> Good Morning,
>
>   I've begun capturing some details in a ticket for durability and
> reliability of MiNiFi C++ clients [1]. The scope of this ticket is
> continuing operations despite failure within specific components. There is
> a linked ticket [2] that attempts to address some of the concerns brought
> up in MINIFI-356, focusing on memory usage.
>
>   The spirit of the ticket was meant to capture conditions of known
> failure; however, given that more discussion has blossomed, I'd like to
> assess the experience of the mailing list. Continuing operations in any
> environment is difficult, particularly one in which we likely have little
> to no control. Simply gathering information to know when a failure is
> occurring is a major part of the battle. According to the tickets, there
> needs to be some discussion of how we classify failure.
>
>   The ticket addressed the low hanging fruit, but there are certainly more
> conditions of failure. If a disk switches to read-only mode, becomes full,
> and/or runs out of inode entries etc., we know a complete failure occurred
> and thus can switch our type of write activity to use a volatile repo. I
> recognize that partial failures may occur, but how do we classify these?
> Should we classify these at all or would this be venturing into a rabbit
> hole?
>
>    For memory we can likely throttle queue sizes as needed. For networking
> and other components we could likely find other measures of failure. The
> goal, no matter the component, is to continue operations without human
> intervention -- with the hope that the configuration makes the bounds of
> the client obvious.
>
>    My gut reaction is to separate out partial failure, as the low hanging
> fruit of complete failure is much easier to address, but would love to hear
> the reaction of this list. Further, any input on the types of failures to
> address would be appreciated. Look forward to any and all responses.
>
>   Best Regards,
>   Marc
>
> [1] https://issues.apache.org/jira/browse/MINIFI-356
> [2] https://issues.apache.org/jira/browse/MINIFI-360
>
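
The complete-failure conditions named in the quoted message (read-only
mount, full disk, exhausted inodes) could be detected roughly as sketched
below, using the POSIX statvfs() call. The DiskState enum and the classify()
and probe() helpers are hypothetical names for illustration and are not
taken from the MiNiFi C++ code base.

```cpp
#include <sys/statvfs.h>

#include <cassert>
#include <string>

// Hypothetical classification of "complete" disk failures.
enum class DiskState { Healthy, ReadOnly, OutOfSpace, OutOfInodes, Unavailable };

// Pure decision logic, kept separate from the system call so it can be
// tested without a real failing disk.
DiskState classify(bool stat_ok, bool read_only,
                   unsigned long long free_blocks,
                   unsigned long long total_inodes,
                   unsigned long long free_inodes) {
  if (!stat_ok) return DiskState::Unavailable;  // e.g. drive removed/unmounted
  if (read_only) return DiskState::ReadOnly;
  if (free_blocks == 0) return DiskState::OutOfSpace;
  // Some filesystems report zero total inodes; only treat inode
  // exhaustion as a failure when the filesystem reports an inode count.
  if (total_inodes > 0 && free_inodes == 0) return DiskState::OutOfInodes;
  return DiskState::Healthy;
}

// Queries the filesystem holding `path` and classifies its state.
DiskState probe(const std::string& path) {
  assert(!path.empty());
  struct statvfs vfs;
  if (statvfs(path.c_str(), &vfs) != 0) {
    return classify(false, false, 0, 0, 0);
  }
  return classify(true, (vfs.f_flag & ST_RDONLY) != 0,
                  vfs.f_bavail,   // blocks available to unprivileged users
                  vfs.f_files,    // total inodes
                  vfs.f_favail);  // inodes available to unprivileged users
}
```

Anything other than Healthy would count as a complete failure and could
trigger the switch to a volatile repo; partial failures (slow or
intermittently failing writes) are deliberately out of scope for this check.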



-- 
Joseph
