nifi-dev mailing list archives

From Simon Lucy <simon.l...@gravit.as>
Subject Re: Kafkaesque Output port
Date Thu, 16 Mar 2017 13:29:29 GMT
There are really two different classes of queues, and you can't mix
their semantics.

The pubsub model

  * ordering isn't guaranteed
  * messages are delivered at least once and may be duplicated
  * messages must be explicitly deleted or aged off
  * messages may or may not be persisted

The event model

  * ordering is necessary and guaranteed
  * messages appear only once
  * once read, a message is discarded from the queue
  * messages are probably persisted
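The contrast between the two models can be sketched in a few lines of Python. This is a toy simulation to illustrate the delivery semantics only; the class names are made up and none of this is NiFi or broker code:

```python
from collections import deque

class EventQueue:
    """Event model: ordered, and each message is consumed exactly once."""
    def __init__(self):
        self._q = deque()

    def publish(self, msg):
        self._q.append(msg)

    def consume(self):
        # Once read, the message is removed from the queue.
        return self._q.popleft() if self._q else None

class PubSubTopic:
    """Pub/sub model: every subscriber receives its own copy of each message."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        inbox = deque()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, msg):
        for inbox in self._subscribers:
            inbox.append(msg)

# An event queue hands each message to exactly one consumer...
eq = EventQueue()
eq.publish("blocked-ip 10.0.0.1")
first = eq.consume()   # gets the message
second = eq.consume()  # gets None: the message is gone

# ...while a topic fans the same message out to every subscriber.
topic = PubSubTopic()
a, b = topic.subscribe(), topic.subscribe()
topic.publish("blocked-ip 10.0.0.1")
```

The first model is what an Output Port gives you; the second is what Andre's many-client scenario needs.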

Kafka can be used for both models, but NiFi Output Ports are meant to be 
event queues. What you could do, though, is have an external message 
broker connect to the Output Port and distribute its contents to 
subscribers. It could be Kafka, Rabbit, AMQ, AWS's SQS/SNS, whatever 
makes sense in the context.

There's no need to modify or extend NiFi then.
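That bridge pattern can be simulated in a few lines. Again a toy sketch, not real APIs: `output_port` stands in for NiFi's queue-like Output Port, and the inboxes stand in for subscribers of whatever external broker you choose; all names here are illustrative:

```python
from collections import deque

# Stand-in for the Output Port's queue of flowfiles.
output_port = deque(["blocked-ip 10.0.0.1", "blocked-ip 10.0.0.2"])

# Stand-ins for broker-side subscribers (e.g. spoke MiNiFi instances).
inboxes = {"m1": deque(), "m2": deque(), "m3": deque()}

def bridge():
    """Single broker-side consumer: drain the Output Port exactly once,
    then fan each message out to every subscriber."""
    while output_port:
        msg = output_port.popleft()   # consumed once from NiFi
        for inbox in inboxes.values():
            inbox.append(msg)         # duplicated by the broker, not NiFi

bridge()
```

The key point is that only one consumer ever talks to the Output Port, preserving its event-queue semantics, while the duplication to N clients happens in the broker.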

S



> Aldrin Piri <mailto:aldrinpiri@gmail.com>
> Thursday, March 16, 2017 1:07 PM
> Hey Andre,
>
> Interesting scenario and certainly can understand the need for such
> functionality. As a bit of background, my mind traditionally goes to
> custom controller services used for referencing datasets typically served
> up via some service. This means we don't get the Site to Site goodness and
> may be duplicating effort in terms of transiting information. Aspects of
> this need are emerging in some of the initial C2 efforts, where we have
> this 1-N dispersion of flows/updates to instances; initial approaches are
> certainly reminiscent of the above. I am an advocate for Site to Site
> being "the way" data transport is done amongst NiFi/MiNiFi instances and
> this is interlaced, in part, with the JIRA to make this an extension point
> [1]. Perhaps, more simply and to frame in the sense of messaging, we have
> no way of providing topic semantics between instances and only support
> queueing whether that is push/pull. This new component or mode would be
> very compelling in conjunction with the introduction of new protocols each
> with their own operational guarantees/caveats.
>
> Some of the first thoughts/questions that come to mind are:
> * what buffering/age-off looks like in the context of a connection. In part,
> I think we have the framework functionality already in place, but it
> requires a slightly different thought process and context.
> * management of progress through the "queue", for lack of a better word,
> on a given topic and how/where that gets stored/managed. This would be the
> analog of offsets
> * is prioritization still a possibility? at first blush, it seems like
> this would no longer make sense and/or be applicable
> * what impact does this have on provenance? seems like it would still map
> correctly; just many child send events for a given piece of data
> * what would the sequence of receiving on an input port look like? just use
> the run schedule we have currently? Presumably this would be used for
> updates, so I schedule it to check every N minutes and get all the updates
> since then?
> (this could potentially be mitigated with backpressure/expiration
> configuration on the associated connection).
>
> I agree there is a certain need to fulfill that seems applicable to a
> number of situations, and finding a way to support this general data
> exchange pattern at the framework level would be most excellent. Look
> forward to discussing and exploring a bit more.
>
> --aldrin
>
> [1] https://issues.apache.org/jira/browse/NIFI-1820
>
>
> Andre <mailto:andre-lists@fucs.org>
> Thursday, March 16, 2017 12:27 PM
> dev,
>
> I recently created a demo environment where two remote MiNiFi instances
> (m1 and m2) were sending a diverse range of security telemetry (suspicious
> email attachments, syslog streams, individual session honeypot logs, merged
> honeypot session logs, etc.) from edge to DC via S2S Input ports.
>
> Once some of this data was processed at the hub, I then used Output ports
> to send contents back to the spokes, where the MiNiFi instances use the
> flowfiles' contents as arguments to OS commands (called via Groovy's
> String.execute().text in ExecuteScript).
>
> The idea was to show how NiFi can be used for basic security orchestration
> (in this case, updating m1's firewall tables with malicious IPs observed on
> m2 and vice versa).
>
>
> While crafting the demo I noticed that Output ports operate like queues,
> so if one client consumed data from the port, the other was unable to
> obtain the same flowfiles.
>
> This is obviously not an issue when using 2 MiNiFi clients (where I can
> just create another output port and clone the content), but it wouldn't
> flow very well with hundreds of clients.
>
> I wonder if anyone has a suggestion for how to achieve an N-to-1 Output
> port like that? And if not, I wonder if we should create one?
>
> Cheers
>

-- 
Simon Lucy
Technologist
G30 Consultants Ltd
+44 77 20 29 4658
