nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: NiFi code re-use
Date Sun, 13 May 2018 20:19:35 GMT
So I think we have a lot of different concepts going on here. I’ll try to provide my thoughts
on each one, as I’ve spent a good bit of time thinking about each of them over the last
year or two :)

Wormhole connections: these would be very nice to have because they would allow us to avoid
having lots of ports to go further up and down the stacks of process groups. But I don’t
know that they would adequately scratch the itch for functional groups.

Functional groups: I was very gung-ho about implementing these a while back. Then I realized
2 really big issues with this. Firstly, if one group suddenly floods the functional group
with data, it can cause backlogs that hinder processing in the rest of the flow, even though
the flows are otherwise completely independent. Not the end of the world, and similar to
how a single microservice, if overwhelmed, would do the same thing in a microservice architecture.
More important is the question of “what happens if we try to merge data?” Take a MergeRecord
processor, for instance. It’s not a 1-in-1-out type of thing. Can flowfiles from different
sources be merged? Should we allow merging at all? I would be a bit worried that this would
lead to a lot of confusion. That doesn’t mean it can’t be done, but we would have to figure
out what the semantics are for such a thing and how they would be conveyed clearly in the
UI.
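
The merge question can be illustrated with a toy model. This is plain Python, not NiFi code, and it does not claim to reproduce how MergeRecord actually bins data; `merge_bins` and the `(source, payload)` tuples are hypothetical names used only to show the two possible semantics: one shared bin that mixes callers, or one bin per caller.

```python
from collections import defaultdict


def merge_bins(flowfiles, bin_size, key=None):
    """Group (source, payload) pairs into bins of bin_size.

    key=None models a merge that ignores where data came from;
    passing a key function models one pending bin per caller.
    """
    pending = defaultdict(list)
    merged = []
    for ff in flowfiles:
        k = key(ff) if key else None
        pending[k].append(ff)
        if len(pending[k]) == bin_size:
            # Bin is full: emit it as one merged unit.
            merged.append(pending.pop(k))
    # Anything still pending is waiting for more data to arrive.
    leftovers = [ff for group in pending.values() for ff in group]
    return merged, leftovers


# Two independent callers feed the same shared group, interleaved.
arrivals = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("B", "b2"), ("A", "a3")]

# Shared bin: each merged unit contains data from both callers.
mixed, _ = merge_bins(arrivals, bin_size=2)

# Per-caller bins: nothing is mixed, but caller A has data left waiting.
per_source, waiting = merge_bins(arrivals, bin_size=2, key=lambda ff: ff[0])
```

With a shared bin, every merged unit has to be sent back to some caller even though it holds data from two of them. With per-caller bins, the group is effectively keeping caller-specific state, which is the semantics that would need to be explained in the UI.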

Load-Balanced connections (aka spread the flowfiles across all nodes in the cluster on a given
connection): very much agree and think we should do this.

Non-root-group remote ports: absolutely agree that this is a good idea and we should do this
as well.

Auto Updates of flows from flow registry: definitely all for this as well. I believe that
if we do this, then it would subsume the need for the functional groups and would be much
easier to understand and configure from the UI. It would also provide far more power and flexibility
by providing the ability to upgrade all instances of a flow across many different clusters
if desired, not just the cluster that you’re working on.
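
As a rough sketch of the auto-update idea, assuming a registry that knows about every instance created from a versioned flow: the classes `FlowRegistry` and `FlowInstance` and the `auto_update` flag below are hypothetical and are not the NiFi Registry API. The sketch only shows that one commit can reach instances on several clusters.

```python
class FlowInstance:
    """A process group on some cluster that tracks a versioned flow."""

    def __init__(self, cluster, auto_update=True):
        self.cluster = cluster
        self.auto_update = auto_update
        self.version = 0
        self.definition = None

    def apply(self, version, definition):
        self.version = version
        self.definition = definition


class FlowRegistry:
    """Holds the committed versions of one flow and its known instances."""

    def __init__(self):
        self.versions = []   # committed definitions, oldest first
        self.instances = []  # every instance created from this flow

    def instantiate(self, cluster, auto_update=True):
        instance = FlowInstance(cluster, auto_update)
        if self.versions:
            # New instances start on the latest committed version.
            instance.apply(len(self.versions), self.versions[-1])
        self.instances.append(instance)
        return instance

    def commit(self, definition):
        self.versions.append(definition)
        version = len(self.versions)
        # Push the new version to every instance that opted in.
        for instance in self.instances:
            if instance.auto_update:
                instance.apply(version, definition)
        return version


registry = FlowRegistry()
registry.commit("common-logic v1")
east = registry.instantiate("cluster-east")
west = registry.instantiate("cluster-west", auto_update=False)
registry.commit("common-logic v2")
```

Here `east` follows the second commit automatically, while `west` stays on the first version until someone upgrades it by hand.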

Hopefully this provides some color into some of the design choices that have been made and
will help to spur more thoughts on the subjects.

-Mark


On May 13, 2018, at 3:32 PM, Ed B <bdesert@gmail.com> wrote:

Joe, Aldrin,
Wormholes are a pretty interesting thing. I played around with them and could get them working.
This approach has downsides, though.
I'll create an article for this, but you can take a look at it now (attaching a template for
the root canvas).

So, what I've found while playing around with this topic is that removing the restriction that
remote input/output ports must be on the root canvas would be nice, but not sufficient.
When we distribute flowfiles over the nodes within the same cluster, we need an easy way
to indicate that, so the RPG uses the properties of the cluster instead of manually provided
ones. I would even go further and add distribution capabilities at the relationship level. That
would really reduce the number of entities we put into our flows, and reduce complexity.


On Sun, May 13, 2018 at 1:20 PM Aldrin Piri <aldrinpiri@gmail.com> wrote:
I think what you highlighted is kind of how I had it worked out in my
mind.  Although maybe I read too much into the description of the proposal
about the framework managing context.  In terms of what we have now, I
think I pictured this to be "Tag this data as from this source" and then
when leaving such a group, the framework would send it back to that "tag."

I will avoid showing my blissful ignorance of all the internals by saying
how it could work but will try to draw the analogs from functionality
currently in place.  I imagined feeding the reference-able group similar to
a virtual funnel of sorts where we use framework knowledge of the
connection to it (and perhaps said connection's source) to track that state
in shipping it back via some slightly smarter port that is, in effect, a
router back to virtual ports (wormholes?) to where the data came from.  Or,
perhaps, in more concrete terms:

We have
* a Process Group that has several input ports (source processors),
* that all feed an UpdateAttribute which tags each flowfile with its source
via EL,
* carry out the functions of the referenceable group,
* with the end of this "block" feeding a RouteOnAttribute on this tag to
an equivalent number of output ports.
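
The four steps above can be sketched as a small simulation. This is plain Python, not NiFi code; the names `FlowFile`, `source.port`, `shared_logic`, and `run_group` are hypothetical and only illustrate the tag-then-route-back idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    content: str
    attributes: dict = field(default_factory=dict)


def shared_logic(flowfile):
    # Stand-in for the reusable processing inside the group.
    flowfile.content = flowfile.content.upper()
    return flowfile


def run_group(inputs):
    """inputs maps an input port name to the flowfiles arriving on it."""
    outputs = defaultdict(list)
    for port, flowfiles in inputs.items():
        for ff in flowfiles:
            # Tag each flowfile with its source (the UpdateAttribute role).
            ff.attributes["source.port"] = port
            # Carry out the shared function of the group.
            ff = shared_logic(ff)
            # Route on the tag to the matching output (the RouteOnAttribute role).
            outputs[ff.attributes["source.port"]].append(ff)
    return dict(outputs)


result = run_group({
    "caller_a": [FlowFile("alpha")],
    "caller_b": [FlowFile("beta"), FlowFile("gamma")],
})
```

Each caller gets back only its own flowfiles because the tag travels with the data. The number of output ports still has to match the number of input ports, which is the manual bookkeeping a framework-managed version would take over.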

On Sun, May 13, 2018 at 12:20 PM, Joe Witt <joe.witt@gmail.com> wrote:

> Aldrin
>
> Referenceable groups would have to work like a single instance of a PG in
> terms of flow definition but caller specific instances in reality.
> Otherwise you'd have no way to avoid cross contaminating flowfiles from
> various callers as there'd be no caller specific stack (in our case caller
> specific queues and other resources).
>
> The point about keeping versions of instances up to date with registry
> based versioned instances is true but can be addressed with auto updating
> instances of versioned flows which we will need to add anyway.
>
> In either case having PG operate like a callable function reusable across
> flows will likely need to operate as mentioned above.  The former being
> less consistent with the user experience and more work than the latter.
>
> Do you see some other way to make referenceable groups work?
>
> Wormhole connections need to be implemented for sure to help keep flows
> concise.
>
> Thanks
> Joe
>
> On Sun, May 13, 2018, 11:42 AM Aldrin Piri <aldrinpiri@gmail.com> wrote:
>
> > I think the Registry solves part of the issue but even that would lead to
> > duplication of units where we are "copying and pasting" the "code."
> > Versioning would aid in keeping all components in lock step, but will not
> > remedy manual intervention with n-many instances of them.  After one was
> > altered, there would still be the manual process where the PGs would each
> > need to be updated when that change was committed and changes were
> > realized after some time delta.
> >
> > I think the previously discussed Reference-able Process Groups [1] are
> > likely better aligned in conjunction with the Wormhole Connections [2].
> >
> > [1] https://cwiki.apache.org/confluence/display/NIFI/Reference-able+Process+Groups
> > [2] https://cwiki.apache.org/confluence/display/NIFI/Wormhole+Connections
> >
> >
> >
> > On Sat, May 12, 2018 at 10:19 PM, Joe Witt <joe.witt@gmail.com> wrote:
> >
> > > Scott
> > >
> > > You're very right, there must be a better way.  The flow registry with
> > > versioned flows is the answer.  You can version control the common logic
> > > and reuse it in as many instances as you need.
> > >
> > > This is like having a flow Class in Java terms where you can instantiate as
> > > many objects of that type Class as you need.
> > >
> > > It was definitely a long missing solution that was addressed in NiFi 1.5.0
> > > and with the flow registry.
> > >
> > > Also, we should just remove the root group remote port limitation.  It was
> > > an implementation choice long before we had multi tenant auth and now it no
> > > longer makes sense to force root group only.  Still though the above
> > > scenario of versioned flows and the flow registry solves the main problem.
> > >
> > >
> > > thanks
> > >
> > > On Sat, May 12, 2018, 9:22 PM Charlie Meyer <charlie.meyer@civitaslearning.com> wrote:
> > >
> > > > We do this often by leveraging the variable registry and the expression
> > > > language to make components more dynamic and reusable
> > > >
> > > > -Charlie
> > > >
> > > > On Sat, May 12, 2018, 20:01 scott <tcots8888@gmail.com> wrote:
> > > >
> > > > > Hi Devs,
> > > > >
> > > > > I've got a question about an observation I've had while working with
> > > > > NiFi. Is there a better way to re-use process groups similar to how
> > > > > programming languages reference functions, libraries, classes, or
> > > > > pointers? I know about remote process groups and templates, but neither
> > > > > does exactly what I was thinking. RPGs are great, but I think the output
> > > > > goes to the root canvas level, and you have to have connectors all
> > > > > the way back up your flow hierarchy, and that's not practical.
> > > > > Ultimately, I'm looking for an easy way to re-use process groups that
> > > > > contain common logic in many of my flows, so that I reduce the number of
> > > > > places I have to change.
> > > > >
> > > > > Hopefully that made sense. Appreciate your thoughts.
> > > > >
> > > > > Scott
> > > > >
> > > > >
> > > >
> > >
> >
>
<Wormholes_in_NIFI.xml>