nifi-dev mailing list archives

From Tijo Thomas <tijopara...@gmail.com>
Subject Re: enforce run only in primary node $ multiple primary node
Date Mon, 07 Nov 2016 17:26:02 GMT
Hi Mark,

Somehow I missed this mail. We have implemented it along similar lines,
based on the 1.0 code base.
However, we ran into some contention on the ZooKeeper side. We will get
back to the community once it has stabilised, as we are under release
pressure right now.

I will keep you posted on this by the end of this week.

Thanks & Regards
Tijo Thomas

On Tue, Oct 4, 2016 at 6:54 PM, Mark Payne <markap14@hotmail.com> wrote:

> Tijo,
>
> Sure, I would be happy to elaborate some. Sorry it's taken me a while to
> get back to you.
>
> The idea would be to create some "named thing." Let's call it the
> Processing Locality.
> Perhaps a better name can be used, but I'll use this term for this email.
>
> The idea is that, through the UI, a user with appropriate permissions is
> able to create a new Processing Locality. Once it is created, a user can
> open a Processor's configuration and go to the Scheduling tab. Currently,
> there are 3 options available for the Scheduling Strategy: Timer-Driven
> (always available), Event-Driven (available for some processors), and
> Primary Node (available when running in clustered mode).
>
> My proposal is that we first remove the Primary Node scheduling strategy,
> so that we have only two scheduling strategies: Timer-Driven and
> Event-Driven. We then add a Processing Locality field to the Scheduling
> tab. The available options would be "All Nodes" (which would be the
> default) or any of the named Processing Localities that users have added.
> For backward compatibility purposes, we would always have a "Primary Node"
> Processing Locality.
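A minimal sketch of the configuration model this proposal implies, written in Python purely for illustration. Everything in it is hypothetical: `SchedulingStrategy`, `SchedulingConfig`, `migrate` and `available_localities` are not NiFi classes, and mapping the legacy Primary Node strategy onto Timer-Driven plus the always-present "Primary Node" locality is an assumption about how the backward compatibility could work.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical model of the proposal; these are NOT NiFi's actual classes.
class SchedulingStrategy(Enum):
    TIMER_DRIVEN = "Timer-Driven"
    EVENT_DRIVEN = "Event-Driven"


ALL_NODES = "All Nodes"        # default locality: the processor runs on every node
PRIMARY_NODE = "Primary Node"  # always-present locality kept for backward compatibility


@dataclass
class SchedulingConfig:
    strategy: SchedulingStrategy = SchedulingStrategy.TIMER_DRIVEN
    locality: str = ALL_NODES


def migrate(old_strategy: str) -> SchedulingConfig:
    """Map a legacy scheduling strategy name onto (strategy, locality)."""
    if old_strategy == "Primary Node":
        # Assumption: legacy Primary Node scheduling behaves like Timer-Driven,
        # restricted to one node, so it maps to the "Primary Node" locality.
        return SchedulingConfig(SchedulingStrategy.TIMER_DRIVEN, PRIMARY_NODE)
    # Timer-Driven and Event-Driven keep their strategy and run on all nodes.
    return SchedulingConfig(SchedulingStrategy(old_strategy), ALL_NODES)


def available_localities(user_defined: list[str]) -> list[str]:
    """Options for the Scheduling tab: the two built-ins, then user-defined names."""
    options = [ALL_NODES, PRIMARY_NODE]
    for name in user_defined:
        if name not in options:  # skip duplicates of the built-in names
            options.append(name)
    return options
```

With this model, a flow that used the Primary Node strategy keeps its behavior after migration, because it lands in the built-in "Primary Node" locality.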
>
> If a Processing Locality other than "All Nodes" is selected, then the
> processor would run only on a single node, just as Primary Node works
> today. The difference, though, is that all processors that have the same
> Processing Locality would run on the same node, while processors with a
> different Processing Locality could run on a different node. Which node a
> given Processing Locality runs on would be determined via ZooKeeper, just
> as the Primary Node is. This gives us automatic failover if the node
> running a specific Processing Locality fails.
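The per-locality election can be illustrated with an in-memory stand-in for ZooKeeper's standard leader-election recipe, in which each candidate creates an ephemeral sequential znode and the live candidate with the lowest sequence number is the leader. This is a sketch under that assumption, not NiFi's election code: the class `LocalityElection` and its methods are hypothetical, and a real implementation would talk to ZooKeeper rather than a dictionary.

```python
import itertools


class LocalityElection:
    """Hypothetical in-memory stand-in for ZooKeeper leader election.

    One independent election runs per Processing Locality. Each node registers
    a candidacy with an increasing sequence number (like an ephemeral sequential
    znode); the live candidate with the lowest number leads. When a node fails,
    its candidacies disappear (like ephemeral znodes on session expiry) and the
    next-lowest candidate takes over.
    """

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}  # locality -> list of (sequence, node)

    def join(self, node, locality):
        self._candidates.setdefault(locality, []).append((next(self._seq), node))

    def node_failed(self, node):
        # Drop every candidacy the failed node held, in every locality.
        for locality, entries in self._candidates.items():
            self._candidates[locality] = [(s, n) for s, n in entries if n != node]

    def leader(self, locality):
        entries = self._candidates.get(locality, [])
        # Sequence numbers are unique, so min() never compares node names.
        return min(entries)[1] if entries else None
```

Running one independent election per locality is what lets different localities land on different nodes, while failover for one locality leaves the others untouched.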
>
> For example, say we have 6 Processors: A, B, C, D, E, and F. And we have 3
> Processing Localities: Locality 1, Locality 2, and Locality 3.
> We configure B and E to run at Locality 1, A and C to run at Locality 2,
> and D to run at Locality 3.
>
> Now we know that Processors B and E will run on the same node, and
> Processors A and C will run on the same node. It's possible that B, E, A,
> and C will all run on the same node (if one node is elected to run both
> Locality 1 and Locality 2), or they may run on different nodes. But we know
> that B & E will run on the same node and A & C will run on the same node.
> Processor D is in its own Processing Locality, so it may run on any given
> node. But if another Processor is added and configured to run on Processing
> Locality 3, it will definitely be co-located with Processor D.
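The example can be checked mechanically. The sketch below is hypothetical: it assumes F stays at the default "All Nodes" locality (the email never assigns it) and takes the elected leader of each locality as an input, showing that co-location follows from the locality assignments alone, whichever node wins each election.

```python
from collections import defaultdict

ALL_NODES = "All Nodes"

# Locality assignments from the example; F at the default is an assumption.
LOCALITIES = {
    "A": "Locality 2", "B": "Locality 1", "C": "Locality 2",
    "D": "Locality 3", "E": "Locality 1", "F": ALL_NODES,
}


def placement(localities, leaders, cluster):
    """Return the set of nodes each processor runs on.

    leaders maps locality -> elected node; cluster lists every node.
    """
    result = {}
    for proc, loc in localities.items():
        # "All Nodes" runs everywhere; any named locality runs on its leader only.
        result[proc] = set(cluster) if loc == ALL_NODES else {leaders[loc]}
    return result


def co_located_groups(localities):
    """Group processors that are guaranteed to share a node."""
    groups = defaultdict(set)
    for proc, loc in localities.items():
        if loc != ALL_NODES:
            groups[loc].add(proc)
    return dict(groups)
```

For instance, if one node wins both Locality 1 and Locality 2, then B, E, A and C all share it, which is the case the email describes.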
>
> Does all of this sound reasonable to you and make sense? Would love to
> hear any ideas that you or
> the others on your team have!
>
> Thanks
> -Mark
>
>
>
> > On Oct 1, 2016, at 1:43 AM, Tijo Thomas <tijoparacka@yahoo.in.INVALID> wrote:
> >
> > Hi Mark, in your earlier mail you mentioned an approach based on a named
> > grouping construct. Is it possible to discuss this further?
> > I am thinking of something like the node labeling concept in YARN. It
> > would be good if NiFi could support this feature. My team and I are
> > willing to contribute an implementation of this feature.
> > Please let me know your opinion.
> > Thanks & Regards
> > Tijo Thomas
> >
> > On Wednesday, 21 September 2016 10:20 PM, Tijo Thomas <tijoparacka@yahoo.in.INVALID> wrote:
> >
> >
> > Mark,
> > Changing the concept of "Run on Primary Node" to " Run on Only one node"
> will not solve the problem .  Name Grouping constructs would be better
> option .
> >
> > Nijel,
> > Our usecase is also similar.  We have many tasks to run only in one node
> and wanted to distribute the load . If we can have a list of primary node
> to distribute the load it will solve our problem .
> >
> > Tijo
> >
> > On Wednesday, 21 September 2016 6:01 PM, "markap14@hotmail.com" <markap14@hotmail.com> wrote:
> >
> >
> > Nijel,
> >
> > I'd like to hear more about your use case, as from the description
> > given, I'm not sure that all of this would need to run on a primary node.
> > Generally, you want only "source processors" to run on the primary node.
> >
> > One thing that I've been thinking about, though, is changing the concept
> of "Run on Primary Node" to a "Run on Only One Node." The concern there is
> that we will have cases where a few processors have to run on the same
> node. So we would need a mechanism for supporting that. Perhaps some sort
> of named grouping construct.
> >
> > Thoughts?
> >
> >> On Sep 21, 2016, at 5:07 AM, Nijel s f <nijel.sf@huawei.com> wrote:
> >>
> >> Hi all
> >>
> >> In support of Tijo's thought, I have one scenario.
> >>
> >> We are trying to use NiFi for a data pipeline solution. The scenario is
> >> to coordinate between various services and provide a solution for big
> >> data analysis. In our scenario, many of the activities are "run on
> >> primary" mode processors. These are being implemented on top of various
> >> components like YARN, HBase, Spark, databases, etc.
> >>
> >> One issue we are seeing is that all of these processors (Spark
> >> execution, YARN/MR job execution, etc.) have to run on the primary node,
> >> and there is only one primary node. We are thinking of having multiple
> >> primary nodes and assigning the activities using some distribution
> >> algorithm. The idea is to handle the coordination and failover
> >> mechanism using ZooKeeper.
> >>
> >> Any thoughts on this?
> >>
> >> Regards
> >> Nijel
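One possible "distribution algorithm" for several primary nodes is rendezvous (highest-random-weight) hashing. This is a swapped-in technique, not something proposed in the thread: each task is scored against every live node and goes to the highest-scoring one, so when a node fails only the tasks it owned move elsewhere. The function names are hypothetical, and the list of live nodes is assumed to come from ZooKeeper membership.

```python
import hashlib


def _weight(task: str, node: str) -> int:
    # Stable pseudo-random score for a (task, node) pair.
    digest = hashlib.sha256(f"{task}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(task: str, live_nodes: list[str]) -> str:
    """Rendezvous hashing: the live node with the highest score owns the task."""
    if not live_nodes:
        raise ValueError("no live nodes")
    return max(live_nodes, key=lambda node: _weight(task, node))


def distribute(tasks: list[str], live_nodes: list[str]) -> dict[str, str]:
    # Assumption: live_nodes is the current membership reported by ZooKeeper.
    return {task: assign(task, live_nodes) for task in tasks}
```

Compared with a simple `hash(task) % len(nodes)` scheme, this choice avoids reshuffling every task when cluster membership changes, which matters when the tasks are long-running jobs such as Spark or YARN submissions.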
> >>
> >> From: Jeff [mailto:jtswork@gmail.com]
> >> Sent: Monday, September 19, 2016 11:17 PM
> >> To: Tijo Thomas; users@nifi.apache.org
> >> Subject: Re: enforce run only in primary node $ multiple primary node
> >>
> >> Tijo,
> >>
> >> To give you some information on your second question, you can design
> >> your flow to redistribute the flowfiles coming out of your processors to
> >> other nodes in the cluster for processing. There are several examples of
> >> how to do this on various blogs, email lists, etc., and I just grabbed one
> >> for reference, written by Apache NiFi's own Bryan Bende:
> >> http://apache-nifi.1125220.n5.nabble.com/How-to-configure-site-to-site-communication-between-nodes-in-one-cluster-td8528.html
> >>
> >> Please review that thread and let us know if you have further questions!
> >>
> >> On Mon, Sep 19, 2016 at 1:19 PM Tijo Thomas <tijoparacka@yahoo.in <mailto:tijoparacka@yahoo.in>> wrote:
> >>
> >> Hi,
> >>
> >> 1. While writing a processor, is it possible to enforce that it runs only
> >> on the primary node? I saw a JIRA for this, but it appears to be
> >> unresolved.
> >>
> >> [NIFI-543] Provide extensions a way to indicate that they can run only
> >> on primary node, if clustered - ASF JIRA <https://issues.apache.org/jira/browse/NIFI-543>
> >>
> >> 2. Currently my primary node is heavily loaded, as I have many
> >> processors that run only on the primary node. Is it possible to define
> >> multiple primary nodes, or is it possible to configure processors not to
> >> run on the primary node?
> >>
> >> Tijo
