nifi-users mailing list archives

From Chakrader Dewaragatla <Chakrader.Dewaraga...@lifelock.com>
Subject Re: Nifi cluster features - Questions
Date Sun, 10 Jan 2016 20:41:44 GMT
I was able to get site-to-site working.
I followed your instructions to distribute data across the nodes.

GenerateFlowFile (on primary) -> RPG
RPG -> Input Port -> PutFile (timer-driven scheduling)

However, data is only written to one slave (the secondary slave). The primary slave has no data.

Screenshot:
http://tinyurl.com/jjvjtmq

From: Chakrader Dewaragatla <chakrader.dewaragatla@lifelock.com>
Date: Sunday, January 10, 2016 at 11:26 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Bryan – Thanks – I am trying to set up site-to-site.
I have two slaves and one NCM.

My properties are as follows:

On both slaves:

nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

On NCM:
nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

When I try to drop a Remote Process Group (with http://<NCM IP>:8080/nifi), I see the
following error for the two nodes:

[<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site communication
[<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site communication

Do you have any insight into why it is trying to connect to port 8080 on the slaves? When does
port 10880 come into the picture? I remember setting up site-to-site a few months back and
succeeding.

Thanks,
-Chakri



From: Bryan Bende <bbende@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Saturday, January 9, 2016 at 11:22 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

The sending node (where the Remote Process Group is) will distribute the data evenly across
the two nodes, so an individual file will only be sent to one of the nodes. You could think
of it as a separate NiFi instance sending directly to a two-node cluster: it would evenly
distribute the data across the two nodes. In this case it just so happens that everything is
within the same cluster.
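
A minimal Python sketch of the point above: each FlowFile goes to exactly one node, never to
both. The plain round-robin rotation, the node names, and the file names are assumptions made
for illustration; the real site-to-site client may weight nodes differently (for example by
load), so treat this as a model of the outcome rather than of NiFi's implementation.

```python
from collections import Counter
from itertools import cycle


def distribute(flowfiles, nodes):
    """Assign each FlowFile to exactly one node, rotating through the nodes.

    Round-robin is an assumption used only to illustrate "even" distribution.
    """
    assignment = {}
    targets = cycle(nodes)
    for name in flowfiles:
        assignment[name] = next(targets)
    return assignment


# Hypothetical FlowFile and node names.
files = [f"file-{i}" for i in range(6)]
assignment = distribute(files, ["node-1", "node-2"])
counts = Counter(assignment.values())
print(counts)
```

Every file appears once in the assignment, and the two nodes end up with equal shares.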

The most common use case for this scenario is the List and Fetch processors, such as those for
HDFS. You can perform the listing on the primary node, and then distribute the results so the
fetching takes place on all nodes.

On Saturday, January 9, 2016, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com>
wrote:
Bryan – Thanks. How do the nodes distribute the load for an input port? Since the port is open
and listening on both nodes, does it copy the same files to both nodes?
I need to try this setup to see the results. I appreciate your help.

Thanks,
-Chakri

From: Bryan Bende <bbende@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, January 8, 2016 at 3:44 PM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hi Chakri,

I believe the DistributeLoad processor is more for load balancing when sending to downstream
systems. For example, if you had two HTTP endpoints,
you could have the first relationship from DistributeLoad going to a PostHTTP that posts to
endpoint #1, and the second relationship going to a second PostHTTP that goes to endpoint
#2.

If you want to distribute the data within the cluster, then you need to use site-to-site.
The way you do this is the following...

- Add an Input Port connected to your PutFile.
- Add GenerateFlowFile scheduled on primary node only, connected to a Remote Process Group.
The Remote Process Group should be connected to the Input Port from the previous step.

So both nodes have an input port listening for data, but only the primary node produces a
FlowFile and sends it to the RPG which then re-distributes it back to one of the Input Ports.

In order for this to work you need to set nifi.remote.input.socket.port in nifi.properties
to some available port, and you probably want nifi.remote.input.secure=false for testing.
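
As a rough illustration of the two settings named above, the following Python sketch parses
nifi.properties-style text and flags obviously wrong site-to-site values. The helper names
(parse_properties, site_to_site_problems) are hypothetical and not part of NiFi, and the port
10880 is simply the value reported elsewhere in this thread. It only checks that the values
are well-formed; it does not verify that the port is free or reachable.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def site_to_site_problems(props):
    """Return a list of human-readable problems with the site-to-site settings."""
    problems = []
    port = props.get("nifi.remote.input.socket.port", "")
    if not port.isdecimal() or not (1 <= int(port) <= 65535):
        problems.append("nifi.remote.input.socket.port is not set to a valid port")
    secure = props.get("nifi.remote.input.secure", "").lower()
    if secure not in ("true", "false"):
        problems.append("nifi.remote.input.secure should be true or false")
    return problems


# Sample text using the values reported in this thread.
sample = """
nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false
"""
print(site_to_site_problems(parse_properties(sample)))
```

An empty list means both properties are present and well-formed; the same check would need to
pass on every node that should receive data.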

-Bryan


On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com>
wrote:
Mark – I have set up a two-node cluster and tried the following:
 GenerateFlowFile processor (run only on primary node) -> DistributeLoad processor (RoundRobin)
  -> PutFile

>> The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary
node only).
From your comment above, it should put files on both nodes, but it puts files on the primary
node only. Any thoughts?

Thanks,
-Chakri

From: Mark Payne <markap14@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 11:28 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Chakri,

Correct - when NiFi instances are clustered, they do not transfer data between the nodes.
This is very different than you might expect from something like Storm or Spark, as the key
goals and design are quite different. We have discussed providing the ability to allow the
user to indicate that they want to have the framework do load balancing for specific
connections in the background, but it's still in more of a discussion phase.

Site-to-Site is simply the capability that we have developed to transfer data between one
instance of NiFi and another instance of NiFi. So currently, if we want to do load balancing
across the cluster, we would create a site-to-site connection (by dragging a Remote Process
Group onto the graph) and give that site-to-site connection the URL of our cluster. That way,
you can push data to your own cluster, effectively providing a load balancing capability.

If you were to just run ListenHTTP without setting it to Primary Node, then every node in
the cluster will be listening for incoming HTTP connections. So you could then use a simple
load balancer in front of NiFi to distribute the load across your cluster.

Does this help? If you have any more questions we're happy to help!

Thanks
-Mark


On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com>
wrote:

Mark - Thanks for the notes.

>> The other option would be to have a ListenHTTP processor run on Primary Node only
and then use Site-to-Site to distribute the data to other nodes.
Let's say I have a 5-node cluster and a ListenHTTP processor on the primary node. Is the data
collected on the primary node not transferred to the other nodes for processing by default,
even though all nodes are part of one cluster?
If the ListenHTTP processor is running with the default scheduling (without explicitly setting
it to run on the primary node), how is the data transferred to the rest of the nodes? Does
site-to-site come into play when I set a processor to run on the primary node?

Thanks,
-Chakri

From: Mark Payne <markap14@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 7:00 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster is acting independently
and in exactly the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the same
flow. However, they will be pulling in different data and therefore operating on different
data.

So if you pull in 10 1-gig files from S3, each of those files will be processed on the node
that pulled the data in. NiFi does not currently shuffle data around between nodes in the
cluster (you can use site-to-site to do this if you want to, but it won't happen
automatically). If you set the number of Concurrent Tasks to 5, then you will have up to 5
threads running for that processor on each node.
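
The per-node thread limit can be turned into a cluster-wide upper bound with simple
arithmetic. The Python sketch below is illustrative only; the function name and the
primary_node_only flag are hypothetical and merely model the scheduling choice described in
this message.

```python
def max_processor_threads(node_count, concurrent_tasks, primary_node_only=False):
    """Upper bound on threads for one processor across the whole cluster.

    Each node runs the same flow, so the per-node limit applies on every node.
    If the processor is scheduled on the primary node only, one node runs it.
    """
    active_nodes = 1 if primary_node_only else node_count
    return active_nodes * concurrent_tasks


print(max_processor_threads(5, 5))                          # 25
print(max_processor_threads(5, 5, primary_node_only=True))  # 5
```

With 5 nodes and Concurrent Tasks set to 5, the processor can use up to 25 threads across the
cluster, or up to 5 if it is restricted to the primary node.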

The only exception to this is the Primary Node. You can schedule a Processor to run only on
the Primary Node by right-clicking on the Processor, and going to the Configure menu. In the
Scheduling tab, you can change the Scheduling Strategy to Primary Node Only. In this case,
that Processor will only be triggered to run on whichever node is elected the Primary Node
(this can be changed in the Cluster management screen by clicking the appropriate icon in the
top-right corner of the UI).

The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary node only).

If you are attempting to have a single input running HTTP and then push that out across the
entire cluster to process the data, you would have a few options. First, you could just use
an HTTP Load Balancer in front of NiFi. The other option would be to have a ListenHTTP
processor run on Primary Node only and then use Site-to-Site to distribute the data to other
nodes.

For more info on site-to-site, you can see the Site-to-Site section of the User Guide at
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site

If you have any more questions, let us know!

Thanks
-Mark

On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com>
wrote:

NiFi Team – I would like to understand the advantages of a NiFi cluster setup.

Questions:

 - How does a workflow run on multiple nodes? Does it share resources between nodes?
Let's say I need to pull ten 1 GB files from S3: how is the workload distributed? If I set
Concurrent Tasks to 5, does it spawn 5 tasks per node?

 - How do I “isolate” a processor to the master node (or one node)?

 - For the GetFile/PutFile processors in a cluster setup, do they get/put on the primary node?
How do I force a processor to look at one of the slave nodes?

 - How can we have a workflow where the input side receives requests (HTTP) and the rest of
the pipeline runs in parallel on all the nodes?

Thanks,
-Chakro

________________________________
The information contained in this transmission may contain privileged and confidential information.
It is intended only for the use of the person(s) named above. If you are not the intended
recipient, you are hereby notified that any review, dissemination, distribution or duplication
of this communication is strictly prohibited. If you are not the intended recipient, please
contact the sender by reply email and destroy all copies of the original message.
________________________________
