From: Mark Payne <markap14@hotmail.com>
Subject: Re: Nifi Clustering - work distribution on workers
Date: Wed, 14 Oct 2015 17:06:49 -0400
To: users@nifi.apache.org, M Singh
Mans,

Nodes in a cluster work independently from one another and do not know about each other. That is accurate.

Each node in a cluster runs the same flow. Typically, if you want to pull from HDFS and partition that data across the cluster, you would run ListHDFS on the Primary Node only, and then use Site-to-Site [1] to distribute that listing to all nodes in the cluster. Each node would then pull the data that it is responsible for and begin working on it. We realize that having to set it up this way is not ideal, and it is something we are working on so that it is much easier to have that listing automatically distributed across the cluster.
I'm not sure that I understand your #3 - how do we design the workflow so that the nodes work on one file at a time? For each Processor, you can configure how many threads (Concurrent Tasks) are to be used in the Scheduling tab of the Processor Configuration dialog. You can certainly configure that to run only a single Concurrent Task. This is the number of Concurrent Tasks that will run on each node in the cluster, not the total number of concurrent tasks that would run across the entire cluster.
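To make the per-node vs. cluster-wide distinction concrete, here is a quick sketch of the arithmetic (node counts and settings are hypothetical):

```python
# The Scheduling tab's "Concurrent Tasks" value applies per node,
# so the cluster-wide maximum is that value times the node count.
def cluster_wide_tasks(concurrent_tasks_per_node: int, node_count: int) -> int:
    return concurrent_tasks_per_node * node_count

# Even a single Concurrent Task on a 3-node cluster can mean up to
# 3 tasks running simultaneously across the cluster:
print(cluster_wide_tasks(1, 3))  # -> 3
```

So "one file at a time" on a cluster means one file at a time *per node*, unless the processor runs on the Primary Node only.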
I am not sure that I understand your #4 either. Are you indicating that you want to configure each node in the cluster with a different value for a processor property?
Does this help?

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
> On Oct 14, 2015, at 4:49 PM, M Singh <mans2singh@yahoo.com> wrote:
>
> Hi:
>
> A few questions about NiFi cluster:
>
> 1. If we have multiple worker nodes in the cluster, do they partition the work if the source allows partitioning - e.g. HDFS - or do all the nodes work on the same data?
> 2. If the nodes partition the work, then how do they coordinate the work distribution and recovery etc.? From the documentation it appears that the workers are not aware of each other.
> 3. If I need to process multiple files - how do we design the workflow so that the nodes work on one file at a time?
> 4. If I have multiple arguments and need to pass one parameter to each worker, how can I do that?
> 5. Is there any way to control how many workers are involved in processing the flow?
> 6. Does specifying the number of threads in the processor distribute work on multiple workers? Does it split the task across the threads or is it the responsibility of the application?
>
> I tried to find some answers from the documentation and users list but could not get a clear picture.
>
> Thanks
>
> Mans