From: Andrew Xor <andreas.grammenos@gmail.com>
Date: Thu, 17 Jul 2014 02:53:24 +0300
Subject: Re: Distribute Spout output among all bolts
To: user@storm.incubator.apache.org

Hey Stephen, Michael,

Yeah, I feared as much: searching the docs and the API did not surface any reliable and elegant way of doing this short of a "RouterBolt". If setting the parallelism of a component is enough to load-balance its processes across the different machines of the Storm cluster, then that would suffice for my use case. However, the documentation says executors are threads, and it does not explicitly say anywhere that those threads are spawned across different nodes of the cluster. I want to rule out the possibility that these threads spawn only locally rather than being distributed among the cluster nodes.

Andrew.

On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose wrote:

> Maybe we can help with your topology design if you let us know what you're
> doing that requires you to shuffle half of the whole stream output to each
> of the two different types of bolts.
>
> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two
> different types) as above, there's no point in doing this. Setting the
> parallelism will make sure that data is partitioned across machines (by
> default, setting parallelism sets tasks = executors = parallelism).
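[Editor's note: Michael's point — that shuffle grouping plus a parallelism hint partitions the stream rather than copying it — can be illustrated with a small, Storm-free Java simulation. All names below are invented for illustration; a real topology would use the Storm API. With a parallelism hint of 2, each executor sees roughly half the tuples, chosen uniformly at random.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Storm-free simulation of shuffle grouping: each emitted tuple is
// delivered to exactly ONE of the bolt's executors, picked at random,
// so the stream is partitioned rather than duplicated.
public class ShuffleSim {

    // Returns how many tuples each executor received.
    public static Map<Integer, Integer> distribute(int parallelism, int tuples, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, Integer> perExecutor = new HashMap<>();
        for (int i = 0; i < tuples; i++) {
            // shuffle grouping: pick one executor uniformly at random
            perExecutor.merge(rnd.nextInt(parallelism), 1, Integer::sum);
        }
        return perExecutor;
    }

    public static void main(String[] args) {
        // p-hint of 2, as in setBolt("b1", new ExampleBolt(), 2)
        Map<Integer, Integer> counts = distribute(2, 10_000, 42L);
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            // each executor sees roughly tuples/parallelism, not the whole stream
            System.out.println("executor " + e.getKey() + " received " + e.getValue());
        }
    }
}
```

The same reasoning is why two *separate* bolts each shuffle-subscribed to the spout both see the full stream: the partitioning happens among the executors of one component, not between components.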
>
> Unfortunately, I don't know of any way to do this other than shuffling the
> output to a new bolt, e.g. a bolt "b0", a "RouterBolt", then having b0
> round-robin the received tuples between two streams, and having b1 and b2
> shuffle over those streams instead.
>
> Michael Rose (@Xorlev)
> Senior Platform Engineer, FullContact
> michael@fullcontact.com
>
>
> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <andreas.grammenos@gmail.com> wrote:
>
>> Hi Tomas,
>>
>> As I said in my previous mail, the grouping applies to a bolt's *tasks*, not
>> to the actual number of spawned bolts. For example, say you have two
>> bolts with a parallelism hint of 2, both wired to the same spout:
>>
>> tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>> tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>
>> Then each of the tasks will receive half of the spout tuples, but each
>> actual spawned bolt will receive all of the tuples emitted from the spout.
>> This is more evident if you set up a counter in the bolt counting how many
>> tuples it has received and test with no parallelism hint, as such:
>>
>> tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>> tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>
>> Now you will see that both bolts receive all tuples emitted by spout1.
>>
>> Hope this helps.
>>
>> Andrew.
>>
>>
>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <tomas.mazukna@gmail.com> wrote:
>>
>>> Andrew,
>>>
>>> When you connect your bolt to your spout you specify the grouping. If
>>> you use shuffle grouping, then any free bolt gets the tuple; in my
>>> experience, even in lightly loaded topologies the distribution among bolts
>>> is pretty even. If you use all grouping, then all bolts receive a copy of
>>> the tuple.
>>> Use shuffle grouping and each of your bolts will get about 1/3 of the
>>> workload.
>>>
>>> Tomas
>>>
>>>
>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <andreas.grammenos@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to distribute the spout output evenly among its subscribed
>>>> bolts. Let's say I have a spout that emits tuples and three bolts that
>>>> are subscribed to it. I want each of the three bolts to receive one
>>>> third of the output (or, equivalently, to emit each tuple to one of
>>>> these bolts in turn). Unfortunately, as far as I understand, all bolts
>>>> will receive all of the emitted tuples of that particular spout
>>>> regardless of the grouping defined (as grouping, from my understanding,
>>>> is for bolt *tasks*, not actual bolts).
>>>>
>>>> I've searched a bit and I can't seem to find a way to accomplish
>>>> this... Is there a way to do it, or am I searching in vain?
>>>>
>>>> Thanks.
>>>
>>> --
>>> Tomas Mazukna
>>> 678-557-3834
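[Editor's note: the "RouterBolt" workaround Michael describes above can be sketched without any Storm dependency. The class and stream names here are invented for illustration; a real implementation would extend BaseRichBolt and emit to named streams via its output collector, with b1 subscribed to one stream and b2 to the other. The core of the idea is just a round-robin counter.]

```java
import java.util.ArrayList;
import java.util.List;

// Storm-free sketch of the round-robin "RouterBolt" idea: alternate
// incoming tuples between two named output streams, so that each of the
// two downstream bolts receives exactly half of the tuples.
public class RouterSketch {
    private final String[] streams = {"stream-a", "stream-b"};
    private long counter = 0;

    // In a real bolt this logic would live in execute(Tuple), which would
    // emit the tuple on the chosen stream; here we just return the stream id.
    public String route() {
        return streams[(int) (counter++ % streams.length)];
    }

    public static void main(String[] args) {
        RouterSketch router = new RouterSketch();
        List<String> routed = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            routed.add(router.route());
        }
        // the router strictly alternates between the two streams
        System.out.println(routed);
    }
}
```

Downstream, b1 would shuffle-subscribe to "stream-a" and b2 to "stream-b", giving the even split the original question asked for (generalizing the array to three streams gives the 1/3 split).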