storm-user mailing list archives

From "Thomas L. Redman" <tomred...@mchsi.com>
Subject Re: Problem moving topology from 1.2.3 to 2.2.0 - tuple distribution across cluster
Date Mon, 16 Nov 2020 15:09:35 GMT
I think somebody on the Dev team needs to look into this. The topology I wrote is on GitHub
and should reproduce the problem consistently. This feature seems broken. 

> On Nov 16, 2020, at 4:21 AM, Michael Giroux <michael_a_giroux@yahoo.com> wrote:
> 
> Hello Thomas,
> 
> Thank you very much for the response. Disabling the feature allowed me to move from 1.2.3 => 2.2.0.
> 
> I understand the intent of the feature; however, in my case the end result was one node getting all the load and, eventually, Netty heap-space exceptions. Perhaps I should look at that... or perhaps I'll just leave the feature disabled.
> 
> Again - thanks for the reply - VERY helpful.
> 
> 
> On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman <tomredman@mchsi.com> wrote:
> 
> 
> I have seen this same thing. I sent a query on this list and, after some time, got a response. The issue is reportedly the result of a new feature. I would assume this feature is CLEARLY broken, as I had built a test topology that was clearly compute-bound, not IO-bound, and that had adjustments to change the compute/IO ratio. I have not tested the suggested fix; I am too close to release, so I just rolled back to version 1.2.3.
> 
> From version 2.1.0 forward, no matter how I changed this compute/net-IO ratio, tuples were not distributed across nodes. Now, I could only reproduce this with anchored (acked) tuples, but if you anchor tuples, you can never span more than one node, which defeats the purpose of using Storm. Following is the email from Kishor Patil identifying the issue (and a way to disable that feature!!!):
> 
>> From: Kishor Patil <kishorvpatil@apache.org>
>> Subject: Re: Significant Bug
>> Date: October 29, 2020 at 8:07:18 AM CDT
>> To: <dev@storm.apache.org>
>> Reply-To: dev@storm.apache.org
>> 
>> Hello Thomas,
>> 
>> Apologies for the delay in responding here. I tested the topology code provided in the storm-issue repo.
>> *only one machine gets pegged*: Although it appears that way, this is not a bug. It is related to Locality Awareness. Please refer to https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
>> It appears the spout-to-bolt ratio is 200, so if there are enough bolts on a single node to handle the events generated by the spout, Storm won't send events out to another node unless it runs out of capacity on that node. If you do not like this and want to distribute events evenly, you can try disabling this feature: you can turn off LoadAwareShuffleGrouping by setting topology.disable.loadaware.messaging to true.
>> -Kishor
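>>
>> A minimal sketch of applying that setting when building the topology config, assuming the standard org.apache.storm.Config API; the factory class below is purely illustrative:
>>
>> import org.apache.storm.Config;
>>
>> public final class TopologyConfigFactory {
>>     // Builds a Config with LoadAwareShuffleGrouping turned off, so
>>     // shuffle-grouped streams are spread across all executors instead of
>>     // preferring executors on the local worker/node.
>>     public static Config buildConfig() {
>>         Config conf = new Config();
>>         conf.put("topology.disable.loadaware.messaging", true);
>>         return conf;
>>     }
>> }
>>
>> The returned Config is then passed to StormSubmitter.submitTopology(...) along with the TopologyBuilder output as usual.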
>> 
>> On 2020/10/28 15:21:54, "Thomas L. Redman" <tomredman@mchsi.com> wrote:
>>> What’s the word on this? I sent this out some time ago, including a GitHub project that clearly demonstrates the brokenness, yet I have not heard a word. Is there anybody supporting Storm?
>>> 
>>>> On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <tomredman@mchsi.com> wrote:
>>>> 
>>>> I believe I have encountered a significant bug. It seems topologies employing anchored tuples do not distribute across multiple nodes, regardless of the computation demands of the bolts. It works fine on a single node, but when throwing multiple nodes into the mix, only one machine gets pegged. When we disable anchoring, it will distribute across all nodes just fine, pegging each machine appropriately.
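>>>>
>>>> To make "anchored" concrete, here is a minimal sketch of the two emit styles inside a bolt's execute(); an illustrative bolt, not the one in the storm-issue repo, and the field name and uppercase transformation are made up:
>>>>
>>>> import java.util.Map;
>>>> import org.apache.storm.task.OutputCollector;
>>>> import org.apache.storm.task.TopologyContext;
>>>> import org.apache.storm.topology.OutputFieldsDeclarer;
>>>> import org.apache.storm.topology.base.BaseRichBolt;
>>>> import org.apache.storm.tuple.Fields;
>>>> import org.apache.storm.tuple.Tuple;
>>>> import org.apache.storm.tuple.Values;
>>>>
>>>> public class UppercaseBolt extends BaseRichBolt {
>>>>     private OutputCollector collector;
>>>>
>>>>     @Override
>>>>     public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
>>>>         this.collector = collector;
>>>>     }
>>>>
>>>>     @Override
>>>>     public void execute(Tuple input) {
>>>>         // Anchored emit: the output tuple is tied to the input tuple, so it is
>>>>         // tracked by the acker and replayed on failure. This is the case that
>>>>         // stays on one node.
>>>>         collector.emit(input, new Values(input.getString(0).toUpperCase()));
>>>>         // Unanchored emit (no ack tracking); this variant distributes fine:
>>>>         // collector.emit(new Values(input.getString(0).toUpperCase()));
>>>>         collector.ack(input);
>>>>     }
>>>>
>>>>     @Override
>>>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>>>         declarer.declare(new Fields("word"));
>>>>     }
>>>> }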
>>>> 
>>>> This bug manifests from version 2.1 forward. I first encountered this issue with my own production cluster on an app that does significant NLP computation across hundreds of millions of documents. That topology is fairly complex, so I developed a very simple exemplar that demonstrates the issue with only one spout and one bolt. I pushed this demonstration up to GitHub to provide the developers with a mechanism to easily isolate the bug, and maybe provide some workaround. I used Gradle to build and package this simple topology. The code is well documented, so it should be fairly simple to reproduce the issue. I first encountered this issue on three 32-core nodes, but when I started experimenting I set up a test cluster of 8-core nodes, then increased each node to 16 cores, with plenty of memory in every case.
>>>> 
>>>> The topology can be accessed from GitHub at https://github.com/cowchipkid/storm-issue.git. Please feel free to respond to me directly if you have any questions that are beyond the scope of this mailing list.
>>> 
>>> 
> 
> 
> Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0 again for my next release.
> 
>> On Nov 13, 2020, at 12:53 PM, Michael Giroux <michael_a_giroux@yahoo.com> wrote:
>> 
>> Hello, all,
>> 
>> I have a topology with 16 workers running across 4 nodes. This topology has a bolt "transform" with executors=1 producing a stream that is consumed by a bolt "ontology" with executors=160. Everything is configured with shuffleGrouping.
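>>
>> Roughly, the wiring looks like this; a sketch only, assuming the standard TopologyBuilder API, with placeholder spout/bolt classes standing in for my actual components:
>>
>> import org.apache.storm.Config;
>> import org.apache.storm.generated.StormTopology;
>> import org.apache.storm.topology.TopologyBuilder;
>>
>> public final class OntologyTopology {
>>     // Wires the components as described above; ReaderSpout, TransformBolt,
>>     // and OntologyBolt are placeholders, not the actual implementations.
>>     public static StormTopology build() {
>>         TopologyBuilder builder = new TopologyBuilder();
>>         builder.setSpout("reader", new ReaderSpout(), 1);
>>         builder.setBolt("transform", new TransformBolt(), 1)   // executors=1
>>                .shuffleGrouping("reader");
>>         builder.setBolt("ontology", new OntologyBolt(), 160)   // executors=160
>>                .shuffleGrouping("transform");
>>         return builder.createTopology();
>>     }
>>
>>     // 16 workers spread across the 4 nodes.
>>     public static Config config() {
>>         Config conf = new Config();
>>         conf.setNumWorkers(16);
>>         return conf;
>>     }
>> }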
>> 
>> With Storm 1.2.3, all of the "ontology" bolts get their fair share of tuples. When I run Storm 2.2.0, only the "ontology" bolts that are on the same node as the single "transform" bolt get tuples.
>> 
>> Same cluster, same baseline code; the only difference is binding in the new Maven artifact.
>> 
>> No errors in the logs.  
>> 
>> Any thoughts would be welcome.  Thanks!
>> 
> 

