storm-dev mailing list archives

From "Jiahong Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (STORM-339) Severe memory leak to OOM when ackers disabled
Date Mon, 07 Jul 2014 14:06:34 GMT

    [ https://issues.apache.org/jira/browse/STORM-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053676#comment-14053676 ]

Jiahong Li commented on STORM-339:
----------------------------------

1) DROP is not a good option: if your components are not properly sized, many tuples will
be dropped, and there is no good way to measure the maximum throughput your topology can
sustain without significant dropping.
2) BLOCK is not a good option either: it can potentially cause a dead loop in the transport
layer, which would hang the whole topology.
3) PAGE is not a good option because latency will increase continuously, and there is no
good way to measure the maximum throughput your topology can sustain while still executing
all tuples with reasonable latency.

Maybe the best way is to implement a flow control mechanism like spout pending, but a much
simpler one.
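
As a rough illustration of what such a simpler mechanism might look like, here is a minimal
sketch of a cap on the number of tuples in flight between a fast sender and a slow receiver.
The class InFlightLimiter and its methods are hypothetical names made up for this example and
are not part of Storm; the sketch also assumes the receiver can signal the sender each time
it finishes a tuple, which is not how the transport layer works today.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Storm code): caps how many tuples may be
 * outstanding between a sender and a receiver at any one time.
 */
public class InFlightLimiter {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    public InFlightLimiter(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /** Returns true if one more tuple may be sent now; never blocks. */
    public boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight) {
                return false; // window is full, sender should back off
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Called when the receiver reports that one tuple has been processed. */
    public void release() {
        // Decrement, but never below zero if release() is called spuriously.
        inFlight.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Number of tuples currently sent but not yet reported as processed. */
    public int inFlight() {
        return inFlight.get();
    }
}
```

A sender would call tryAcquire() before each emit and hold off when it returns false, so
memory use is bounded by the window size instead of growing with the speed difference
between the two components.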


> Severe memory leak to OOM when ackers disabled
> ----------------------------------------------
>
>                 Key: STORM-339
>                 URL: https://issues.apache.org/jira/browse/STORM-339
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Jiahong Li
>
> Without any ackers enabled, a fast component will continuously leak memory and cause
> OOM problems when the target component is slow. The OOM problem can be reproduced by
> running this fast-slow-topology:
> https://github.com/Gvain/storm-perf-test/tree/fast-slow-topology
> with command:
> {code}
> $ storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --spout 1 --bolt 1 --workers 2 --testTime 600 --messageSize 6400
> {code}
> The worker childopts were set to {{-Xms2g -Xmx2g -Xmn512m ...}}.
> At the same time, the executed count of the target component is far behind the emitted
> count of the source component. I guess it could be that the netty client is buffering too
> many messages in its message_queue because the target component sends back OK/Failure
> responses too slowly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
