storm-dev mailing list archives

From "Radim Kolar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (STORM-339) Severe memory leak to OOM when ackers disabled
Date Fri, 04 Jul 2014 15:40:34 GMT

    [ https://issues.apache.org/jira/browse/STORM-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052518#comment-14052518 ]

Radim Kolar commented on STORM-339:
-----------------------------------

There are three ways to protect against OOM without needing to acknowledge every
message. Storm in ack mode has 10x lower throughput.

See the end of http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/queue-attributes.html#queue-attributes.address-settings

1) Use a ring buffer for receiving messages. If messages are processed too slowly, a newly
arriving message replaces the oldest unprocessed one. This is not flow control, just protection
against OOM. (type DROP)

2) Implement flow-control messages; something simple like the XON/XOFF protocol
(http://en.wikipedia.org/wiki/Software_flow_control) should suffice. (type BLOCK)

3) Save messages to disk instead of throwing them away. (type PAGE)
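A minimal sketch of the DROP option from item 1, assuming a single-threaded receive path. `DropRingBuffer` is a hypothetical name, not part of Storm or HornetQ. It implements the behavior described above: when the buffer is full, the new message overwrites the oldest unprocessed one, so memory stays bounded no matter how fast the upstream component emits.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical fixed-capacity ring buffer with DROP semantics (not a Storm API).
// Not thread-safe: a real receive path would need synchronization or a lock-free design.
final class DropRingBuffer<T> {
    private final Object[] slots;
    private int head = 0;      // index of the oldest buffered message
    private int size = 0;      // number of buffered messages
    private long dropped = 0;  // messages overwritten before they were processed

    DropRingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        slots = new Object[capacity];
    }

    /** Adds a message; if the buffer is full, the oldest unprocessed message is discarded. */
    void offer(T msg) {
        Objects.requireNonNull(msg, "msg");
        int tail = (head + size) % slots.length;
        slots[tail] = msg;
        if (size == slots.length) {
            // Buffer was full, so tail == head: we just overwrote the oldest entry.
            head = (head + 1) % slots.length;
            dropped++;
        } else {
            size++;
        }
    }

    /** Removes and returns the oldest message, or empty if the buffer is empty. */
    @SuppressWarnings("unchecked")
    Optional<T> poll() {
        if (size == 0) return Optional.empty();
        T msg = (T) slots[head];
        slots[head] = null; // release the reference so it can be garbage-collected
        head = (head + 1) % slots.length;
        size--;
        return Optional.of(msg);
    }

    int size() { return size; }
    long dropped() { return dropped; }

    public static void main(String[] args) {
        DropRingBuffer<Integer> buf = new DropRingBuffer<>(3);
        for (int i = 0; i < 5; i++) buf.offer(i);          // fast producer: 5 messages, 3 slots
        System.out.println("dropped=" + buf.dropped());    // prints dropped=2
        while (buf.size() > 0) System.out.println(buf.poll().get()); // prints 2, 3, 4
    }
}
```

Counting drops keeps the loss visible. Note that with ackers disabled, Storm does not track or replay tuples, so anything overwritten here is lost for good; that is the trade-off item 1 accepts in exchange for a hard memory bound.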

For inspiration, see http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/flow-control.html
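The BLOCK option from item 2 can be sketched as receiver-side high/low watermarks that toggle XOFF/XON. This is a single-threaded simulation under stated assumptions: `XonXoffReceiver`, the watermark values (2 and 5) and the 3:1 producer/consumer rate are all hypothetical, and in a real deployment the XOFF/XON signals would be control messages sent back over the network connection rather than a shared boolean.

```java
import java.util.ArrayDeque;

// Hypothetical XON/XOFF-style flow control (not a Storm or HornetQ API).
// The receiver pauses the sender at the high watermark and resumes it at the low one.
final class XonXoffReceiver<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int lowWatermark;
    private final int highWatermark;
    private boolean xoff = false; // true once XOFF has been "sent" to the sender

    XonXoffReceiver(int low, int high) {
        if (low < 0 || low >= high) throw new IllegalArgumentException("need 0 <= low < high");
        lowWatermark = low;
        highWatermark = high;
    }

    /** What the sender checks before emitting: has it been told to pause? */
    boolean canSend() { return !xoff; }

    /** Called when a message arrives from the sender. */
    void receive(T msg) {
        if (xoff) throw new IllegalStateException("sender ignored XOFF");
        queue.addLast(msg);
        if (queue.size() >= highWatermark) xoff = true; // send XOFF
    }

    /** Called by the slow consumer; returns null if nothing is queued. */
    T process() {
        T msg = queue.pollFirst();
        if (xoff && queue.size() <= lowWatermark) xoff = false; // send XON
        return msg;
    }

    int depth() { return queue.size(); }

    public static void main(String[] args) {
        XonXoffReceiver<Integer> rx = new XonXoffReceiver<>(2, 5);
        int next = 0;
        int maxDepth = 0;
        for (int tick = 0; tick < 100; tick++) {
            // Fast producer: tries to emit 3 messages per tick but honors XOFF.
            for (int k = 0; k < 3 && rx.canSend(); k++) rx.receive(next++);
            maxDepth = Math.max(maxDepth, rx.depth());
            rx.process(); // slow consumer: 1 message per tick
        }
        // maxDepth never exceeds the high watermark, however long the loop runs.
        System.out.println("emitted=" + next + " maxDepth=" + maxDepth);
    }
}
```

The gap between the low and high watermarks adds hysteresis, so the sender is not paused and resumed on every single message. Over a real network the sender may emit a few more messages before it sees XOFF, so the high watermark should leave some headroom below the actual memory limit.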

> Severe memory leak to OOM when ackers disabled
> ----------------------------------------------
>
>                 Key: STORM-339
>                 URL: https://issues.apache.org/jira/browse/STORM-339
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Jiahong Li
>
> Without any ackers enabled, a fast component will continuously leak memory, causing
> OOM problems when the target component is slow. The OOM problem can be reproduced by running this
> fast-slow-topology:
> https://github.com/Gvain/storm-perf-test/tree/fast-slow-topology
> with command:
> {code}
> $ storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --spout 1 --bolt 1 --workers 2 --testTime 600 --messageSize 6400
> {code}
> The worker childopts are {{-Xms2g -Xmx2g -Xmn512m ...}}.
> At the same time, the executed count of the target component falls far behind the emitted
> count of the source component. I guess the Netty client may be buffering too many messages
> in its message_queue because the target component sends back OK/Failure responses too slowly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
