cassandra-commits mailing list archives

From "Oleg Kibirev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5039) Make sure all instances of BlockingQueue have configurable and sane limits
Date Wed, 22 Oct 2014 22:54:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180704#comment-14180704 ]

Oleg Kibirev commented on CASSANDRA-5039:
-----------------------------------------

I no longer work on this particular project, but basically the problem happened when I ran
3 nodes on different disks of the same machine, loaded them heavily with inserts, and then pulled
out one of the disks. The remaining nodes would run out of memory because they would queue many
operations before discovering that the destination had died.

Setting a limit on the corresponding queue allowed the system to remain operational.

We have seen similar OOMs in production. That's about all the details I remember about this
experiment.
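
For illustration only, here is a minimal sketch of the kind of bounded, configurable queue
being proposed. This is not the actual Cassandra code; the class, property name, and default
capacity are hypothetical, chosen so the backlog stays well under the ~100M heap target from
the issue description.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    public class BoundedStageQueue {
        // Hypothetical system property and default; in practice the limit would come
        // from cassandra.yaml or a -D system property, as proposed in the ticket.
        private static final int CAPACITY =
                Integer.getInteger("cassandra.max_queued_tasks", 16 * 1024);

        // Bounded queue: offer() fails or times out instead of growing without limit.
        private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(CAPACITY);

        // Wait up to the given timeout for space; returning false lets the caller
        // shed load (or mark the peer as down) rather than accumulate an unbounded
        // backlog and eventually run out of heap.
        public boolean tryEnqueue(Runnable task, long timeout, TimeUnit unit)
                throws InterruptedException {
            return queue.offer(task, timeout, unit);
        }
    }

The point of the sketch is the failure mode: with an unbounded queue the enqueue always
succeeds and the OOM happens later; with a bounded one, back-pressure surfaces at the
point where the slow or dead node is first noticed.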

> Make sure all instances of BlockingQueue have configurable and sane limits
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5039
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5039
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.7
>            Reporter: Oleg Kibirev
>            Priority: Minor
>              Labels: performance
>
> Currently, most BlockingQueues in Cassandra are created without any limits (execution
> stages) or with limits high enough to consume gigabytes of heap (PeriodicCommitLogExecutorService).
> I have observed many cases where a single unresponsive node can bring down the entire cluster
> because the others accumulate huge backlogs of operations.
> We need to make sure each queue is configurable through a yaml entry or a system property,
> and that defaults are chosen so that any given queue doesn't consume more than 100M of heap. I
> have successfully tested that adding these limits makes the cluster resistant to heavy load or
> a bad node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
