cassandra-commits mailing list archives

From "Benedict (JIRA)"
Subject [jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
Date Wed, 10 Jul 2019 07:02:00 GMT


Benedict commented on CASSANDRA-15013:

Thanks for these [~sumanth.pasupuleti]!

Just to log for watchers: I have had a brief chat with Sumanth, and we intend to capture flame
graphs to see if we can explain the 10% (5 percentage point) bump in average CPU utilisation,
which may well be down to contention on a single shared variable for every operation.  This is
a worst-case cost, given the formulation of this test, which was the whole point, but it's
potentially still significant, so we might need to reduce friction by e.g. assigning each
connection its own share of the pie at connection time, so that we only have to compete for
the shared resource infrequently (when we overshoot our share, or need to dis/connect).  We'll
see what the flame graphs show.
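
To make the "own share of the pie" idea concrete for watchers, here is a minimal Java sketch.
It is not the patch under review: the class names GlobalLimit and ConnectionShare, the byte
accounting, and the assumption that each connection is only touched by its own event loop
thread are all hypothetical, chosen purely for illustration.  The point it shows is that the
shared atomic counter is only touched on connect, on disconnect, and when a connection
overshoots its pre-reserved share.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical node-wide limit; the single contended variable. */
final class GlobalLimit {
    private final long capacity;
    private final AtomicLong reserved = new AtomicLong();

    GlobalLimit(long capacity) { this.capacity = capacity; }

    /** Contended path: CAS loop on the shared counter. */
    boolean tryReserve(long bytes) {
        while (true) {
            long current = reserved.get();
            long next = current + bytes;
            if (next > capacity) return false;
            if (reserved.compareAndSet(current, next)) return true;
        }
    }

    void release(long bytes) { reserved.addAndGet(-bytes); }

    long reserved() { return reserved.get(); }
}

/** Hypothetical per-connection share; assumed to be used by one thread only. */
final class ConnectionShare {
    private final GlobalLimit global;
    private final long share;   // bytes reserved from the global limit at connect
    private long localUsed;     // bytes in flight that fit within the share
    private long overflow;      // bytes borrowed from the global limit beyond the share

    ConnectionShare(GlobalLimit global, long share) {
        // Touch the shared counter once, at connection time.
        if (!global.tryReserve(share))
            throw new IllegalStateException("no capacity left for a new connection");
        this.global = global;
        this.share = share;
    }

    boolean tryAcquire(long bytes) {
        // Fast path: plain field arithmetic, no shared state involved.
        if (localUsed + bytes <= share) {
            localUsed += bytes;
            return true;
        }
        // Slow path: we overshot our share, so compete for the shared resource.
        if (global.tryReserve(bytes)) {
            overflow += bytes;
            return true;
        }
        return false;
    }

    void release(long bytes) {
        // Hand borrowed capacity back first so other connections can use it.
        long fromOverflow = Math.min(bytes, overflow);
        if (fromOverflow > 0) {
            overflow -= fromOverflow;
            global.release(fromOverflow);
        }
        localUsed -= bytes - fromOverflow;
    }

    /** Disconnect: return the share and anything still borrowed. */
    void close() {
        global.release(share + overflow);
        localUsed = 0;
        overflow = 0;
    }
}
```

In this sketch a request that does not fit in the remaining share is charged entirely to the
global limit rather than being split, which keeps the accounting simple at the cost of
slightly more traffic on the shared counter.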

We will also try to explain the different shape of the heap utilisation graph, which might be
as simple as only one node coordinating instead of all three, for instance.

> Message Flusher queue can grow unbounded, potentially running JVM out of memory
> -------------------------------------------------------------------------------
>                 Key: CASSANDRA-15013
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 3.0.x, 3.11.x
>         Attachments: BlockedEpollEventLoopFromHeapDump.png, BlockedEpollEventLoopFromThreadDump.png,
RequestExecutorQueueFull.png, heap dump showing each ImmediateFlusher taking upto 600MB.png,
perftest_blockedthreads.png, perftest_connections_count.png, perftest_cpu_usage.png, perftest_heap_usage.png,
perftest_readlatency_99th.png, perftest_readlatency_avg.png, perftest_readops.png, perftest_writelatency_99th.png,
perftest_writelatency_avg.png, perftest_writeops.png
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue bounded,
since, in the current state, items are added to the queue without any check on queue size
or on the isWritable state of the netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
