cassandra-commits mailing list archives

From "David O'Dell (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down
Date Thu, 17 Jul 2014 17:47:06 GMT
David O'Dell created CASSANDRA-7567:
---------------------------------------

             Summary: when the commit_log disk for a single node is overwhelmed the entire cluster slows down
                 Key: CASSANDRA-7567
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Debian 7.5, bare metal, 14 nodes, 64 CPUs, 64GB RAM, commit_log disk SATA, data disk SSD, vnodes, leveled compaction strategy
            Reporter: David O'Dell
         Attachments: write_request_latency.png

We've run into a situation where a single node out of 14 is experiencing high disk I/O. This
can happen when a node is being decommissioned, or after it joins the ring and runs into
bug CASSANDRA-6621.
When this occurs, the write latency for the entire cluster spikes from 0.3ms to 170ms.
To simulate this, simply run dd on the commit_log disk (dd if=/dev/zero of=/tmp/foo bs=1024)
and you will see that all nodes in the cluster slow down instantly.
BTW, overwhelming the data disk does not have the same effect.
I've also tried this where the overwhelmed node is not the one the client connects to
directly, and it still has the same effect.
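
For anyone reproducing this, the dd step above could be wrapped as in the rough sketch below. It is not part of the original report. The function name stress_disk, the scratch file name dd_stress.tmp, and the count= limit are assumptions added here so that the write stops on its own and cleans up after itself; the dd command in the report has no count and runs until it is interrupted or the disk fills.

```shell
# Sketch: generate a bounded burst of sequential writes on one disk.
# stress_disk and dd_stress.tmp are hypothetical names, not from the report.
stress_disk() {
    target_dir="$1"          # a directory that lives on the disk to overwhelm
    blocks="${2:-1000000}"   # number of 1024-byte blocks to write (default is roughly 1 GB)
    scratch="$target_dir/dd_stress.tmp"

    # Same dd invocation as in the report, plus count= so it terminates.
    dd if=/dev/zero of="$scratch" bs=1024 count="$blocks" 2>/dev/null
    status=$?

    # Remove the scratch file whether or not dd succeeded.
    rm -f "$scratch"
    return "$status"
}
```

It would be called with a directory on the commit_log disk of the node under test, for example stress_disk /path/to/commitlog_disk 1000000, where the path is a placeholder. A single bounded pass may be too short to show the latency spike, so the call may need to be repeated in a loop while watching write latency on the other nodes.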



--
This message was sent by Atlassian JIRA
(v6.2#6252)
