Fay, what do you mean by "the partition key data is on one node"? Shouldn't a write request with RF=3 be fulfillable by any of the three replica nodes?

I do think we have a "hot key," we're working on tracking that down.
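
For anyone following along: with RF = 3 any node can coordinate a write, but the data itself always lands on the same three replicas that own the key's token, so a hot key concentrates load on that fixed replica set (nodetool toppartitions can help sample for one). Here's a rough sketch with the 3.x DataStax Java driver that prints which nodes own a suspected hot key; the contact point, the keyspace "ks", and the key value are placeholders, not our actual schema:

    import java.nio.ByteBuffer;
    import java.util.Set;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Host;
    import com.datastax.driver.core.Metadata;
    import com.datastax.driver.core.ProtocolVersion;
    import com.datastax.driver.core.TypeCodec;

    public class ReplicaLookup {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                // getMetadata() initializes the cluster and negotiates the protocol.
                Metadata metadata = cluster.getMetadata();
                ProtocolVersion pv = cluster.getConfiguration()
                        .getProtocolOptions().getProtocolVersion();
                // Serialize the suspected hot key the same way the driver would.
                ByteBuffer key = TypeCodec.varchar().serialize("suspected-hot-key", pv);
                // With RF = 3 this is always the same three nodes for this key.
                Set<Host> replicas = metadata.getReplicas("ks", key);
                for (Host host : replicas) {
                    System.out.println("replica: " + host.getAddress());
                }
            } finally {
                cluster.close();
            }
        }
    }

If the heavy traffic all maps onto one replica set, that points at a hot partition rather than the driver.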

On Sat, Sep 2, 2017 at 11:30 PM, Fay Hou [Storage Service] <fayhou@coupang.com> wrote:
Most likely related to poor data modeling: the data for a given partition key lives on one node (and its replicas). Checking into the queries and table design.

On Sep 2, 2017 5:48 PM, Andrew Bialecki <andrew.bialecki@klaviyo.com> wrote:
We're running Cassandra 3.7 on AWS, across different AZs in the same region. The columns are counters and the workload is 95% writes, but of course each of those writes involves a local read as well, because they're counters.
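
To make the counter point concrete, the increments look roughly like this (hypothetical table ks.page_counts with a counter column hits, not our exact schema). On the replica, each increment is a read-modify-write: the current counter state has to be read before the new value is written, which is why a "95% write" counter workload can still look read-heavy at the disk:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CounterIncrement {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect();
                // Counters can only be incremented/decremented; to apply the
                // delta, the replica must first read the counter's current
                // state, then write the new value.
                session.execute(
                        "UPDATE ks.page_counts SET hits = hits + 1 WHERE page_id = ?",
                        "home");
            } finally {
                cluster.close();
            }
        }
    }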

Under heavy write volume, one node has a much higher CPU load than the others. That node is at 100% disk utilization with high iowait. According to iostat, the IO load is roughly 95% reads vs. writes, in both requests and bytes. Below is a graph of the CPU.

Any ideas as to how we could diagnose what's causing so much more IO on this node than on the others?
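
One way to narrow it down beyond iostat: compare per-table read counts on the hot node against a healthy replica. nodetool cfstats gives this from the command line; the sketch below pulls the same table-level ReadLatency counters over JMX. It assumes Cassandra's default JMX port 7199 with authentication disabled; adjust for your setup:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TableReadCounts {
        public static void main(String[] args) throws Exception {
            // 7199 is Cassandra's default JMX port; point this at the hot node.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // One ReadLatency timer per table; compare its Count across
                // tables (and across nodes) to see where the reads come from.
                ObjectName pattern = new ObjectName(
                        "org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=ReadLatency");
                for (ObjectName name : mbs.queryNames(pattern, null)) {
                    Object count = mbs.getAttribute(name, "Count");
                    System.out.println(name.getKeyProperty("keyspace") + "."
                            + name.getKeyProperty("scope") + " reads=" + count);
                }
            }
        }
    }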

Also, we're not sure why this node in particular is hot compared to the other two "replica" nodes (we use RF = 3). We're using the DataStax driver and are looking into the load balancing policy to see if that's an issue.
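
In case it helps, the usual token-aware setup with the 3.x Java driver looks roughly like this (the contact point and DC name are placeholders). If the policy isn't token-aware, one node can end up coordinating a disproportionate share of requests; note also that token-aware routing only applies when the driver knows a statement's routing key, e.g. for prepared statements:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class ClusterFactory {
        public static Cluster build() {
            return Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    // Send each statement to a replica that owns its partition
                    // key, falling back to DC-aware round-robin otherwise.
                    .withLoadBalancingPolicy(new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder()
                                    .withLocalDc("us-east")
                                    .build()))
                    .build();
        }
    }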

[Inline image: CPU load graph]

--
Andrew Bialecki
Klaviyo
