cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
Date Thu, 03 Sep 2015 07:55:46 GMT


Sylvain Lebresne commented on CASSANDRA-10249:

Aside from Jonathan's comment, I don't think it's reasonable to even consider changing the
default in 2.1 at this point unless we're willing to do *way* more benchmarking than just
the default stress workload (for the reasons Jake mentions). What I could suggest, however,
is making {{DEFAULT_BUFFER_SIZE}} configurable through a system property so that 2.1/2.2
users at least have the option to tune it for their workload.
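A system-property override along these lines could look like the following minimal sketch. The property name {{cassandra.default_buffer_size}} is hypothetical here, chosen only for illustration; it is not the name used by any actual patch on this ticket.

```java
// Minimal sketch of a system-property override for the read buffer size.
// ASSUMPTIONS: the property name "cassandra.default_buffer_size" is
// hypothetical, and 64 KiB is assumed as the stock fallback (consistent
// with the 16x reduction to 4 KiB discussed on this ticket).
public final class BufferSizeConfig {

    // Integer.getInteger returns the parsed system property value,
    // or the supplied default (65536) when the property is unset.
    static final int DEFAULT_BUFFER_SIZE =
            Integer.getInteger("cassandra.default_buffer_size", 64 * 1024);

    public static void main(String[] args) {
        System.out.println("read buffer size: " + DEFAULT_BUFFER_SIZE);
    }
}
```

Operators could then experiment with e.g. {{-Dcassandra.default_buffer_size=4096}} on the JVM command line without a code change or a new yaml option.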

> Reduce over-read for standard disk io by 16x
> --------------------------------------------
>                 Key: CASSANDRA-10249
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Albert P Tobey
>             Fix For: 2.1.x
>         Attachments: patched-2.1.9-dstat-lvn10.png, stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png
> On read workloads, Cassandra 2.1 reads drastically more data than it emits over the network.
This causes problems throughout the system by wasting disk IO and causing unnecessary GC.
> I have reproduced the issue on clusters and locally with a single instance. The only requirement
for reproducing it is enough data to blow through the page cache. The default schema and
data size with cassandra-stress are sufficient to expose the issue.
> With stock 2.1.9 I regularly observed anywhere from a 300:1 to 500:1 disk:network ratio.
That is to say, for 1MB/s of network IO, Cassandra was doing 300-500MB/s of disk reads, saturating
the drive.
> After applying this patch for standard IO mode,
the ratio fell to around 100:1 on my local test rig. Latency improved considerably and GC
became much less frequent.
> I tested with 512-byte reads as well but got the same performance, which makes sense
since all HDDs and SSDs made in the last few years have a 4K block size (many of them lie and
report 512).
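The "16x" in the ticket title is consistent with shrinking the read buffer from an assumed 64 KiB stock default down to the 4 KiB physical block size described above. A trivial sketch of that arithmetic (the 64 KiB figure is an assumption, not confirmed on this ticket):

```java
// Hedged arithmetic sketch: how a 64 KiB -> 4 KiB buffer change yields
// the 16x over-read reduction in the ticket title.
// ASSUMPTION: 64 KiB is the stock buffer size; 4 KiB is the physical block.
public final class OverReadRatio {
    public static void main(String[] args) {
        int oldBuffer = 64 * 1024; // assumed stock DEFAULT_BUFFER_SIZE
        int newBuffer = 4 * 1024;  // one physical disk block
        System.out.println("reduction factor: " + (oldBuffer / newBuffer)); // 16
    }
}
```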
> I'm re-running the numbers now and will post them tomorrow.

This message was sent by Atlassian JIRA
