cassandra-commits mailing list archives

From "Jim Plush (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10249) Make buffered read size configurable
Date Fri, 11 Sep 2015 19:55:47 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Plush updated CASSANDRA-10249:
----------------------------------
    Attachment: Screenshot 2015-09-11 09.34.10.png
                Screenshot 2015-09-11 09.32.04.png

Uploading some screenshots from testing I did over the last couple of days while trying to
establish some benchmarks. With compression off, I was looking to do 1 million writes (RF3)
with 50K reads on a 60-node cluster. With the default 64K buffer size, I/O was saturated and
read latency was 100+ ms. With the buffer at 4K, I/O was quite stable at that rate. This was a
straight row key lookup test, i.e. no wide rows. With the default buffer it was reading far
more data than the queries needed. Would it be possible to make the buffer size a per-table
setting?
(screenshots attached)
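
To illustrate why a smaller buffer helps a point-lookup workload, here is a rough sketch of the
arithmetic. It assumes every lookup pulls whole buffers from disk and that the row starts at a
buffer boundary. The 300-byte row size and the function names are hypothetical and do not come
from the test above.

```python
import math

def bytes_read_per_lookup(row_bytes: int, buffer_bytes: int) -> int:
    """Bytes pulled from disk for one row, assuming reads happen in whole buffers."""
    return math.ceil(row_bytes / buffer_bytes) * buffer_bytes

def read_amplification(row_bytes: int, buffer_bytes: int) -> float:
    """Ratio of bytes read from disk to bytes the query actually needed."""
    return bytes_read_per_lookup(row_bytes, buffer_bytes) / row_bytes

ROW_BYTES = 300  # hypothetical row size, not a measured value

for buffer_bytes in (64 * 1024, 4 * 1024):
    amp = read_amplification(ROW_BYTES, buffer_bytes)
    print(f"{buffer_bytes // 1024}K buffer: "
          f"{bytes_read_per_lookup(ROW_BYTES, buffer_bytes)} bytes read, "
          f"about {amp:.0f}x amplification")
```

Under these assumptions, any row smaller than 4K costs 16 times more disk I/O with a 64K buffer
than with a 4K buffer, whatever the row size actually is.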

> Make buffered read size configurable
> ------------------------------------
>
>                 Key: CASSANDRA-10249
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Albert P Tobey
>             Fix For: 2.1.x
>
>         Attachments: Screenshot 2015-09-11 09.32.04.png, Screenshot 2015-09-11 09.34.10.png,
patched-2.1.9-dstat-lvn10.png, stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png
>
>
> On read workloads, Cassandra 2.1 reads drastically more data than it emits over the network.
This causes problems throughout the system by wasting disk IO and causing unnecessary GC.
> I have reproduced the issue on clusters and locally with a single instance. The only requirement
to reproduce the issue is enough data to blow through the page cache. The default schema and
data size with cassandra-stress is sufficient for exposing the issue.
> With stock 2.1.9 I regularly observed anywhere from a 300:1 to a 500:1 disk:network ratio.
That is to say, for 1MB/s of network IO, Cassandra was doing 300-500MB/s of disk reads, saturating
the drive.
> After applying this patch for standard IO mode https://gist.github.com/tobert/10c307cf3709a585a7cf
the ratio fell to around 100:1 on my local test rig. Latency improved considerably and GC
became a lot less frequent.
> I tested with 512-byte reads as well, but got the same performance, which makes sense
since all HDDs and SSDs made in the last few years have a 4K block size (many of them lie and
say 512).
> I'm re-running the numbers now and will post them tomorrow.
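
The two calculations behind the quoted description can be sketched as follows. This is only an
illustration under simple assumptions: the device always transfers whole 4K blocks, and requests
start on a block boundary. The function names are hypothetical.

```python
def disk_read_rate(network_mb_per_s: float, disk_to_network_ratio: float) -> float:
    """Disk read rate in MB/s implied by a given disk:network ratio."""
    return network_mb_per_s * disk_to_network_ratio

def physical_read_bytes(request_bytes: int, block_bytes: int = 4096) -> int:
    """Bytes the device transfers for one aligned request, rounded up to whole blocks."""
    blocks = -(-request_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes

# Ratios reported in the issue: 300:1 to 500:1 on stock 2.1.9, about 100:1 with the patch.
for ratio in (300, 500, 100):
    print(f"{ratio}:1 at 1 MB/s network -> {disk_read_rate(1, ratio):.0f} MB/s disk")

# A 512-byte request and a 4K request cost the same on a 4K-block device.
for request in (512, 4096):
    print(f"{request}-byte request -> {physical_read_bytes(request)} bytes from device")
```

With a 4K physical block, shrinking the read size below 4K does not reduce the bytes
transferred, which is consistent with the 512-byte test showing no further improvement.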



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
