cassandra-commits mailing list archives

From "Albert P Tobey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
Date Wed, 02 Sep 2015 01:46:46 GMT
Albert P Tobey created CASSANDRA-10249:
------------------------------------------

             Summary: Reduce over-read for standard disk io by 16x
                 Key: CASSANDRA-10249
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Albert P Tobey
             Fix For: 2.1.x


On read workloads, Cassandra 2.1 reads drastically more data from disk than it emits over the
network. This causes problems throughout the system by wasting disk IO and causing unnecessary GC.

I have reproduced the issue on clusters and locally with a single instance. The only requirement
to reproduce the issue is enough data to blow through the page cache. The default schema and
data size with cassandra-stress is sufficient for exposing the issue.

With stock 2.1.9 I regularly observed a disk:network ratio anywhere from 300:1 to 500:1. That
is to say, for 1MB/s of network IO, Cassandra was doing 300-500MB/s of disk reads, saturating
the drive.
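
To make the ratio concrete, here is a minimal sketch of the arithmetic. The function name and
the sample throughput figures are illustrative assumptions; only the 300-500MB/s and 1MB/s
numbers come from the observation above.

```python
def disk_to_network_ratio(disk_bytes_per_s: float, network_bytes_per_s: float) -> float:
    """Return how many bytes are read from disk per byte sent over the network."""
    if network_bytes_per_s <= 0:
        raise ValueError("network throughput must be positive")
    return disk_bytes_per_s / network_bytes_per_s


MB = 1_000_000  # decimal megabyte, chosen here only for illustration

# Figures quoted above: 300-500MB/s of disk reads per 1MB/s of network IO.
low = disk_to_network_ratio(300 * MB, 1 * MB)
high = disk_to_network_ratio(500 * MB, 1 * MB)
print(f"disk:network ratio between {low:.0f}:1 and {high:.0f}:1")
```

In practice the two throughput inputs would come from whatever disk and network counters are
sampled over the same interval during the read workload.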

After applying this patch https://gist.github.com/tobert/10c307cf3709a585a7cf the ratio fell
to around 100:1 on my local test rig.

I tested with 512 byte reads as well, but got the same performance, which makes sense since
all HDDs and SSDs made in the last few years have a 4K block size (many of them lie and report
512).
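
The reasoning about 512 byte reads can be sketched with a simplified model: if the device
transfers whole 4K blocks, a request is rounded out to block boundaries. This is only an
illustration under that assumption. It ignores kernel readahead, the page cache, and the
filesystem, and the function name is hypothetical.

```python
def bytes_transferred(offset: int, length: int, block_size: int = 4096) -> int:
    """Bytes the device moves for a read of `length` bytes at `offset`,
    assuming it can only transfer whole blocks of `block_size` bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_block_start = (offset // block_size) * block_size
    # Round the end of the request up to the next block boundary.
    last_block_end = -(-(offset + length) // block_size) * block_size
    return last_block_end - first_block_start


# Under this model an aligned 512 byte read costs the same as an aligned 4K read.
print(bytes_transferred(0, 512))    # one 4K block
print(bytes_transferred(0, 4096))   # one 4K block
print(bytes_transferred(4000, 512)) # straddles a boundary, so two blocks
```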



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
