cassandra-user mailing list archives

From Kyrylo Lebediev <Kyrylo_Lebed...@epam.com>
Subject Re: compaction: huge number of random reads
Date Tue, 08 May 2018 11:49:08 GMT
You are right, Kurt, that's exactly what I was trying to do: lowering the compression chunk size
and the device read-ahead.

Column-family settings: "compression = {'chunk_length_kb': '16', 'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}"
Device read-ahead: blockdev --setra 8 ....

I had to fall back to the default RA of 256, and after that I got large merged reads and low IOPS
with good MB/s throughput.
I believe it's not caused by C* settings, but by something filesystem- or IO-related in the
kernel settings (or is it by design?).
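For reference, `blockdev --setra` takes its value in 512-byte sectors, which is where the 4k and 128k figures in this thread come from. A quick sanity check:

```shell
# blockdev --setra takes 512-byte sectors; convert a RA value to KB
ra_to_kb() { echo $(( $1 * 512 / 1024 )); }

ra_to_kb 8     # RA=8   -> 4   (KB)
ra_to_kb 256   # RA=256 -> 128 (KB)
```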


I tried to emulate C* reads during compactions with dd:


******  RA=8 (4k)

# blockdev --setra 8 /dev/xvdb
# dd if=/dev/zero of=/data/ZZZ
^C16980952+0 records in
16980951+0 records out
8694246912 bytes (8.7 GB, 8.1 GiB) copied, 36.4651 s, 238 MB/s
# sync

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C846513+0 records in
846512+0 records out
433414144 bytes (433 MB, 413 MiB) copied, 21.4604 s, 20.2 MB/s   <<<<<

High IOPS in this case, with an IO size of 4k.
Interestingly, setting bs=128k in dd didn't decrease the IOPS; the IO size was still 4k.


****** RA=256 (128k):
# blockdev --setra 256 /dev/xvdb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C15123937+0 records in
15123936+0 records out
7743455232 bytes (7.7 GB, 7.2 GiB) copied, 60.8407 s, 127 MB/s  <<<<<<

IO size 128k, low IOPS, good throughput (limited by EBS bandwidth).

Writes were fine in both cases: IO size 128k, good throughput limited only by EBS bandwidth.

Is the situation above typical for small read-ahead (the "price for small fast reads"), or is
something wrong with my setup?
[I know this isn't an XFS mailing list, but somebody here may know this:] Why, with a small RA,
are even large reads (bs=128k) converted into multiple small reads?
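One thing worth checking (a sketch, not a definitive answer): without `iflag=direct`, dd reads through the page cache, so the kernel's read-ahead, not dd's bs, decides the size of the requests sent to the device. If the page cache is what's splitting the reads, direct IO should behave differently, since O_DIRECT bypasses the cache and the bs-sized request reaches the block layer as-is:

```shell
# self-contained version of the experiment using a temp file;
# watch iostat in another terminal to see the request sizes at the device
testfile=$(mktemp)
dd if=/dev/zero of="$testfile" bs=1M count=8 2>/dev/null

# buffered: read-ahead governs the device IO size, regardless of bs
dd if="$testfile" of=/dev/null bs=128k 2>&1 | tail -1

# direct: bypasses the page cache, each 128k request goes down intact
# (needs a filesystem that supports O_DIRECT)
dd if="$testfile" of=/dev/null bs=128k iflag=direct 2>&1 | tail -1 \
  || echo "O_DIRECT not supported on this filesystem"

rm -f "$testfile"
```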

Regards,
Kyrill


________________________________
From: kurt greaves <kurt@instaclustr.com>
Sent: Tuesday, May 8, 2018 2:12:40 AM
To: User
Subject: Re: compaction: huge number of random reads

If you've got small partitions/small reads you should test lowering your compression chunk
size on the table and disabling read ahead. This sounds like it might just be a case of read
amplification.

On Tue., 8 May 2018, 05:43 Kyrylo Lebediev, <Kyrylo_Lebediev@epam.com<mailto:Kyrylo_Lebediev@epam.com>>
wrote:

Dear Experts,


I'm observing strange behavior on a 2.1.20 cluster during compactions.


My setup is:

12 nodes  m4.2xlarge (8 vCPU, 32G RAM) Ubuntu 16.04, 2T EBS gp2.

Filesystem: XFS, blocksize 4k, device read-ahead - 4k

/sys/block/xvdb/queue/nomerges = 0

SizeTieredCompactionStrategy


After data loads, when effectively nothing else is talking to the cluster and compaction is
the only activity, I see something like this:
$ iostat -dkx 1
$ iostat -dkx 1
...


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 4769.00  213.00 19076.00 26820.00    18.42     7.95    1.17    1.06    3.76   0.20 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 6098.00  177.00 24392.00 22076.00    14.81     6.46    1.36    0.96   15.16   0.16 100.00

Writes are fine: 177 writes/sec <-> ~22Mbytes/sec,

But for some reason compactions generate a huge number of small reads:
6098 reads/s <-> ~24Mbytes/sec.  ===>   Read size is 4k
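That read size follows directly from the iostat numbers (rkB/s divided by r/s, second sample above):

```shell
# average read size in KB = rkB/s / r/s from the xvdb line
awk 'BEGIN { printf "%.1f KB per read\n", 24392 / 6098 }'
```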


Why am I getting a huge number of 4k reads instead of a much smaller number of large reads?

What could be the reason?


Thanks,

Kyrill


