cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
Date Wed, 01 Mar 2017 02:50:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888953#comment-15888953 ]

Ariel Weisberg edited comment on CASSANDRA-13241 at 3/1/17 2:50 AM:
--------------------------------------------------------------------

[~brstgt] That is basically what I was thinking but don't keep two separate arrays. Do it
in a single array so that when you cache miss you pull in the entire section you are looking
for. Assuming 128 byte alignment you would get one 8 byte value and then 60 2-byte values.
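
A minimal sketch of that single-array layout, under assumptions that are not stated in the comment: each 128-byte block covers 61 chunks, the 8-byte value is the absolute offset of the block's first chunk, and each 2-byte value is the distance from one chunk offset to the next (so every compressed chunk must be smaller than 64 KiB). The names `pack_offsets` and `chunk_offset` are hypothetical and not Cassandra APIs.

```python
import struct

BLOCK_SIZE = 128                         # assumed alignment from the comment
DELTAS_PER_BLOCK = 60                    # 8 + 60 * 2 = 128 bytes
CHUNKS_PER_BLOCK = DELTAS_PER_BLOCK + 1  # base offset plus 60 deltas (assumption)
BLOCK_FORMAT = "<Q60H"                   # one unsigned 8-byte base, 60 unsigned 2-byte deltas


def pack_offsets(offsets):
    """Pack ascending chunk offsets into consecutive 128-byte blocks."""
    out = bytearray()
    for start in range(0, len(offsets), CHUNKS_PER_BLOCK):
        group = offsets[start:start + CHUNKS_PER_BLOCK]
        deltas = [b - a for a, b in zip(group, group[1:])]
        if any(d > 0xFFFF for d in deltas):
            raise ValueError("delta does not fit in 2 bytes")
        deltas += [0] * (DELTAS_PER_BLOCK - len(deltas))  # pad the last block
        out += struct.pack(BLOCK_FORMAT, group[0], *deltas)
    return bytes(out)


def chunk_offset(packed, chunk_index):
    """Return the absolute offset of a chunk by reading a single block."""
    block, slot = divmod(chunk_index, CHUNKS_PER_BLOCK)
    fields = struct.unpack_from(BLOCK_FORMAT, packed, block * BLOCK_SIZE)
    # Sum the first `slot` deltas onto the block's absolute base offset.
    return fields[0] + sum(fields[1:1 + slot])
```

A lookup touches exactly one block, but it still needs a short summing loop over up to 60 deltas.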

It could also be 40 3-byte values that are not relative to each other but only to the one
absolute offset. Then you don't have to do a summing loop.
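
A minimal sketch of the 3-byte variant, under assumptions that are not stated in the comment: each 128-byte block holds one 8-byte absolute offset for its first chunk plus 40 3-byte values, and each 3-byte value is the distance of a later chunk from that absolute offset, so a block covers 41 chunks. The names `pack_offsets_3byte` and `chunk_offset_3byte` are hypothetical and not Cassandra APIs.

```python
BLOCK_SIZE = 128                          # assumed alignment from the comment
ENTRIES_PER_BLOCK = 40                    # 8 + 40 * 3 = 128 bytes
CHUNKS_PER_BLOCK = ENTRIES_PER_BLOCK + 1  # base offset plus 40 entries (assumption)


def pack_offsets_3byte(offsets):
    """Pack ascending chunk offsets into consecutive 128-byte blocks."""
    out = bytearray()
    for start in range(0, len(offsets), CHUNKS_PER_BLOCK):
        group = offsets[start:start + CHUNKS_PER_BLOCK]
        base = group[0]
        out += base.to_bytes(8, "little")
        for off in group[1:]:
            # to_bytes raises OverflowError if the distance needs more than 3 bytes.
            out += (off - base).to_bytes(3, "little")
        out += bytes(3 * (CHUNKS_PER_BLOCK - len(group)))  # pad the last block
    return bytes(out)


def chunk_offset_3byte(packed, chunk_index):
    """Return the absolute offset of a chunk with one add and no loop."""
    block, slot = divmod(chunk_index, CHUNKS_PER_BLOCK)
    start = block * BLOCK_SIZE
    base = int.from_bytes(packed[start:start + 8], "little")
    if slot == 0:
        return base
    pos = start + 8 + 3 * (slot - 1)
    return base + int.from_bytes(packed[pos:pos + 3], "little")
```

Each block covers fewer chunks than a layout with 2-byte entries would, in exchange for a lookup that is a single read and one addition.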


was (Author: aweisberg):
[~brstgt] That is basically what I was thinking but don't keep two separate arrays. Do it
in a single array so that when you cache miss and you pull in the entire section you are looking
for. Assuming 128 byte alignment you would get one 8 byte value and then 60 2-byte values.

It could also be 40 3-byte values that are not relative to each other but just the one absolute
offset. Then you don't have do a loop summing.

> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
>                 Key: CASSANDRA-13241
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Benjamin Roth
>
> A chunk size that is too low may waste some disk space. A chunk size that is too high
> may lead to massive overreads and may have a critical impact on overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and average reads
> of 200MB/s. After lowering the chunk size (aligned with read-ahead, of course), the average
> read IO dropped below 20MB/s, to roughly 10-15MB/s.
> The risk of (physical) overreads increases as the ratio (page cache size) / (total
> data size) gets lower.
> High chunk sizes are mostly appropriate for bigger payloads per request, but if the model
> consists mostly of small rows or small result sets, the read overhead with a 64kb chunk size
> is insanely high. This applies, for example, to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460GB data, 128GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic snitch
>   magic": https://cl.ly/3E0t1T1z2c0J
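
The overread argument in the quoted description can be made concrete with a rough calculation. This is only a sketch under simplifying assumptions that are not in the issue: a read misses the page cache, the requested row fits inside one chunk, and the compression ratio and read-ahead are ignored. The function name and the 200-byte row size are hypothetical.

```python
def read_amplification(chunk_kb, row_bytes):
    """Uncompressed bytes that must be materialized per byte actually requested."""
    return (chunk_kb * 1024) / row_bytes


ROW_BYTES = 200  # hypothetical small row
for chunk_kb in (64, 4):
    factor = read_amplification(chunk_kb, ROW_BYTES)
    print(f"{chunk_kb}kb chunk: {factor:.2f}x")
```

Under these assumptions the amplification scales linearly with the chunk size, so going from 64kb to 4kb reduces it by a factor of 16 for any row that fits in a 4kb chunk.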



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
