cassandra-commits mailing list archives

From "Wei Deng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12591) Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
Date Thu, 01 Sep 2016 18:42:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456278#comment-15456278
] 

Wei Deng commented on CASSANDRA-12591:
--------------------------------------

So I've done some quick initial tests using the latest trunk (i.e. C* 3.10) code, just to check
whether this is a worthwhile effort. The hardware I'm using is still not a typical or adequate
configuration for a production Cassandra deployment (GCE n1-standard-4, with 4 vCPUs,
15GB RAM and a single 1TB spindle-based persistent disk), but I'm already seeing a
positive sign that a bigger max_sstable_size can help compaction throughput.

Test setup: at each max_sstable_size, I did three runs from scratch. For all runs I set
compaction threads to 4. Since compaction-stress enforces no throttling, this is equivalent
to setting compaction_throughput_mb_per_sec to 0. The initial SSTable files generated by
`compaction-stress write` use the default 128MB size, which is in line with the typical flush
size I saw on this kind of hardware using default cassandra.yaml configuration parameters.
Each run used 10GB of stress data generated by the
blogpost data model [here|https://gist.githubusercontent.com/tjake/8995058fed11d9921e31/raw/a9334d1090017bf546d003e271747351a40692ea/blogpost.yaml].

Initial results:
the overall compaction times with 1280MB max_sstable_size are: 7m16.456s, 7m7.225s, 7m9.102s;
the overall compaction times with 160MB max_sstable_size are: 9m16.715s, 9m28.146s, 9m7.192s.

Given these numbers, the average time to finish compaction with 1280MB max_sstable_size
is about 430.93 seconds, and the average time to finish compaction with 160MB max_sstable_size
is about 557.35 seconds, which is already a 23% reduction in compaction time.
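
As a sanity check on the arithmetic, here is a minimal Python sketch that recomputes the averages from the six timings listed above. The variable and function names are only illustrative; the timings are the ones reported in this comment. It should come out at roughly 431s versus 557s, i.e. about a 23% reduction.

```python
# Wall-clock compaction times copied from the comment above, as (minutes, seconds).
runs_1280mb = [(7, 16.456), (7, 7.225), (7, 9.102)]
runs_160mb = [(9, 16.715), (9, 28.146), (9, 7.192)]


def mean_seconds(runs):
    """Convert each (minutes, seconds) pair to seconds and average them."""
    totals = [minutes * 60 + seconds for minutes, seconds in runs]
    return sum(totals) / len(totals)


avg_1280 = mean_seconds(runs_1280mb)
avg_160 = mean_seconds(runs_160mb)
# Relative reduction in compaction time when moving from 160MB to 1280MB.
reduction = (avg_160 - avg_1280) / avg_160

print(f"1280MB average: {avg_1280:.2f}s")
print(f"160MB average:  {avg_160:.2f}s")
print(f"reduction:      {reduction:.1%}")
```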

I realize 10GB of data is barely enough to test a 1280MB sstable size, as the data will only go
from L0->L1, so for the next run I'm going to use a 100GB data size on this hardware (keeping
everything else the same) and see how the numbers compare.
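
To illustrate why 10GB stays within L0->L1 at 1280MB while 100GB does not, here is a rough Python sketch. It assumes the usual LCS sizing model in which level N (N >= 1) holds at most 10^N SSTables of max_sstable_size each, assumes 1GB = 1024MB, and assumes lower levels fill completely before data moves up. The function name `lcs_levels_needed` is hypothetical, and real compaction behavior (L0 backlog, overlapping SSTables, compression) will differ.

```python
import math


def lcs_levels_needed(data_mb, sstable_mb, fanout=10):
    """Return (sstable_count, deepest_level) for data_mb of fully leveled data.

    Simplified model (an assumption, not Cassandra's exact logic): level N
    (N >= 1) holds at most fanout**N SSTables of sstable_mb each, and lower
    levels are filled completely before the next level is used.
    """
    sstable_count = math.ceil(data_mb / sstable_mb)
    level, capacity, remaining = 1, fanout, sstable_count
    while remaining > capacity:
        remaining -= capacity
        level += 1
        capacity *= fanout
    return sstable_count, level


for data_gb in (10, 100):
    for sstable_mb in (160, 1280):
        count, level = lcs_levels_needed(data_gb * 1024, sstable_mb)
        print(f"{data_gb}GB at {sstable_mb}MB: ~{count} SSTables, deepest level L{level}")
```

Under these assumptions, 10GB at 1280MB needs about 8 SSTables and fits entirely in L1, whereas 100GB needs about 80 and spills into L2, so the larger data set exercises at least one more level of compaction.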

> Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12591
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Wei Deng
>              Labels: lcs
>
> There has been some effort from CASSANDRA-5727 in benchmarking and evaluating the best
max_sstable_size used by LeveledCompactionStrategy, and the conclusion derived from that effort
was to use 160MB as the optimal size for both throughput (i.e. the time spent on compaction,
the smaller the better) and the amount of bytes compacted (to avoid write amplification, the
less the better).
> However, when I read more into that test report (the short [comment|https://issues.apache.org/jira/browse/CASSANDRA-5727?focusedCommentId=13722571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13722571]
describing the tests), I realized it was conducted on hardware with the following configuration:
"a single rackspace node with 2GB of ram." I'm not sure if this was an acceptable hardware configuration
for a production Cassandra deployment at that time (mid-2013), but it is definitely far below
today's hardware standards.
> Given that we now have compaction-stress, which is able to generate SSTables based on a
user-defined stress profile with a user-defined table schema and compaction parameters (compatible
with cassandra-stress), it would be a useful effort to revisit this number using a more realistic
hardware configuration and see if 160MB is still the optimal choice. It might also impact
our perceived "practical" node density with LCS nodes if it turns out that a bigger max_sstable_size
actually works better, as it will allow fewer SSTables (and hence fewer levels and less
write amplification) per node at higher density.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
