cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
Date Tue, 29 Aug 2017 02:38:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144667#comment-16144667
] 

Jeff Jirsa commented on CASSANDRA-13780:
----------------------------------------

{quote}
2. compaction on the newly added node - compaction has fallen behind, with anywhere from 4,000
to 10,000 SSTables at any given time. It took 3 weeks for compaction to finish on each newly
added node. Increasing number of compaction threads to match number of CPU (40) and increasing
compaction throughput to 32MB/s seemed to be the sweet spot.
{quote}

This is a known limitation of vnodes, and has been since they were introduced in 1.2. It's
less awful if you use LCS, but that obviously tends to require quite a bit more IO. I'm
not going to link it here, but if you google "real world dtcs for operators", I cover this
for time-series use cases in my 2015 Cassandra Summit talk (which is where I introduced TWCS,
to solve problems I encountered doing exactly what you're trying to do now).
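
(For reference, the compaction settings mentioned above map to cassandra.yaml / nodetool roughly
like this; the values are just the reporter's numbers plugged in, not a recommendation:)

{code}
# cassandra.yaml (3.0-era option names)
concurrent_compactors: 40                # match the number of CPUs, per the tuning above
compaction_throughput_mb_per_sec: 32     # the "sweet spot" mentioned above

# compaction throughput can also be adjusted at runtime on a live node:
nodetool setcompactionthroughput 32
{code}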

{quote}
3. TWCS buckets on new node, data streamed to this node over 4 1/2 days. Compaction correctly
placed the data in daily files, but the problem is the file dates reflect when compaction
created the file and not the date of the last record written in the TWCS bucket, which will
cause the files to remain around much longer than necessary.
{quote}

TWCS ignores the files' creation dates; it uses the sstable's max timestamp from the metadata,
and that metadata should be respected after streaming.
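
(If you want to double-check that on a streamed-in sstable, sstablemetadata will print the
timestamps TWCS actually buckets on; the path and filename below are just placeholders, and the
exact output format varies a bit by version:)

{code}
sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/mc-*-big-Data.db | grep -i timestamp
# look for "Minimum timestamp" / "Maximum timestamp" (microseconds since epoch)
{code}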

{quote}
1. What can be done to substantially improve the performance of adding a new node?
{quote}

Jason has been working on fixing streaming. Right now it's CPU bound: the receiving node rebuilds
sstable components as it streams, which re-compresses everything and generates a lot of garbage.
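
(For completeness, these are the streaming throttles you've presumably already been adjusting;
because the bottleneck is CPU on the receiving side, raising or even disabling them won't buy much:)

{code}
nodetool getstreamthroughput       # current cap, in megabits per second
nodetool setstreamthroughput 0     # 0 disables the throttle entirely
# cassandra.yaml equivalents: stream_throughput_outbound_megabits_per_sec
#                             inter_dc_stream_throughput_outbound_megabits_per_sec
{code}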

{quote}
2. Can compaction on TWCS partitions for newly added nodes change the file create date to
match the highest date record in the file or add another piece of meta-data to the TWCS files
that reflect the file drop date so that TWCS partitions can be dropped consistently?
{quote}

Pretty sure this is a non-issue because, as I mentioned, TWCS doesn't care about the sstable's
file creation date; it uses the max timestamp from the metadata.

{quote}
Requirement is to double cluster size, capacity, and ingestion volume within a few weeks.
{quote}

Usually, people who need to grow rapidly tend not to use vnodes, so they can bootstrap more
than one node at a time. It's too late to give you that advice, though, so here's something
slightly different:

If you can spare the hardware (easier in a cloud environment than on physical hardware), you can
add a whole new "datacenter" with RF=0, ALTER KEYSPACE to set RF=3 (or whatever you use), then
{{nodetool rebuild}} it in place to stream all the data to it at once, and finally tear down the
old one. That requires a bit of extra hardware, but lets you greatly increase cluster size very
quickly.
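
A rough sketch of that sequence (keyspace and datacenter names below are placeholders, and you'd
bring the new nodes up with {{auto_bootstrap: false}} so they join empty):

{code}
-- the new DC isn't in the replication map yet, so it's effectively RF=0 while its nodes join;
-- once they're all up, give it a real replication factor:
ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC_old': 3, 'DC_new': 3};

-- then on each node in DC_new, stream everything from the existing DC:
--   nodetool rebuild -- DC_old

-- when the rebuilds finish and clients have moved over, drop DC_old from the
-- replication map and decommission its nodes
{code}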




> ADD Node streaming throughput performance
> -----------------------------------------
>
>                 Key: CASSANDRA-13780
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13780
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux 2.6.32-696.3.2.el6.x86_64 #1 SMP Mon Jun 19 11:55:55 PDT 2017
x86_64 x86_64 x86_64 GNU/Linux
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                40
> On-line CPU(s) list:   0-39
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Stepping:              1
> CPU MHz:               2199.869
> BogoMIPS:              4399.36
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>              total       used       free     shared    buffers     cached
> Mem:          252G       217G        34G       708K       308M       149G
> -/+ buffers/cache:        67G       185G
> Swap:          16G         0B        16G
>            Reporter: Kevin Rivait
>             Fix For: 3.0.9
>
>
> Problem: Adding a new node to a large cluster runs at least 1000x slower than what the network and node hardware can support, taking several days per new node.  Adjusting stream throughput and other YAML parameters seems to have no effect on performance.  Essentially, Cassandra appears to have a scalability problem when adding nodes to a cluster with moderate to high data ingestion, because it cannot add node capacity fast enough to keep up with growing ingestion volumes.
> Initial Configuration: 
> Running 3.0.9 and have implemented TWCS on one of our largest tables.
> Largest table is partitioned on (ID, YYYYMM), using 1-day buckets with a TTL of 60 days.
> Next release will change partitioning to (ID, YYYYMMDD) so that partitions are aligned with daily TWCS buckets.
> Each node is currently creating roughly a 30GB SSTable per day.
> TWCS is working as expected; daily SSTables are dropping off after 70 days (60-day TTL + 10-day grace).
> Current deployment is a 28-node, 2-datacenter cluster: 14 nodes in each DC, replication factor 3.
> Data directories are backed by 4 x 2TB SSDs on each node, plus one 800GB SSD for commit logs.
> Requirement is to double cluster size, capacity, and ingestion volume within a few weeks.
> Observed Behavior:
> 1. Streaming throughput during add node: we observed a maximum of 6 Mb/s streaming from each of the 14 nodes on a 20Gb/s switched network, taking at least 106 hours for each node to join the cluster, and each node is only about 2.2 TB in size.
> 2. compaction on the newly added node - compaction has fallen behind, with anywhere from
4,000 to 10,000 SSTables at any given time.  It took 3 weeks for compaction to finish on each
newly added node.   Increasing number of compaction threads to match number of CPU (40)  and
increasing compaction throughput to 32MB/s seemed to be the sweet spot. 
> 3. TWCS buckets on new node, data streamed to this node over 4 1/2 days.  Compaction
correctly placed the data in daily files, but the problem is the file dates reflect when compaction
created the file and not the date of the last record written in the TWCS bucket, which will
cause the files to remain around much longer than necessary.  
> Two Questions:
> 1. What can be done to substantially improve the performance of adding a new node?
> 2. Can compaction on TWCS partitions for newly added nodes change the file create date
to match the highest date record in the file -or- add another piece of meta-data to the TWCS
files that reflect the file drop date so that TWCS partitions can be dropped consistently?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
