cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-12200) Backlogged compactions can make repair on trivially small tables waiting for a long time to finish
Date Fri, 15 Jul 2016 14:27:20 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-12200:
------------------------------------------
    Issue Type: Improvement  (was: Bug)

> Backlogged compactions can make repair on trivially small tables waiting for a long time to finish
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12200
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Wei Deng
>
> In C* 3.0 we started to use incremental repair by default. However, this seems to create a repair performance problem if you have a relatively write-heavy workload that keeps all concurrent_compactors threads busy with active compactions.
> I was able to demonstrate this issue with the following scenario:
> 1. On a three-node C* 3.0.7 cluster, use "cassandra-stress write n=100000000" to generate 100GB of data in the keyspace1.standard1 table using LCS (press Ctrl+C to stop the stress client once the data size on each node reaches 35+GB).
> 2. At this point, each node will have hundreds of L0 SSTables waiting for LCS to digest, and with concurrent_compactors left at its default of 2, both compaction threads are constantly busy processing the backlogged L0 SSTables.
> 3. Now create a new keyspace called "trivial_ks" with RF=3, create a small two-column CQL table in it, and insert 6 records.
> 4. Start a "nodetool repair trivial_ks" session on one of the nodes and observe the following behavior:
> {noformat}
> automaton@wdengdse50google-98425b985-3:~$ nodetool repair trivial_ks
> [2016-07-13 01:57:28,364] Starting repair command #1, repairing keyspace trivial_ks with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-07-13 01:57:31,027] Repair session 27212dd0-489d-11e6-a6d6-cd06faa0aaa2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished (progress: 66%)
> [2016-07-13 02:07:47,637] Repair completed successfully
> [2016-07-13 02:07:47,657] Repair command #1 finished in 10 minutes 19 seconds
> {noformat}
> For such a small table, the repair took more than 10 minutes to finish. Searching debug.log for this repair session's UUID, you will find that all nodes finished validation compaction within 15ms, but one of the nodes got stuck waiting for a compaction slot: it has to perform an anti-compaction step before it can tell the initiating node that it has finished its part of the repair session, and it took 10+ minutes for a compaction slot to free up, as shown in the following debug.log entries:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  RepairMessageVerbHandler.java:149 - Got anticompaction request AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2} org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> <snip>
> <...>
> DEBUG [CompactionExecutor:5] 2016-07-13 02:07:47,506  CompactionTask.java:217 - Compacted (286609e0-489d-11e6-9e03-1fd69c5ec46c) 32 sstables to [/var/lib/cassandra/data/keyspace1/standard1-9c02e9c1487c11e6b9161dbd340a212f/mb-499-big,] to level=0.  2,892,058,050 bytes to 2,874,333,820 (~99% of original) in 616,880ms = 4.443617MB/s.  0 total partitions merged to 12,233,340.  Partition merge counts were {1:12086760, 2:146580, }
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest on 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')] sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  CompactionManager.java:540 - SSTable BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
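The timestamps and figures in the log lines above can be cross-checked with a little arithmetic. The short Python script below is only an illustrative sketch (it is not part of Cassandra or its tooling); it copies the values verbatim from the logs and shows that the L0 compaction holding the slot ran for about 10 min 17 s, started roughly 0.33 s before the anticompaction request arrived, and that the anticompaction waited almost exactly that long. The reported 4.443617MB/s matches output bytes divided by elapsed seconds, in units of 1024*1024 bytes.

```python
from datetime import datetime, timedelta

# Log timestamps look like "2016-07-13 02:07:47,506"; %f accepts the 3 millisecond digits.
LOG_FMT = "%Y-%m-%d %H:%M:%S,%f"


def ts(text: str) -> datetime:
    return datetime.strptime(text, LOG_FMT)


# Values copied from the nodetool output and debug.log entries above.
repair_start = ts("2016-07-13 01:57:28,364")
repair_end = ts("2016-07-13 02:07:47,657")
anticompaction_requested = ts("2016-07-13 01:57:30,956")
compaction_finished = ts("2016-07-13 02:07:47,506")
anticompaction_started = ts("2016-07-13 02:07:47,512")
compaction_ms = 616_880
bytes_in, bytes_out = 2_892_058_050, 2_874_333_820

# Work backwards from the compaction's finish time to when it grabbed its slot.
compaction_started = compaction_finished - timedelta(milliseconds=compaction_ms)
# Time the anticompaction spent queued for a compaction thread.
slot_wait = anticompaction_started - anticompaction_requested
# Throughput as output size in MiB over elapsed seconds.
mib_per_s = bytes_out / (1024 * 1024) / (compaction_ms / 1000)

print("repair wall time:", repair_end - repair_start)   # 0:10:19.293000
print("compaction began:", compaction_started.time())   # 01:57:30.626000
print("anticompaction waited:", slot_wait)              # 0:10:16.556000
print(f"output/input: {bytes_out / bytes_in:.1%}")      # 99.4%
print(f"throughput: {mib_per_s:.6f} MiB/s")             # 4.443617
```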
> Since validation compaction runs on its own threads, outside the regular compaction thread pool limited by concurrent_compactors, validation completed without any delay. If anti-compaction were treated the same way (i.e., given its own thread pool), this kind of repair performance problem could be avoided.
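The head-of-line blocking described above, and the effect of the proposed separate pool, can be illustrated with a toy simulation. This is a minimal Python sketch, not Cassandra code: a two-worker ThreadPoolExecutor stands in for the compaction executor bounded by concurrent_compactors, 0.5 s sleeps are scaled-down stand-ins for the backlogged L0 compactions, and the names slot_wait, backlogged_compaction and anticompaction are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_COMPACTORS = 2  # mirrors the default of 2 in the scenario above
BACKLOG_SECONDS = 0.5      # hypothetical, scaled-down stand-in for the ~10-minute L0 compactions


def backlogged_compaction() -> None:
    # Occupies one compaction thread for the whole duration.
    time.sleep(BACKLOG_SECONDS)


def anticompaction(submitted_at: float) -> float:
    # Returns how long the task sat in the queue before a thread picked it up.
    return time.monotonic() - submitted_at


def slot_wait(dedicated_pool: bool) -> float:
    compaction_pool = ThreadPoolExecutor(max_workers=CONCURRENT_COMPACTORS)
    # Either share the compaction pool (current behavior in the report) or use a separate one.
    anticompaction_pool = ThreadPoolExecutor(max_workers=1) if dedicated_pool else compaction_pool
    try:
        # Fill every compaction slot first, as the stress-driven backlog does.
        for _ in range(CONCURRENT_COMPACTORS):
            compaction_pool.submit(backlogged_compaction)
        return anticompaction_pool.submit(anticompaction, time.monotonic()).result()
    finally:
        compaction_pool.shutdown(wait=True)
        if dedicated_pool:
            anticompaction_pool.shutdown(wait=True)


if __name__ == "__main__":
    print(f"shared pool wait:    {slot_wait(dedicated_pool=False):.2f}s")
    print(f"dedicated pool wait: {slot_wait(dedicated_pool=True):.2f}s")
```

With the shared pool the anticompaction waits for a backlogged compaction to finish; with its own pool it starts almost immediately. The trade-off is that anticompaction then adds I/O on top of the compactions already running, the same trade-off validation compaction makes by having its own threads.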



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
