cassandra-commits mailing list archives

From "Chen Shen (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10862) LCS repair: compact tables before making available in L0
Date Thu, 09 Jun 2016 00:30:21 GMT


Chen Shen commented on CASSANDRA-10862:

I've done some investigation, and I find it might not be so easy to schedule a compaction on
L0 sstables on reception. The only straightforward way to trigger a compaction is to submit a
task via CompactionManager.submitBackground, and 1) as far as I know, it is not guaranteed to
be executed, and 2) submitBackground needs a `ColumnFamilyStore` as input, so we would need to
either create a new CFS or split the compaction strategy out of CompactionManager, each of
which might need a lot of work.
So instead I am taking a different, somewhat tricky approach: don't add the streamed sstables
to the CFS until the number of L0 sstables is smaller than a threshold, and subscribe to
`SSTableListChangedNotification` so that the `OnCompletionRunnable` can sleep and wait for a
notification.
Is this the right direction? I have a commit here
if you want to take a look. I'm also planning to apply this patch to our production tier to
see if it helps.
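
A minimal Java sketch of the waiting step described above, assuming the `OnCompletionRunnable`
blocks on a gate until the L0 count drops below the threshold. `L0AdmissionGate`,
`onSSTableListChanged`, `awaitBelowThreshold` and the `IntSupplier` L0 counter are hypothetical
names, not Cassandra APIs; in the patch, the wake-up would presumably come from a listener for
`SSTableListChangedNotification`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntSupplier;

// Hypothetical gate, not a Cassandra class: a stream-completion task waits here
// until the L0 sstable count drops below a threshold.
final class L0AdmissionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition listChanged = lock.newCondition();
    private final IntSupplier l0Count; // assumed source of the current L0 sstable count
    private final int threshold;

    L0AdmissionGate(IntSupplier l0Count, int threshold) {
        this.l0Count = l0Count;
        this.threshold = threshold;
    }

    // Call from whatever observes sstable-list changes (in Cassandra, a
    // subscriber to SSTableListChangedNotification) to wake every waiter.
    void onSSTableListChanged() {
        lock.lock();
        try {
            listChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until L0 count < threshold; returns false if the timeout elapses first.
    boolean awaitBelowThreshold(long timeout, TimeUnit unit) throws InterruptedException {
        long remaining = unit.toNanos(timeout);
        lock.lock();
        try {
            // Re-check in a loop: wake-ups can be spurious, and another
            // stream may have added sstables since the signal.
            while (l0Count.getAsInt() >= threshold) {
                if (remaining <= 0) {
                    return false;
                }
                remaining = listChanged.awaitNanos(remaining);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The timeout is a design choice worth considering: without it, a stream session could be held
indefinitely if compaction falls behind on a busy node.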

> LCS repair: compact tables before making available in L0
> --------------------------------------------------------
>                 Key: CASSANDRA-10862
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Streaming and Messaging
>            Reporter: Jeff Ferland
>            Assignee: Chen Shen
> When doing repair on a system with lots of mismatched ranges, the number of tables in
> L0 goes up dramatically, as correspondingly goes the number of tables referenced for a query.
> Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into L1 (which
> may be a very large copy), finally reducing the number of SSTables per query into the manageable
> range.
> It seems to me that the cleanest answer is to compact after streaming, then mark tables
> available rather than marking available when the file itself is complete.
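
The reporter's proposal (compact after streaming, then mark tables available) can be sketched
in Java as a receiver that stages each streamed file privately and, when the session completes,
merges the staged files and publishes a single result to L0. Everything here is hypothetical:
`StagedStreamReceiver`, the list-of-keys model of an sstable, and the `liveL0` list standing in
for what reads see. Real compaction also reconciles cells by timestamp and handles tombstones;
this sketch only merges sorted keys and drops duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical receiver: streamed files are staged (not readable) until the
// session completes, then merged and published to L0 as one table.
final class StagedStreamReceiver {
    private final List<List<String>> staged = new ArrayList<>(); // received, not yet visible
    private final List<List<String>> liveL0;                     // stands in for what reads see

    StagedStreamReceiver(List<List<String>> liveL0) {
        this.liveL0 = liveL0;
    }

    // Called per received file (a sorted run of keys); the file is only staged.
    synchronized void onFileReceived(List<String> sortedKeys) {
        staged.add(sortedKeys);
    }

    // Called once the whole stream session is done: merge, then publish once.
    synchronized List<String> onSessionComplete() {
        List<String> merged = mergeSortedRuns(staged);
        staged.clear();
        if (!merged.isEmpty()) {
            liveL0.add(merged);
        }
        return merged;
    }

    // K-way merge of sorted runs using a min-heap of {runIndex, position}.
    static List<String> mergeSortedRuns(List<List<String>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            String key = runs.get(top[0]).get(top[1]);
            // Keep one copy of each key (real compaction would reconcile versions).
            if (out.isEmpty() || !out.get(out.size() - 1).equals(key)) {
                out.add(key);
            }
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

The trade-off is time to visibility: repaired data is not readable until the merge finishes, in
exchange for reads touching one new L0 table per session rather than one per streamed file.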

This message was sent by Atlassian JIRA
