cassandra-commits mailing list archives

From "Wei Deng (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-12526) For LCS, single SSTable up-level is handled inefficiently
Date Tue, 23 Aug 2016 22:04:20 GMT
Wei Deng created CASSANDRA-12526:

             Summary: For LCS, single SSTable up-level is handled inefficiently
                 Key: CASSANDRA-12526
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction
            Reporter: Wei Deng

I'm using the latest trunk (as of August 2016, which probably is going to be 3.10) to run
some experiments on LeveledCompactionStrategy and noticed this inefficiency.

The test data is generated using cassandra-stress default parameters (keyspace1.standard1),
so as you can imagine, it consists of a ton of newly inserted partitions that will never merge
in compactions, which is probably the worst kind of workload for LCS (however, I'll detail
later why this scenario should not be ignored as a corner case; for now, let's just assume
we still want to handle this scenario efficiently).

After the compaction test is done, I searched debug.log for lines that match the "Compacted"
summary so that I can see how long each individual compaction took and how many bytes it
processed. The search pattern is like the following:

grep 'Compacted.*standard1' debug.log

Interestingly, I noticed that many of the finished compactions are marked as having *only one*
SSTable involved. With the workload mentioned above, these "single SSTable" compactions actually
make up the majority of all compactions (as shown below), so their efficiency can affect
the overall compaction throughput quite a bit.

automaton@0ce59d338-1:~/cassandra-trunk/logs$ grep 'Compacted.*standard1' debug.log-test1 | wc -l
automaton@0ce59d338-1:~/cassandra-trunk/logs$ grep 'Compacted.*standard1' debug.log-test1 | grep ") 1 sstable" | wc -l
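The two grep pipelines above can also be expressed as a small Java sketch, which makes it easy to compute the single-SSTable share in one pass. The class and method names here are hypothetical, and the code only assumes what the greps assume: a line counts if it matches 'Compacted.*standard1', and it is a single-SSTable compaction if it also contains ") 1 sstable".

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cassandra) that mirrors the two grep pipelines.
final class CompactionLogCounter {
    // Same regular expression as: grep 'Compacted.*standard1'
    private static final Pattern COMPACTED = Pattern.compile("Compacted.*standard1");
    // Same literal text as: grep ") 1 sstable"
    private static final String SINGLE_SSTABLE = ") 1 sstable";

    // Number of "Compacted" summary lines for standard1.
    static long countCompacted(List<String> lines) {
        return lines.stream()
                    .filter(line -> COMPACTED.matcher(line).find())
                    .count();
    }

    // Number of those summary lines that report exactly one SSTable.
    static long countSingleSSTable(List<String> lines) {
        return lines.stream()
                    .filter(line -> COMPACTED.matcher(line).find())
                    .filter(line -> line.contains(SINGLE_SSTABLE))
                    .count();
    }

    // Fraction of compactions that involved a single SSTable (0.0 if none matched).
    static double singleSSTableFraction(List<String> lines) {
        long total = countCompacted(lines);
        return total == 0 ? 0.0 : (double) countSingleSSTable(lines) / total;
    }
}
```

The input lines could be loaded with java.nio.file.Files.readAllLines(Path.of("debug.log-test1")) before calling singleSSTableFraction.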

By looking at the code, it appears that there's a way to directly edit the level of a particular
SSTable like the following:

sstable.descriptor.getMetadataSerializer().mutateLevel(sstable.descriptor, targetLevel);

Compared to what we have now (reading the whole single SSTable from the old level and writing
out the same single SSTable at the new level), the only difference I can think of with
this approach is that the new SSTable will have the same file name (sequence number) as the
old one, which could break assumptions in other parts of the code. However, it avoids the
full read/write IO, as well as the overhead of cleaning up the old file, creating the new
file, and causing more churn in the heap and file buffers, so it seems
the benefits outweigh the inconvenience. So I'd argue this JIRA belongs to LHF (low-hanging
fruit) and should be made available in 3.0.x as well.
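The proposed shortcut can be sketched as a simple branch: if an up-level compaction involves exactly one SSTable, update its level metadata in place; otherwise fall back to the usual read-and-rewrite path. This is only an illustration under stated assumptions. LevelMutator, Rewriter, and UpLevelSketch are hypothetical stand-ins rather than Cassandra classes, SSTables are represented by plain names, and the sketch does not address the file name (sequence number) concern raised above.

```java
import java.util.List;

// Hypothetical stand-in for the metadata-only path (mutateLevel in the snippet above).
interface LevelMutator {
    void mutateLevel(String sstableName, int targetLevel);
}

// Hypothetical stand-in for the existing full read/write compaction path.
interface Rewriter {
    void rewrite(List<String> sstableNames, int targetLevel);
}

final class UpLevelSketch {
    // Returns true if the metadata-only path was taken, false if a full rewrite ran.
    static boolean upLevel(List<String> inputs, int targetLevel,
                           LevelMutator mutator, Rewriter rewriter) {
        if (inputs.size() == 1) {
            // Single SSTable: only its level changes, so no data needs to be copied.
            mutator.mutateLevel(inputs.get(0), targetLevel);
            return true;
        }
        // Several SSTables: contents may need merging, so keep the existing path.
        rewriter.rewrite(inputs, targetLevel);
        return false;
    }
}
```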

This message was sent by Atlassian JIRA
