cassandra-user mailing list archives

From Johnny Miller <johnny.p.mil...@gmail.com>
Subject Re: Questions on LCS behaviour after big BulkLoad cluster bootstrap
Date Tue, 12 Jul 2016 20:37:13 GMT
Garo,

When you're loading data in using LCS in a bulk fashion like this, there are a few things
you should do.

You can disable STCS in L0 (https://issues.apache.org/jira/browse/CASSANDRA-6621) with the
JVM flag "-Dcassandra.disable_stcs_in_l0=true", which should stop you getting huge sstables
in L0 while LCS is catching up. Once the load is complete you can then shut down the node
and perform an sstableofflinerelevel
(https://docs.datastax.com/en/cassandra/2.2/cassandra/tools/toolsSSTableOfflineRelevel.html).
This should help LCS catch up and reduce the pending compactions; however, it may still
take a while.
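For example, something along these lines (the service commands and the keyspace/table
names are placeholders for your setup):

    # Before the bulk load, add the flag to conf/cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

    # After the load finishes, drain and stop the node, then relevel offline:
    nodetool drain
    sudo service cassandra stop
    sstableofflinerelevel --dry-run my_keyspace my_table   # preview the new leveling first
    sstableofflinerelevel my_keyspace my_table
    sudo service cassandra start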

Hope this helps.

Johnny

> On 6 Jul 2016, at 07:56, Juho Mäkinen <juho.makinen@gmail.com> wrote:
> 
> Hello. I'm in the process of migrating my old 60 node cluster to a new 72 node cluster
> running 2.2.6. I fired BulkLoader on the old cluster to stream all data from every node in
> the old cluster to my new cluster, and I'm now watching as my new cluster is doing
> compactions. What I'd like is to understand the LeveledCompactionStrategy behaviour in
> more detail.
> 
> I'm taking one node as an example, but all the other nodes are in much the same situation.
> 
> There are 53 live SSTables in a big table. This can be seen both by looking at the
> la-*Data.db files and with nodetool cfstats: "SSTables in each level: [31/4, 10, 12, 0, 0,
> 0, 0, 0, 0]"
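> 
> (That output came from something like the following; the keyspace and table names are
> placeholders:)
> 
>     nodetool cfstats my_keyspace.my_table | grep "SSTables in each level"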
> 
> If I look at the SSTable files on disk I see some huge SSTables, like a 37 GiB, 57 GiB
> and 74 GiB one, which are all on Level 0 (I used sstablemetadata to see this). The total
> size of all live sstables is about 920 GiB.
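> 
> (For reference, I checked the levels with something like this; the data directory below
> is just an example:)
> 
>     sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/la-*-big-Data.db | grep -E "SSTable:|SSTable Level"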
> 
> Then there are tmp-la-*Data.db and tmplink-la-*Data.db files (the tmplink files are
> hardlinks to the tmp files due to CASSANDRA-6916). I guess that these come from the single
> active compaction. The total size of these files is around 65 GiB.
> 
> On the compaction side, compactionstats shows that there's just one compaction running,
> which is heavily CPU bound (I've reformatted the output here):
> pending tasks: 5390
> bytes done: 673623792733 (673 GiB)
> bytes left: 3325656896682 (3325 GiB)
> Active compaction remaining time :   2h44m39s
> 
> Why are the bytes done and especially the bytes left so big? I don't have that much data
> on my node.
> 
> Also, how does Cassandra calculate the pending tasks with LCS?
> 
> Why are there a few such big SSTables in the active sstable list? Is it because LCS falls
> back to STCS if L0 is too full? Should I use the disable_stcs_in_l0 option? What will
> happen to these big sstables in the future?
> 
> I'm currently just waiting for the compactions to eventually finish, but I'm hoping to
> learn in more detail what the system does, and possibly to help with similar migrations
> in the future.
> 
> Thanks,
> 
>  - Garo

