cassandra-user mailing list archives

From Frederick Ryckbosch
Subject Concurrent major compaction
Date Thu, 10 May 2012 16:27:17 GMT

We have a single-node Cassandra cluster that holds volatile data: about 2 GB is written every
day, kept for 7 days, and then removed (using TTL). To keep the application from slowing down
during a large compaction, we run a major compaction every night (fewer users,
less performance impact).

The major compaction is CPU bound: it uses about 1 core and consumes only 4 MB/sec of disk IO.
We would like the compaction to scale with the resources available in the machine (cores,
disks). Enabling multithreaded_compaction didn't help much: CPU usage goes up to about 120%
of one core, but does not scale with the number of cores.

To make the compaction scale with the number of cores in our machine, we tried to run
a major compaction on multiple column families (in the same keyspace) at the same time using
`nodetool -h localhost compact testSpace data1 data2`. However, the 2 compactions are executed
serially instead of concurrently, even with concurrent_compactors set to 4 (the number of cores).
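
For illustration, here is a minimal sketch of issuing one nodetool invocation per column family
in parallel from the shell, instead of a single invocation that names both column families.
The helper name compact_in_parallel and the NODETOOL and HOST variables are assumptions made
for this sketch. Whether the server then actually runs the compactions concurrently depends on
how it schedules compaction tasks, and is not established here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): start one major compaction per
# column family as a separate nodetool process, then wait for all of them.
NODETOOL="${NODETOOL:-nodetool}"   # assumption: override for dry runs, e.g. NODETOOL=echo
HOST="${HOST:-localhost}"          # assumption: node address, as in the command above

compact_in_parallel() {
    keyspace="$1"
    shift
    for cf in "$@"; do
        # Each invocation runs in the background as its own client process.
        "$NODETOOL" -h "$HOST" compact "$keyspace" "$cf" &
    done
    # Block until every background nodetool process has exited.
    wait
}
```

With the names from this message, the call would be `compact_in_parallel testSpace data1 data2`.
Setting NODETOOL=echo first prints the commands instead of running them.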

Is this normal behavior (for both multithreaded compaction and concurrent compactions)? Is there
any way to make major compactions scale with the number of cores in the machine?

Thanks!
