cassandra-user mailing list archives

From Tomàs Núñez
Subject Alternate "major compaction"
Date Thu, 11 Jul 2013 09:46:02 GMT

About a year ago, we ran a major compaction on our Cassandra cluster (a
n00b mistake, I know). Since then we've had huge sstables that never get
compacted, and we've been condemned to repeat the major compaction process
every once in a while. We are using the SizeTiered compaction strategy and
have not yet evaluated LeveledCompaction, because it has its downsides
and we've had no time to test all of them in our environment.

I was trying to find a way out of this situation (that is, something like
a major compaction that writes small sstables rather than one huge sstable,
as major compaction does), and I couldn't find one in the documentation. I
tried cleanup and scrub/upgradesstables, but they don't do that (as the
documentation states). Then I tried deleting all data on a node and
bootstrapping it (or "nodetool rebuild"-ing it), hoping that the sstables
would be purged of deleted records and updates. But the wiped node just
copied the sstables from another node as they were, cleaning nothing.

So I tried a new approach: I switched the sstable compaction strategy
(SizeTiered to Leveled), forcing the sstables to be rewritten from scratch,
and then switched it back (Leveled to SizeTiered). It took a while (but so
does the major compaction process) and it worked: I now have smaller
sstables, and I've regained a lot of disk space.
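
For reference, the round trip described above could be written as two
schema changes. This is only a sketch: it assumes CQL3's
"ALTER TABLE ... WITH compaction" syntax, and the keyspace and table names
are hypothetical placeholders, not our real schema. The second statement
should only be run once the rewrite triggered by the first has finished.

```python
def compaction_switch_statements(keyspace, table):
    """Build the two ALTER TABLE statements for the strategy round trip.

    keyspace and table are placeholders supplied by the caller.
    """
    template = "ALTER TABLE {ks}.{tbl} WITH compaction = {{'class': '{cls}'}};"
    return [
        template.format(ks=keyspace, tbl=table, cls=cls)
        # First switch to Leveled, then (after the rewrite) back to SizeTiered.
        for cls in ("LeveledCompactionStrategy", "SizeTieredCompactionStrategy")
    ]


# Hypothetical names, for illustration only.
statements = compaction_switch_statements("my_keyspace", "my_table")
for stmt in statements:
    print(stmt)
```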

I'm happy with the results, but it doesn't seem an orthodox way of
"cleaning" the sstables. What do you think: is it wrong or crazy?
Is there a different way to achieve the same thing?

Let me give an example:
Suppose you have a write-only columnfamily (no updates and no deletes, so
no need for LeveledCompaction, because SizeTiered works perfectly and
requires less I/O) and you mistakenly run a major compaction on it. After a
few months you need more space and you delete half the data, and you find
out that you're not freeing half the disk space, because most of those
records were in the "major compacted" sstables. How can you free the disk
space? Waiting will do you no good, because the huge sstable won't get
compacted anytime soon. You can run another major compaction, but that
would just postpone the real problem. Then you can switch compaction
strategy and switch it back, as I just did. Is there any other way?
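
To illustrate why waiting doesn't help, here is a rough sketch of how I
understand size-tiered bucketing to work. It is a simplification, not
Cassandra's actual code: it assumes sstables are grouped when their size is
within 0.5x to 1.5x of a bucket's average, and that a bucket is compacted
only once it holds at least 4 sstables (what I believe the defaults to be).
It ignores the minimum sstable size setting, and the sizes are made up.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size (simplified)."""
    buckets = []  # each bucket is a list of sstable sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return the buckets that hold enough sstables to be compacted."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# Made-up sizes in MB: one huge sstable left behind by a major compaction,
# plus a few small, recently flushed ones.
sstables = [500_000, 60, 55, 50, 50]
candidates = compaction_candidates(sstables)
print(candidates)
```

In this sketch only the four small sstables form a candidate bucket. The
huge one sits alone in its own bucket and would need three more sstables of
comparable size before it is picked up again.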

Tomàs Núñez
IT-Sysprod, Groupalia
Tel. +34 93 159 31 00 · Fax +34 93 396 18 52
Llull, 95-97, 2º planta, 08005
