incubator-cassandra-user mailing list archives

From Michael Theroux <mthero...@yahoo.com>
Subject Re: Alternate "major compaction"
Date Thu, 11 Jul 2013 13:41:47 GMT
Information is only deleted from Cassandra during a compaction. With SizeTieredCompaction, a compaction only occurs when a number of similarly sized sstables can be combined into a new sstable.

When you perform a major compaction, all sstables are combined into one very large sstable. As a result, any tombstoned data in that large sstable will only be removed once a number of similarly large sstables exist. This means tombstoned data may be trapped in that sstable for a very long time (or indefinitely, depending on your use case).
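
For illustration, one quick way to see the problem on disk is to list the sstable data files by size; after a major compaction you will typically find a single file dwarfing all the others. (The path below follows the example later in this thread; the data directory layout differs by version and configuration.)

# largest sstables first
ls -lhS /cassandra_data/UserData/*-Data.db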

-Mike

On Jul 11, 2013, at 9:31 AM, Brian Tarbox wrote:

> Perhaps I should already know this, but why is running a major compaction considered so bad? We're running 1.1.6.
> 
> Thanks.
> 
> 
> On Thu, Jul 11, 2013 at 7:51 AM, Takenori Sato <tsato@cloudian.com> wrote:
> Hi,
> 
> I think it is a common headache for users running a large Cassandra cluster in production.
> 
> 
> Running a major compaction is not the only cause; there are others. For example, I see two typical scenarios:
> 
> 1. backup use case
> 2. active wide row
> 
> In the case of 1, say a piece of data is removed a year after it was written. This means the tombstone is written one year after the original row. To remove an expired row entirely, a compaction set has to include all of the row's fragments. So when will the original, one-year-old row and its tombstone be included in the same compaction set? It is likely to take about a year.
> 
> In the case of 2, such an active wide row exists in most of the sstable files, and it typically contains many expired columns. But none of them get removed entirely, because in practice a compaction set does not include all of the row's fragments.
> 
> 
> Btw, there is a very convenient MBean API available: CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files.
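> 
> For example, it can be invoked from a generic JMX client such as jmxterm. This is only a sketch: the jar name is a placeholder, and the operation's exact arguments differ between Cassandra versions (the keyspace and file names below are taken from the example further down).
> 
> # open a JMX session to the node (7199 is the default JMX port)
> java -jar jmxterm.jar -l localhost:7199
> # then, inside the jmxterm session:
> bean org.apache.cassandra.db:type=CompactionManager
> run forceUserDefinedCompaction UserData Test5_BLOB-hc-3-Data.db,Test5_BLOB-hc-4-Data.db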
> 
> Then, I wrote a tool that checks for garbage and prints out some useful information to help find such an optimal set.
> 
> Here's a simple log output.
> 
> # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db
> [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)]
> ===================================================================================
> ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
> ===================================================================================
> hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db    
> -----------------------------------------------------------------------------------
> TOTAL, 40, 40
> ===================================================================================
> REMAINNING_SSTABLE_FILES lists any other sstable files that contain the respective row. So, the following is an optimal set.
> 
> # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db
> [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)]
> ===================================================================================
> ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
> ===================================================================================
> hello5/100.txt.1373502926003, 223, 0, YES, YES
> -----------------------------------------------------------------------------------
> TOTAL, 223, 0
> ===================================================================================
> This tool relies on SSTableReader and an aggregation iterator, just as Cassandra does in compaction. I was considering sharing it with the community, so let me know if anyone is interested.
> 
> Ah, note that it is based on 1.0.7. So I will need to check and update for newer versions.
> 
> Thanks,
> Takenori
> 
> 
> On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez <tomas.nunez@groupalia.com> wrote:
> Hi
> 
> About a year ago, we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, so we were condemned to repeat the major compaction every once in a while. (We are using the SizeTieredCompaction strategy; we haven't evaluated LeveledCompaction yet, because it has its own downsides and we've had no time to test it in our environment.)
> 
> I was trying to find a way out of this situation (that is, something like a major compaction that writes several small sstables instead of one huge one), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states). Then I tried deleting all the data on a node and then bootstrapping it (or "nodetool rebuild"-ing it), hoping that this way the sstables would get cleaned of deleted records and updates. But the wiped node just copied the sstables from another node as they were, cleaning nothing.
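> 
> For reference, these are roughly the commands described above (keyspace and column family names are placeholders; exact options vary by version). The first three rewrite each sstable individually, so they never merge row fragments across sstables, and rebuild just streams the data back from other replicas as-is:
> 
> nodetool -h localhost cleanup MyKeyspace MyColumnFamily
> nodetool -h localhost scrub MyKeyspace MyColumnFamily
> nodetool -h localhost upgradesstables MyKeyspace MyColumnFamily
> # after wiping the node's data directory:
> nodetool -h localhost rebuild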
> 
> So I tried a new approach: I switched the compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switched it back (Leveled to SizeTiered). It took a while (but so does a major compaction), and it worked: I now have smaller sstables, and I've regained a lot of disk space.
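> 
> In case it helps anyone, on 1.1 the switch can be done roughly like this from cassandra-cli (keyspace and column family names are placeholders; on later versions the equivalent is ALTER TABLE ... WITH compaction = {...} in CQL):
> 
> use MyKeyspace;
> update column family MyColumnFamily with compaction_strategy = 'LeveledCompactionStrategy';
> (wait for the sstables to be rewritten into levels, then switch back)
> update column family MyColumnFamily with compaction_strategy = 'SizeTieredCompactionStrategy';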
> 
> I'm happy with the results, but it doesn't seem an orthodox way of "cleaning" the sstables. What do you think, is it wrong or crazy? Is there a different way to achieve the same thing?
> 
> Let me give an example:
> Suppose you have a write-only column family (no updates and no deletes, so no need for LeveledCompaction, because SizeTiered works perfectly well and requires less I/O) and you mistakenly run a major compaction on it. A few months later you need more space and you delete half the data, and you find out that you're not freeing half the disk space, because most of those records were in the "major compacted" sstables. How can you free the disk space? Waiting will do you no good, because the huge sstable won't get compacted anytime soon. You can run another major compaction, but that would just postpone the real problem. Or you can switch the compaction strategy and switch it back, as I just did. Is there any other way?
> 
> -- 
> www.groupalia.com	
> Tomàs Núñez
> IT-Sysprod
> Tel. + 34 93 159 31 00 
> Fax. + 34 93 396 18 52
> Llull, 95-97, 2º planta, 08005 Barcelona
> Skype: tomas.nunez.groupalia
> tomas.nunez@groupalia.com
> 
> 

