incubator-cassandra-user mailing list archives

From Mike <mthero...@yahoo.com>
Subject Re: Column Family migration/tombstones
Date Sat, 05 Jan 2013 17:44:40 GMT
A couple more questions.

When these rows are deleted, tombstones will be created and stored in 
more recent sstables.  Once gc_grace_seconds has passed and the 
sstables have been compacted, I presume Cassandra will have removed 
all traces of those rows from disk.

However, after deleting such a large amount of information, there is no 
guarantee that Cassandra will compact the sstables holding the 
tombstones together with the sstables holding the original data, which 
is what would actually remove the data (right?).  Therefore, even after 
gc_grace_seconds, a large amount of space may still be used.
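
The concern above can be illustrated with a minimal Python sketch. It is a
simplified model of the purge rule as commonly described, not Cassandra's
actual code: a tombstone is dropped during compaction only if it is older
than gc_grace_seconds and the same row key does not also appear in an
sstable outside the set being compacted. The names SSTable and
purgeable_tombstones are hypothetical, and the exact key lookup stands in
for whatever check the real implementation performs.

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass
class SSTable:
    name: str
    # row key -> deletion time in seconds for a tombstone, None for live data
    rows: dict


def purgeable_tombstones(compacting, others, now, gc_grace_seconds):
    """Return the row keys whose tombstones this compaction could drop.

    Simplified model: a tombstone is purgeable only if it is older than
    gc_grace_seconds AND its row key is absent from every sstable that is
    not part of this compaction.
    """
    gc_before = now - gc_grace_seconds
    purgeable = set()
    for table in compacting:
        for key, deleted_at in table.rows.items():
            if deleted_at is None:
                continue  # live data, not a tombstone
            expired = deleted_at < gc_before
            present_elsewhere = any(key in other.rows for other in others)
            if expired and not present_elsewhere:
                purgeable.add(key)
    return purgeable


# Old sstable holds the original rows; a newer one holds their tombstones.
old = SSTable("old", {"a": None, "b": None})
new = SSTable("new", {"a": 100, "b": 100})
gc_grace = 10 * DAY
after_grace = 100 + 11 * DAY

# Compacting only the newer sstable: the rows still live in "old",
# so nothing can be purged even though gc_grace has elapsed.
alone = purgeable_tombstones([new], [old], after_grace, gc_grace)

# Compacting both sstables together: the tombstones can be dropped.
together = purgeable_tombstones([new, old], [], after_grace, gc_grace)
```

In this model, `alone` is empty and `together` contains both keys, which
is why the space is not reclaimed until the tombstones and the data they
shadow end up in the same compaction.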

Is there a way, other than a major compaction, to clean up all this old 
data?  I assume a nodetool scrub will clean up old tombstones only if 
that row is not in another sstable?

Do tombstones take up bloom filter space after gc_grace_seconds?

-Mike

On 1/2/2013 6:41 PM, aaron morton wrote:
>> 1) As one can imagine, the index and bloom filter for this column
>> family are large.  Am I correct to assume that bloom filter and index
>> space will not be reduced until after gc_grace_seconds?
> Yes.
>
>> 2) If I manually run repair across the cluster, is there a process I
>> can use to safely remove these tombstones before gc_grace_seconds to
>> free this memory sooner?
> There is nothing to specifically purge tombstones.
>
> You can temporarily reduce gc_grace_seconds and then trigger
> compaction, either by reducing min_compaction_threshold to 2 and doing
> a flush, or by kicking off a user-defined compaction using the JMX
> interface.
>
>> 3) Any words of warning when undergoing this?
> Make sure you have a good breakfast.
> (It's more general advice than Cassandra specific.)
>
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/12/2012, at 8:51 AM, Mike <mtheroux2@yahoo.com> wrote:
>
>> Hello,
>>
>> We are undergoing a change to our internal data model that will result
>> in the eventual deletion of over a hundred million rows from a
>> Cassandra column family.  From what I understand, this will result in
>> the generation of tombstones, which will be cleaned up during
>> compaction once gc_grace_seconds (default: 10 days) has passed.
>>
>> A couple of questions:
>>
>> 1) As one can imagine, the index and bloom filter for this column
>> family are large.  Am I correct to assume that bloom filter and index
>> space will not be reduced until after gc_grace_seconds?
>>
>> 2) If I manually run repair across the cluster, is there a process I
>> can use to safely remove these tombstones before gc_grace_seconds to
>> free this memory sooner?
>>
>> 3) Any words of warning when undergoing this?
>>
>> We are running Cassandra 1.1.2 on a 6-node cluster with a replication
>> factor of 3.  We use LOCAL_QUORUM consistency for all operations.
>>
>> Thanks!
>> -Mike

