incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: old data / tombstones are not deleted after ttl
Date Wed, 06 Mar 2013 07:16:46 GMT
If you have a data model with long-lived, frequently updated rows, you can get around the
"all fragments" problem by running a user-defined compaction.

Look for the CompactionManagerMBean on the JMX API: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java#L67
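
As a rough sketch of how one might choose files for a user-defined compaction: with a uniform TTL, an SSTable written more than TTL + gc_grace ago can only contain expired data. The helper name, the year assumed for the "Dec 15" file (2012), and the comma-joined output are illustrative assumptions, not part of Cassandra; check the MBean interface linked above for the exact operation and arguments in your version.

```python
from datetime import datetime, timedelta


def fully_expired_sstables(sstables, ttl, gc_grace, now):
    """Return the Data.db names written before now - ttl - gc_grace.

    Assumes every column in the column family uses the same TTL, so an
    SSTable written before the cutoff can only hold expired data.
    """
    cutoff = now - ttl - gc_grace
    return [name for name, written in sstables if written < cutoff]


# (file name, last-modified time) taken from the directory listing quoted
# later in this thread; the year 2012 for the December file is an assumption.
sstables = [
    ("whatever-he-1398-Data.db", datetime(2012, 12, 15, 22, 4)),
    ("whatever-he-5464-Data.db", datetime(2013, 2, 6, 15, 45)),
    ("whatever-he-6829-Data.db", datetime(2013, 2, 21, 19, 13)),
    ("whatever-he-7578-Data.db", datetime(2013, 3, 1, 10, 46)),
    ("whatever-he-8006-Data.db", datetime(2013, 3, 4, 6, 35)),
]

candidates = fully_expired_sstables(
    sstables,
    ttl=timedelta(days=10),          # TTL from the original post
    gc_grace=timedelta(seconds=0),   # gc_grace = 0 in the posted schema
    now=datetime(2013, 3, 4, 6, 35),
)
print(",".join(candidates))
```

Whether the expired rows are actually dropped still depends on them not having fragments in SSTables left out of the compaction.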

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/03/2013, at 1:52 AM, Michal Michalski <michalm@opera.com> wrote:

> > I have read in the documentation that after a major compaction,
> > minor compactions are no longer triggered automatically.
> > Does this mean that I have to run nodetool compact regularly? Or
> > is there a way to get back to automatic minor compactions?
> 
> I think it's one of the most confusing parts of C* docs.
> 
> There's no "switch" for minor compactions that magically gets turned off when you
> trigger a major compaction. Minor compactions won't be triggered automatically for
> _some_ time, because you'll have only one gargantuan SSTable, and no compaction will
> kick in until you have enough new (smaller) SSTables to compact together (4 by default).
> 
> Of course you'll still have one huge SSTable, and it will take a long time to accumulate
> another 3 of similar size to compact with it. I think this will be a problem for your
> TTL-based data model, as you'll have tons of tombstones in the newer/smaller SSTables that
> can't be compacted together with the huge SSTable containing the data.
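
To illustrate why tombstones in the small SSTables cannot simply be dropped, here is a minimal sketch of the purge rule described in this thread. Each SSTable is modelled as a set of row keys, and the function and variable names are hypothetical; this is a simplified model, not Cassandra's implementation.

```python
def can_purge_row(key, tombstone_age, gc_grace, compacting, sstables):
    """Simplified purge check for a row that holds only tombstones.

    sstables maps an SSTable name to the set of row keys it contains;
    compacting is the set of SSTable names taking part in the compaction.
    """
    old_enough = tombstone_age > gc_grace
    # The row may still have data in an SSTable that is not being compacted.
    exists_elsewhere = any(
        key in keys for name, keys in sstables.items() if name not in compacting
    )
    return old_enough and not exists_elsewhere


sstables = {
    "huge": {"row1", "row2"},      # the single large SSTable from the major compaction
    "small_a": {"row1"},           # newer, smaller SSTables holding tombstones
    "small_b": {"row1", "row3"},
}
compacting = {"small_a", "small_b"}

# row1 also lives in the huge SSTable, so its tombstones must be kept.
row1_purgeable = can_purge_row("row1", 3600, 0, compacting, sstables)
# row3 exists only in the SSTables being compacted, so it can be dropped.
row3_purgeable = can_purge_row("row3", 3600, 0, compacting, sstables)
print(row1_purgeable, row3_purgeable)  # False True
```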
> 
> BTW: As far as I remember, there was an "external" tool (I don't remember the name) for
> splitting SSTables. I haven't used it, so I can't recommend it, but you may want to
> give it a try.
> 
> M.
> 
> On 05.03.2013 09:46, Matthias Zeilinger wrote:
>> Short question afterwards:
>> 
>> I have read in the documentation that after a major compaction, minor compactions
>> are no longer triggered automatically.
>> Does this mean that I have to run nodetool compact regularly? Or is there a way
>> to get back to automatic minor compactions?
>> 
>> Thx,
>> 
>> Br,
>> Matthias Zeilinger
>> Production Operation – Shared Services
>> 
>> P: +43 (0) 50 858-31185
>> M: +43 (0) 664 85-34459
>> E: matthias.zeilinger@bwinparty.com
>> 
>> bwin.party services (Austria) GmbH
>> Marxergasse 1B
>> A-1030 Vienna
>> 
>> www.bwinparty.com
>> 
>> 
>> -----Original Message-----
>> From: Matthias Zeilinger [mailto:Matthias.Zeilinger@bwinparty.com]
>> Sent: Tuesday, 05 March 2013 08:03
>> To: user@cassandra.apache.org
>> Subject: RE: old data / tombstones are not deleted after ttl
>> 
>> Yes it was a major compaction.
>> I know it's not a great solution, but I needed something to get rid of the old data,
>> because I ran out of disk space.
>> 
>> Br,
>> Matthias Zeilinger
>> 
>> -----Original Message-----
>> From: Michal Michalski [mailto:michalm@opera.com]
>> Sent: Tuesday, 05 March 2013 07:47
>> To: user@cassandra.apache.org
>> Subject: Re: old data / tombstones are not deleted after ttl
>> 
>> Was it a major compaction? I ask because it's a solution that was bound to work,
>> but it's also one that, in general, probably no one here would recommend.
>> 
>> M.
>> 
>> On 05.03.2013 07:08, Matthias Zeilinger wrote:
>>> Hi,
>>> 
>>> I ran a manual compaction with nodetool and this worked.
>>> But thanks for the explanation of why it wasn't compacted.
>>> 
>>> Br,
>>> Matthias Zeilinger
>>> 
>>> From: Bryan Talbot [mailto:btalbot@aeriagames.com]
>>> Sent: Monday, 04 March 2013 23:36
>>> To: user@cassandra.apache.org
>>> Subject: Re: old data / tombstones are not deleted after ttl
>>> 
>>> Those older files won't be included in a compaction until there are
>>> min_compaction_threshold (4) files of a similar size. When you get another SSTable
>>> -Data.db file of about 12-18GB, you'll have 4 and they will be compacted together into
>>> one new file. At that time, any row that contains only tombstones, all older than
>>> gc_grace, will be removed (assuming the row exists exclusively in the 4 input SSTables).
>>> Columns whose data is more than TTL seconds old will be written with a tombstone. If the
>>> row also has column values in SSTables that are not being compacted, the row will not be
>>> removed.
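
The "files of that size" grouping can be sketched with a simplified model of size-tiered bucketing, fed with the -Data.db sizes from the listing quoted below. The 0.5/1.5 similarity bounds and the 50 MB small-file cutoff are assumed defaults, and the function is an illustration rather than Cassandra's actual code.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_size_mb=50):
    """Group SSTable sizes into buckets of similar size (simplified model)."""
    buckets = []  # each bucket is a list of sizes in MB
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_size_mb and avg < min_size_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


GB = 1024  # MB per GB
# -Data.db sizes from the directory listing in the original post
sizes = [19 * GB, 12 * GB, 12 * GB, 4.3 * GB, 4.3 * GB, 4.3 * GB,
         1.1 * GB, 268, 146, 67, 261, 16, 6.7]

min_compaction_threshold = 4  # from the posted schema
buckets = size_tiered_buckets(sizes)
eligible = [b for b in buckets if len(b) >= min_compaction_threshold]
print(len(eligible))  # 0: no bucket has reached four files yet
```

Under these assumptions the largest bucket holds three files, which matches the observation that the old files are not being picked up by a minor compaction.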
>>> 
>>> 
>>> -Bryan
>>> 
>>> On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger <Matthias.Zeilinger@bwinparty.com> wrote:
>>> Hi,
>>> 
>>> I'm running Cassandra 1.1.5 and have the following issue.
>>> 
>>> I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there, but
>>> they aren't deleted after compaction.
>>> 
>>> I have tried nodetool cleanup and also a restart of Cassandra, but nothing
>>> happened.
>>> 
>>> total 61G
>>> drwxr-xr-x  2 cassandra dba  20K Mar  4 06:35 .
>>> drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 ..
>>> -rw-r--r--  1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba  19G Dec 15 22:04 whatever-he-1398-Data.db
>>> -rw-r--r--  1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-Filter.db
>>> -rw-r--r--  1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db
>>> -rw-r--r--  1 cassandra dba 9.5M Feb  6 15:45 whatever-he-5464-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba  12G Feb  6 15:45 whatever-he-5464-Data.db
>>> -rw-r--r--  1 cassandra dba  48M Feb  6 15:45 whatever-he-5464-Filter.db
>>> -rw-r--r--  1 cassandra dba 736M Feb  6 15:45 whatever-he-5464-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Feb  6 15:45 whatever-he-5464-Statistics.db
>>> -rw-r--r--  1 cassandra dba 9.7M Feb 21 19:13 whatever-he-6829-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba  12G Feb 21 19:13 whatever-he-6829-Data.db
>>> -rw-r--r--  1 cassandra dba  47M Feb 21 19:13 whatever-he-6829-Filter.db
>>> -rw-r--r--  1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db
>>> -rw-r--r--  1 cassandra dba 3.7M Mar  1 10:46 whatever-he-7578-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 4.3G Mar  1 10:46 whatever-he-7578-Data.db
>>> -rw-r--r--  1 cassandra dba  12M Mar  1 10:46 whatever-he-7578-Filter.db
>>> -rw-r--r--  1 cassandra dba 274M Mar  1 10:46 whatever-he-7578-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  1 10:46 whatever-he-7578-Statistics.db
>>> -rw-r--r--  1 cassandra dba 3.6M Mar  1 11:21 whatever-he-7582-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 4.3G Mar  1 11:21 whatever-he-7582-Data.db
>>> -rw-r--r--  1 cassandra dba 9.7M Mar  1 11:21 whatever-he-7582-Filter.db
>>> -rw-r--r--  1 cassandra dba 236M Mar  1 11:21 whatever-he-7582-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  1 11:21 whatever-he-7582-Statistics.db
>>> -rw-r--r--  1 cassandra dba 3.7M Mar  3 12:13 whatever-he-7869-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 4.3G Mar  3 12:13 whatever-he-7869-Data.db
>>> -rw-r--r--  1 cassandra dba 9.8M Mar  3 12:13 whatever-he-7869-Filter.db
>>> -rw-r--r--  1 cassandra dba 239M Mar  3 12:13 whatever-he-7869-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  3 12:13 whatever-he-7869-Statistics.db
>>> -rw-r--r--  1 cassandra dba 924K Mar  3 18:02 whatever-he-7953-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 1.1G Mar  3 18:02 whatever-he-7953-Data.db
>>> -rw-r--r--  1 cassandra dba 2.1M Mar  3 18:02 whatever-he-7953-Filter.db
>>> -rw-r--r--  1 cassandra dba  51M Mar  3 18:02 whatever-he-7953-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  3 18:02 whatever-he-7953-Statistics.db
>>> -rw-r--r--  1 cassandra dba 231K Mar  3 20:06 whatever-he-7974-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 268M Mar  3 20:06 whatever-he-7974-Data.db
>>> -rw-r--r--  1 cassandra dba 483K Mar  3 20:06 whatever-he-7974-Filter.db
>>> -rw-r--r--  1 cassandra dba  12M Mar  3 20:06 whatever-he-7974-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  3 20:06 whatever-he-7974-Statistics.db
>>> -rw-r--r--  1 cassandra dba 116K Mar  4 06:28 whatever-he-8002-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 146M Mar  4 06:28 whatever-he-8002-Data.db
>>> -rw-r--r--  1 cassandra dba 646K Mar  4 06:28 whatever-he-8002-Filter.db
>>> -rw-r--r--  1 cassandra dba  16M Mar  4 06:28 whatever-he-8002-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8002-Statistics.db
>>> -rw-r--r--  1 cassandra dba  58K Mar  4 06:28 whatever-he-8003-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba  67M Mar  4 06:28 whatever-he-8003-Data.db
>>> -rw-r--r--  1 cassandra dba 105K Mar  4 06:28 whatever-he-8003-Filter.db
>>> -rw-r--r--  1 cassandra dba 2.5M Mar  4 06:28 whatever-he-8003-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8003-Statistics.db
>>> -rw-r--r--  1 cassandra dba 230K Mar  4 06:30 whatever-he-8004-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 261M Mar  4 06:30 whatever-he-8004-Data.db
>>> -rw-r--r--  1 cassandra dba 480K Mar  4 06:30 whatever-he-8004-Filter.db
>>> -rw-r--r--  1 cassandra dba  12M Mar  4 06:30 whatever-he-8004-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8004-Statistics.db
>>> -rw-r--r--  1 cassandra dba  15K Mar  4 06:30 whatever-he-8005-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba  16M Mar  4 06:30 whatever-he-8005-Data.db
>>> -rw-r--r--  1 cassandra dba  39K Mar  4 06:30 whatever-he-8005-Filter.db
>>> -rw-r--r--  1 cassandra dba 944K Mar  4 06:30 whatever-he-8005-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8005-Statistics.db
>>> -rw-r--r--  1 cassandra dba 5.0K Mar  4 06:35 whatever-he-8006-CompressionInfo.db
>>> -rw-r--r--  1 cassandra dba 6.7M Mar  4 06:35 whatever-he-8006-Data.db
>>> -rw-r--r--  1 cassandra dba  81K Mar  4 06:35 whatever-he-8006-Filter.db
>>> -rw-r--r--  1 cassandra dba 2.0M Mar  4 06:35 whatever-he-8006-Index.db
>>> -rw-r--r--  1 cassandra dba 4.3K Mar  4 06:35 whatever-he-8006-Statistics.db
>>> 
>>> The things marked in red, I guess, are the old data, but they aren't deleted.
>>> As you can see from the dates, they are older than 10 days.
>>> 
>>> Is there any possibility to delete them?
>>> 
>>> 
>>> Here is also the schema of the CF:
>>> create column family whatever
>>>    with column_type = 'Standard'
>>>    and comparator = 'AsciiType'
>>>    and default_validation_class = 'AsciiType'
>>>    and key_validation_class = 'AsciiType'
>>>    and read_repair_chance = 0.0
>>>    and dclocal_read_repair_chance = 0.0
>>>    and gc_grace = 0
>>>    and min_compaction_threshold = 4
>>>    and max_compaction_threshold = 32
>>>    and replicate_on_write = false
>>>    and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>>>    and caching = 'KEYS_ONLY'
>>>    and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>> 
>>> 
>>> Br,
>>> Matthias Zeilinger
>>> 
>>> 
>> 
> 

