incubator-cassandra-user mailing list archives

From Yuki Morishita <mor.y...@gmail.com>
Subject Re: supercolumns with TTL columns not being compacted correctly
Date Tue, 22 May 2012 14:21:29 GMT
Data will not be deleted if those keys also appear in other sstables that are not part of the
compaction. This is to prevent obsolete data from reappearing.
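
The rule above can be sketched as follows. This is only an illustration under stated assumptions, not Cassandra's actual code: the class name PurgeRuleSketch and the method canPurge are hypothetical, and each sstable is modelled as a plain set of row keys.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the purge rule; not Cassandra's implementation.
public class PurgeRuleSketch {

    // Returns true only if no sstable outside the compaction holds the row key.
    // If another sstable still holds the key, dropping the expired data here
    // could let an older, obsolete version of that row show up again.
    public static boolean canPurge(String rowKey, List<Set<String>> sstablesOutsideCompaction) {
        for (Set<String> keys : sstablesOutsideCompaction) {
            if (keys.contains(rowKey)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One sstable that is NOT part of the compaction (assumed contents).
        Set<String> other = new HashSet<>(Arrays.asList("row-a", "row-b"));
        List<Set<String>> outside = Collections.singletonList(other);

        System.out.println(canPurge("row-a", outside)); // false: key also lives elsewhere
        System.out.println(canPurge("row-z", outside)); // true: key is only in the compacted sstable
    }
}
```

Under this model, compacting a single sstable on its own only drops expired data for rows whose keys do not appear in any of the sstables left out of that compaction.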

yuki


On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:

> Hi Samal,
>
> Thanks for your time looking into this.
>
> I force the compaction by using forceUserDefinedCompaction on only that particular sstable.
> This guarantees that the new sstable being written contains only the data from the old sstable.
>
> The data in the sstable is more than 31 days old and gc_grace is 0, but the data from the
> sstable is still being written to the new one, even though I am 100% sure all of it has expired.
>
> Kind regards,
>
> Pieter Callewaert
>
> From: samal [mailto:samalgorai@gmail.com]
> Sent: Tuesday, 22 May 2012 14:33
> To: user@cassandra.apache.org
> Subject: Re: supercolumns with TTL columns not being compacted correctly
>
> Data will remain on disk until the next compaction, but it won't be available to reads.
> Compaction will delete the old sstable and create a new one.
>
> On 22-May-2012 5:47 PM, "Pieter Callewaert" <pieter.callewaert@be-mobile.be> wrote:
>
> Hi,
>
> I've had my suspicions for some months, and now I think I am sure about it.
>
> Data is being written by the SSTableSimpleUnsortedWriter and loaded by the sstableloader.
>
> The data should be alive for 31 days, so I use the following logic:
>
> int ttl = 2678400; // 31 days, in seconds
> long timestamp = System.currentTimeMillis() * 1000; // write time, in microseconds
> long expirationTimestampMS = (timestamp / 1000) + ((long) ttl * 1000); // expiry, in milliseconds
>
> And using this to write it:
>
> sstableWriter.newRow(bytes(entry.id));
> sstableWriter.newSuperColumn(bytes(superColumn));
> sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, ttl, expirationTimestampMS);
> sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), timestamp, ttl, expirationTimestampMS);
> sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, expirationTimestampMS);
>
> This works perfectly: data can be queried until 31 days have passed, after which no results
> are returned, as expected.
>
> But the data stays on disk until the sstables are recompacted.
>
> One of our nodes (we have 6 in total) has the following sstables:
>
> [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G
> -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19 /data/MapData007/HOS-hc-125620-Data.db
> -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 /data/MapData007/HOS-hc-163141-Data.db
> -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17 /data/MapData007/HOS-hc-172106-Data.db
> -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50 /data/MapData007/HOS-hc-181902-Data.db
> -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37 /data/MapData007/HOS-hc-191448-Data.db
> -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 /data/MapData007/HOS-hc-193842-Data.db
> -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 /data/MapData007/HOS-hc-196210-Data.db
> -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 /data/MapData007/HOS-hc-196779-Data.db
> -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 /data/MapData007/HOS-hc-58572-Data.db
> -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 /data/MapData007/HOS-hc-61630-Data.db
> -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 /data/MapData007/HOS-hc-63857-Data.db
> -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 /data/MapData007/HOS-hc-87900-Data.db
>
> As you can see, the following files should be invalid:
>
> /data/MapData007/HOS-hc-58572-Data.db
> /data/MapData007/HOS-hc-61630-Data.db
> /data/MapData007/HOS-hc-63857-Data.db
>
> They were all written more than a month ago, and gc_grace is 0, so that should not be a
> problem either.
>
> As a test, I used forceUserDefinedCompaction on HOS-hc-61630-Data.db.
>
> The expected behavior is that an empty file is written, because all data in the sstable
> should be invalid.
>
> Compactionstats is giving:
>
> compaction type   keyspace     column family   bytes compacted   bytes total    progress
> Compaction        MapData007   HOS             11518215662       532355279724   2.16%
>
> And when I ls the directory I find this:
>
> -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12 /data/MapData007/HOS-tmp-hc-196898-Data.db
>
> The sstable is being copied one-to-one to a new one. What am I missing here?
>
> TTL works perfectly, but is the problem that the columns are in a super column, and
> so are never deleted from disk?
>
> Kind regards
>
> Pieter Callewaert | Web & IT engineer
> Be-Mobile NV (http://www.be-mobile.be/) | TouringMobilis (http://www.touringmobilis.be/)
> Technologiepark 12b - 9052 Ghent - Belgium
> Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 | Cell + 32 473 777 121
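
The TTL arithmetic in the quoted message mixes three units (seconds, milliseconds, microseconds), so a small worked sketch may help when checking it. The class and method names here are hypothetical, the write time is an arbitrary value, and treating a column as expired once the current time reaches the expiration time is an assumption about the boundary, not something stated in the thread.

```java
// Hypothetical helper that mirrors the arithmetic from the quoted message.
public class TtlArithmeticSketch {

    // 31 days expressed in seconds: 31 * 24 * 60 * 60 = 2678400.
    public static final int TTL_SECONDS = 31 * 24 * 60 * 60;

    // Column timestamps in the quoted code are in microseconds.
    public static long writeTimestampMicros(long nowMillis) {
        return nowMillis * 1000L;
    }

    // Expiration in milliseconds: write time (micros -> millis) plus TTL (seconds -> millis).
    public static long expirationMillis(long timestampMicros, int ttlSeconds) {
        return (timestampMicros / 1000L) + (ttlSeconds * 1000L);
    }

    // Assumption: the column counts as expired once "now" reaches the expiration time.
    public static boolean isExpired(long expirationMillis, long nowMillis) {
        return nowMillis >= expirationMillis;
    }

    public static void main(String[] args) {
        long writtenAtMillis = 1_000_000_000_000L; // arbitrary write time for illustration
        long timestamp = writeTimestampMicros(writtenAtMillis);
        long expiration = expirationMillis(timestamp, TTL_SECONDS);

        System.out.println(TTL_SECONDS);                         // 2678400
        System.out.println(expiration - writtenAtMillis);        // 2678400000
        System.out.println(isExpired(expiration, writtenAtMillis));  // false
        System.out.println(isExpired(expiration, expiration));       // true
    }
}
```

This only confirms that the computed expiration lies exactly 31 days after the write time. Whether expired columns are then removed from disk is decided at compaction time, as discussed in the reply at the top of the thread.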


