incubator-cassandra-user mailing list archives

From samal <samalgo...@gmail.com>
Subject Re: supercolumns with TTL columns not being compacted correctly
Date Wed, 23 May 2012 04:03:38 GMT
Thanks, I didn't know about the two-stage removal process.
On 23-May-2012 2:20 AM, "Jonathan Ellis" <jbellis@gmail.com> wrote:

> Correction: the first compaction after expiration + gcgs can remove
> it, even if it hasn't been turned into a tombstone previously.
>
> On Tue, May 22, 2012 at 9:37 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > Additionally, it will always take at least two compaction passes to
> > purge an expired column: one to turn it into a tombstone, and a second
> > (after gcgs) to remove it.
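[Editor's note: the lifecycle described in the two messages above can be modeled as a small standalone sketch. This is purely illustrative and not Cassandra's actual internals; the class, method, and state names are hypothetical. It shows the normal two-pass path (expired column becomes a tombstone, then is purged after gc_grace) as well as Jonathan's correction: a compaction running after expiration + gc_grace can drop the column in a single pass.]

```java
// Illustrative model of expired-column removal; NOT Cassandra's real code.
public class ExpiredColumnLifecycle {
    enum State { LIVE, TOMBSTONE, PURGED }

    // One compaction pass over a single column, using seconds throughout.
    static State compact(State s, long nowSec, long expiresAtSec, int gcGraceSec) {
        boolean expired = nowSec >= expiresAtSec;
        boolean pastGrace = nowSec >= expiresAtSec + gcGraceSec;
        if (pastGrace) return State.PURGED;                  // one pass suffices
        if (s == State.LIVE && expired) return State.TOMBSTONE;
        return s;
    }

    public static void main(String[] args) {
        // gc_grace = 0: a single compaction after expiry purges directly.
        System.out.println(compact(State.LIVE, 101, 100, 0));  // PURGED
        // gc_grace = 10: first pass tombstones, a later pass purges.
        State s = compact(State.LIVE, 101, 100, 10);
        System.out.println(s);                                 // TOMBSTONE
        System.out.println(compact(s, 111, 100, 10));          // PURGED
    }
}
```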
> >
> > On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita <mor.yuki@gmail.com>
> wrote:
> >> Data will not be deleted when those keys appear in other sstables
> >> outside of compaction. This is to prevent obsolete data from
> >> appearing again.
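[Editor's note: Yuki's rule — a tombstone may only be dropped if the row key does not also appear in sstables outside the compacting set, otherwise older data in those files would "resurrect" once the tombstone is gone — can be sketched like this. The names here are hypothetical, not Cassandra's API.]

```java
import java.util.Set;

// Illustrative purge-eligibility check; NOT Cassandra's real code.
public class PurgeCheck {
    // A tombstone for `key` can only be dropped if no sstable outside the
    // compaction set might still hold older data for that key.
    static boolean mayPurge(String key, Set<String> keysInOtherSSTables) {
        return !keysInOtherSSTables.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(mayPurge("row1", Set.of("row1", "row2"))); // false
        System.out.println(mayPurge("row3", Set.of("row1", "row2"))); // true
    }
}
```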
> >>
> >> yuki
> >>
> >> On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:
> >>
> >> Hi Samal,
> >>
> >>
> >>
> >> Thanks for your time looking into this.
> >>
> >>
> >>
> >> I force the compaction by using forceUserDefinedCompaction on only that
> >> particular sstable. This guarantees that the new sstable being written
> >> only contains the data from the old sstable.
> >>
> >> The data in the sstable is more than 31 days old and gc_grace is 0, but
> >> still the data from the sstable is being written to the new one, while
> >> I am 100% sure all the data is invalid.
> >>
> >>
> >>
> >> Kind regards,
> >>
> >> Pieter Callewaert
> >>
> >>
> >>
> >> From: samal [mailto:samalgorai@gmail.com]
> >> Sent: Tuesday, 22 May 2012 14:33
> >> To: user@cassandra.apache.org
> >> Subject: Re: supercolumns with TTL columns not being compacted correctly
> >>
> >>
> >>
> >> Data will remain till the next compaction, but won't be available.
> >> Compaction will delete the old sstable and create a new one.
> >>
> >> On 22-May-2012 5:47 PM, "Pieter Callewaert"
> >> <pieter.callewaert@be-mobile.be> wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> I’ve had my suspicions for some months, but now I think I am sure about it.
> >>
> >> Data is being written by the SSTableSimpleUnsortedWriter and loaded by
> >> the sstableloader.
> >>
> >> The data should be alive for 31 days, so I use the following logic:
> >>
> >>
> >>
> >> int ttl = 2678400;
> >>
> >> long timestamp = System.currentTimeMillis() * 1000;
> >>
> >> long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000));
> >>
> >>
> >>
> >> And using this to write it:
> >>
> >>
> >>
> >> sstableWriter.newRow(bytes(entry.id));
> >>
> >> sstableWriter.newSuperColumn(bytes(superColumn));
> >>
> >> sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs),
> >>     timestamp, ttl, expirationTimestampMS);
> >> sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage),
> >>     timestamp, ttl, expirationTimestampMS);
> >> sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed),
> >>     timestamp, ttl, expirationTimestampMS);
> >>
> >>
> >>
> >> This works perfectly: data can be queried until 31 days have passed,
> >> then no results are given, as expected.
> >>
> >> But the data stays on disk until the sstables are recompacted:
> >>
> >>
> >>
> >> One of our nodes (we have 6 in total) has the following sstables:
> >>
> >> [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19
> >> /data/MapData007/HOS-hc-125620-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17
> >> /data/MapData007/HOS-hc-163141-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17
> >> /data/MapData007/HOS-hc-172106-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50
> >> /data/MapData007/HOS-hc-181902-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37
> >> /data/MapData007/HOS-hc-191448-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41
> >> /data/MapData007/HOS-hc-193842-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03
> >> /data/MapData007/HOS-hc-196210-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20
> >> /data/MapData007/HOS-hc-196779-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33
> >> /data/MapData007/HOS-hc-58572-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59
> >> /data/MapData007/HOS-hc-61630-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46
> >> /data/MapData007/HOS-hc-63857-Data.db
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41
> >> /data/MapData007/HOS-hc-87900-Data.db
> >>
> >>
> >>
> >> As you can see, the following files should be invalid:
> >>
> >> /data/MapData007/HOS-hc-58572-Data.db
> >>
> >> /data/MapData007/HOS-hc-61630-Data.db
> >>
> >> /data/MapData007/HOS-hc-63857-Data.db
> >>
> >>
> >>
> >> Because they were all written more than a month ago. gc_grace is 0, so
> >> this should also not be a problem.
> >>
> >>
> >>
> >> As a test, I use forceUserDefinedCompaction on HOS-hc-61630-Data.db.
> >>
> >> The expected behavior is that an empty file is written, because all
> >> data in the sstable should be invalid:
> >>
> >>
> >>
> >> Compactionstats is giving:
> >>
> >> compaction type  keyspace    column family  bytes compacted  bytes total   progress
> >> Compaction       MapData007  HOS            11518215662      532355279724  2.16%
> >>
> >>
> >>
> >> And when I ls the directory I find this:
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12
> >> /data/MapData007/HOS-tmp-hc-196898-Data.db
> >>
> >>
> >>
> >> The sstable is being copied 1-to-1 to a new one. What am I missing here?
> >>
> >> TTL works perfectly, but is it causing a problem because the columns are
> >> in a super column, and so are never deleted from disk?
> >>
> >>
> >>
> >> Kind regards
> >>
> >> Pieter Callewaert | Web & IT engineer
> >>
> >>  Be-Mobile NV | TouringMobilis
> >>
> >>  Technologiepark 12b - 9052 Ghent - Belgium
> >>
> >> Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 |  Cell + 32 473 777 121
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
