cassandra-user mailing list archives

From kurt greaves <k...@instaclustr.com>
Subject Re: Tombstoned data seems to remain after compaction
Date Mon, 11 Dec 2017 23:53:57 GMT
It might... If you have the disk space, a major compaction would be better,
or a user-defined compaction that includes the large/old SSTable. Better yet,
if you're on a recent version you can do a splitting major compaction (all
these options are available through *nodetool compact*).
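
As a rough illustration of the three options above, the sketch below only builds the argument lists and does not run anything. It assumes a Cassandra version whose nodetool accepts the `--split-output` and `--user-defined` options of `nodetool compact`; the keyspace, table, and SSTable path are hypothetical placeholders, and the helper name `nodetool_compact_args` is made up for this example.

```python
from typing import List, Optional, Sequence


def nodetool_compact_args(
    keyspace: Optional[str] = None,
    tables: Sequence[str] = (),
    split_output: bool = False,
    user_defined_sstables: Sequence[str] = (),
) -> List[str]:
    """Build (but do not run) an argument list for `nodetool compact`."""
    args = ["nodetool", "compact"]
    if user_defined_sstables:
        # User-defined compaction is given SSTable data files,
        # not keyspace/table names.
        return args + ["--user-defined", *user_defined_sstables]
    if split_output:
        # Splitting major compaction: avoid producing one huge output file.
        args.append("--split-output")
    if keyspace:
        args.append(keyspace)
        args.extend(tables)
    return args


# Hypothetical names, for illustration only.
major = nodetool_compact_args("my_ks", ["my_table"])
splitting = nodetool_compact_args("my_ks", ["my_table"], split_output=True)
user_defined = nodetool_compact_args(
    user_defined_sstables=["/path/to/my_ks/my_table/old-big-Data.db"]
)

for cmd in (major, splitting, user_defined):
    print(" ".join(cmd))
```

Check `nodetool help compact` on your own version before relying on either flag.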


On 11 December 2017 at 07:41, taka-t@fujitsu.com <taka-t@fujitsu.com> wrote:

> Hi Jeff,
>
> I appreciate your detailed explanation :)
>
> > Expired data gets purged on compaction as long as it doesn’t overlap
> > with other live data. The overlap thing can be difficult to reason about,
> > but it’s meant to ensure correctness in the event that you write a value
> > with ttl 180, then another value with ttl 1, and you don’t want to remove
> > the value with ttl1 until you’ve also removed the value with ttl180, since
> > it would lead to data being resurrected
>
>
>
> I understand that a TTL setting sometimes does not work as we expect,
> especially when we alter the value afterward, because of Cassandra’s
> data consistency guarantees. Is my understanding correct?
>
> I am also thinking of trying the sstablesplit utility so that Cassandra
> will run a minor compaction, because one of the SSTables is the oldest
> and very large, and I want it to be compacted.
>
> Do you think my plan will work as expected?
>
>
>
>
>
>
>
>
>
> Regards,
>
> Takashima
>
>
>
> *From:* Jeff Jirsa [mailto:jjirsa@gmail.com]
> *Sent:* Monday, December 11, 2017 3:36 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
>
>
> Replies inline
>
>
>
>
> On Dec 10, 2017, at 9:59 PM, "taka-t@fujitsu.com" <taka-t@fujitsu.com>
> wrote:
>
> Hi Jeff,
>
>
>
>
>
> > Are all of your writes TTL’d in this table?
>
> Yes. We set the TTL to 180 days at first, and then altered it to just 1 day
> because we noticed the first TTL setting was too long.
>
>
>
>
>
> Ok this is different - Kurt’s answer is true when you issue explicit
> deletes. Expiring data is slightly different.
>
>
>
> Expired data gets purged on compaction as long as it doesn’t overlap with
> other live data. The overlap thing can be difficult to reason about, but
> it’s meant to ensure correctness in the event that you write a value with
> ttl 180, then another value with ttl 1, and you don’t want to remove the
> value with ttl 1 until you’ve also removed the value with ttl 180, since it
> would lead to data being resurrected.
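
The overlap rule described above can be pictured with a toy model. This is a simplified sketch and not Cassandra's actual implementation: the `SSTable` class, its fields, and `can_purge` are hypothetical names. It assumes the check is "no SSTable outside the compaction may hold older data for the same key".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SSTable:
    name: str
    keys: FrozenSet[str]   # partition keys present in this SSTable
    min_timestamp: int     # oldest write timestamp in this SSTable


def can_purge(key: str, expired_write_ts: int, others: Iterable[SSTable]) -> bool:
    """True if no SSTable outside the compaction may hold older data for key."""
    return all(
        key not in s.keys or s.min_timestamp > expired_write_ts
        for s in others
    )


old_big = SSTable("A (old, large)", frozenset({"k1", "k2"}), min_timestamp=100)
unrelated = SSTable("C", frozenset({"k9"}), min_timestamp=50)

# k1 was rewritten at ts=200 with ttl 1 and has expired; the ttl 180 value
# written at ts=100 still sits in SSTable A, which is not being compacted.
print(can_purge("k1", 200, [old_big, unrelated]))  # False
print(can_purge("k1", 200, [unrelated]))           # True
```

In this model the expired ttl 1 value stays on disk until SSTable A takes part in the same compaction, which is why one large, old SSTable can hold back cleanup.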
>
>
>
> This is the primary reason that ttl’d data doesn’t get cleaned up when
> people expect it to be.
>
>
>
>
>
>
>
>
>
> > Which compaction strategy are you using?
>
> We use Size Tiered Compaction Strategy.
>
>
>
>
>
>
>
> LCS would compact more aggressively and try to minimize overlaps
>
>
>
> TWCS is designed for expiring data and tries to group data by time window
> for more efficient expiration.
>
>
>
> You would likely benefit from changing to either of those - but you’ll
> want to try it on a single node first to confirm (you should be able to
> find videos online about using JMX to change the compaction strategy of a
> single node).
>
>
>
> > Are you asking these questions because you’re running out of space
> > faster than you expect and you’d like to expire data faster?
>
> You’re right. We want to know the reason, and how to purge that old data
> soon if possible.
>
> First of all, I want to understand why the old records reported by the
> sstablemetadata command persist in the SSTable data file.
>
> https://m.youtube.com/watch?v=PWtekUWCIaw
>
>
>
>
>
> Not to self promote too much, but I’ve given a few talks on running time
> series Cassandra clusters. These slides
> https://www.slideshare.net/mobile/JeffJirsa1/using-time-window-compaction-strategy-for-time-series-workloads
> (in video form here, https://m.youtube.com/watch?v=PWtekUWCIaw ) may be
> useful.
>
>
>
>
>
> By the way,
>
> I’m sorry, but please let me ask the question again.
>
> Here is an excerpt of the sstablemetadata output.
>
> Does the section “*Estimated tombstone drop times*” mean that the SSTable
> contains tombstones for records that should expire on the date shown in
> the first column? And might the data exist in other SSTables?
>
>
>
> (excerpt)
>
> ----
> Estimated tombstone drop times:%n
> 1510934467:      2475 * 2017.11.18
> 1510965112:       135
> 1510983500:       225
> 1511003962:       105
> 1511021113:      2280
> 1511037818:        30
> 1511055563:       120
> ----
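
To read the first column of the excerpt above, the sketch below decodes a few of the keys. It rests on several assumptions: the keys are epoch seconds, gc grace (10 days, as stated in the thread) is counted from that time, the poster's local zone is UTC+9, and the compaction date is 9 December 2017 with the year inferred from the message dates. The function name `drop_time_info` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

GC_GRACE = timedelta(days=10)        # gc grace period mentioned in the thread
JST = timezone(timedelta(hours=9))   # assumed local time zone of the poster


def drop_time_info(epoch_seconds: int, now: datetime) -> dict:
    """Decode one histogram key and check whether gc grace has elapsed."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "utc": utc.isoformat(),
        "local": utc.astimezone(JST).isoformat(),
        "past_gc_grace": utc + GC_GRACE <= now,
    }


# Date of the minor compaction described in the thread (time of day assumed).
compaction_day = datetime(2017, 12, 9, tzinfo=timezone.utc)

for key in (1510934467, 1510965112, 1511055563):
    print(key, drop_time_info(key, compaction_day))
```

The first key decodes to 2017-11-17 16:01:07 UTC, which is 2017-11-18 in UTC+9 and matches the annotation in the excerpt. Under these assumptions all three keys are past gc grace by the compaction date, so timing alone does not explain why the data remained.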
>
>
>
>
>
>
>
>
> Regards,
>
> Takashima
>
>
>
> *From:* Jeff Jirsa [mailto:jjirsa@gmail.com <jjirsa@gmail.com>]
> *Sent:* Monday, December 11, 2017 2:35 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
>
>
> Mutations read during boot won’t go into the memtable unless the mutation
> is in the commitlog (which usually means fairly recent - they’re a fixed
> size)
>
> Are all of your writes TTL’d in this table?
>
> Which compaction strategy are you using?
>
> Are you asking these questions because you’re running out of space faster
> than you expect and you’d like to expire data faster?
>
>
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Dec 10, 2017, at 9:30 PM, "taka-t@fujitsu.com" <taka-t@fujitsu.com>
> wrote:
>
> Hi Kurt,
>
>
>
>
>
> Thanks for your reply!
>
>
>
> “””
>
> The tombstone needs to compact with every SSTable that contains data for
> the corresponding tombstone.
>
> “””
>
>
>
> Let me explain my understanding by example:
>
>
>
> 1.     A record is inserted with a 180-day TTL (very long).
>
> 2.     The record is saved to SSTable (A) when the server restarts or on
> some similar event.
>
> 3.     After 180 days pass, the Cassandra process reads SSTable (A) during
> its boot process (or on read access?) and puts a tombstone for the record
> in **Memory**.
>
> 4.     The tombstone in **Memory** is saved to SSTable (B) the next time
> the server is rebooted.
>
> So the procedure above puts the record itself and its tombstone in
> different SSTables.
>
> Is my understanding correct?
>
>
>
>
>
>
>
> Regards,
>
> Takashima
>
>
>
>
>
> *From:* kurt greaves [mailto:kurt@instaclustr.com <kurt@instaclustr.com>]
> *Sent:* Monday, December 11, 2017 1:46 PM
> *To:* User <user@cassandra.apache.org>
> *Subject:* Re: Tombstoned data seems to remain after compaction
>
>
>
> The tombstone needs to compact with every SSTable that contains data for
> the corresponding tombstone. For example the tombstone may be in that
> SSTable but some data the tombstone covers may possibly be in another
> SSTable. Only once all SSTables that contain relevant data have been
> compacted with the SSTable containing the tombstone can the tombstone be
> removed.
>
>
>
> On 11 December 2017 at 01:08, taka-t@fujitsu.com <taka-t@fujitsu.com>
> wrote:
>
> Hi All,
>
>
> I'm using Size Tiered Compaction Strategy with the default
> gc grace period of 10 days.
>
> The sstablemetadata command shows the following Estimated tombstone drop
> times after a minor compaction on 9th Dec, 2017.
>
> (excerpt)
> Estimated tombstone drop times:%n
> 1510934467:      2475 * 2017.11.18
> 1510965112:       135
> 1510983500:       225
> 1511003962:       105
> 1511021113:      2280
> 1511037818:        30
> 1511055563:       120
> 1511075445:       165
>
>
> From the output above, I think the SSTable contains records that should
> have been deleted on 18th Nov, 2017. Is my understanding correct?
>
> If my understanding is correct, could someone tell me why the
> expired data remains after compaction?
>
>
>
>
> Regards,
> Takashima
>
> ----------------------------------------------------------------------
> Toshiaki Takashima
> Toyama Fujitsu Limited
> +810764553131, ext. 7260292355
>
> ----------------------------------------------------------------------
