cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-6909) A way to expire columns without converting to tombstones
Date Fri, 18 Mar 2016 21:00:34 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko resolved CASSANDRA-6909.
------------------------------------------
    Resolution: Duplicate

CASSANDRA-5546 is going to mostly address the problem from a different angle - closing the
ticket as a dup of that.

> A way to expire columns without converting to tombstones
> --------------------------------------------------------
>
>                 Key: CASSANDRA-6909
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6909
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Bartłomiej Romański
>
> Imagine the following scenario. 
> - You need to store some data knowing that you will need it only for a limited time (say 7 days).
> - After that you just don't care. You don't need it to be returned by queries, but if it is returned that's not a problem at all - you won't look at it anyway.
> - Your records are small. Row keys and column names are even longer than the actual values (e.g. ints vs strings).
> - You reuse rows. You add some new columns to most of the rows every day or two. This means that columns expire often, rows usually do not.
> - You generate a lot of data and want to make sure that expired records do not consume disk space for too long.
> The current TTL feature does not handle that situation well. When compaction finally decides that a given sstable is worth compacting, it won't simply get rid of expired columns. Instead it will transform them into tombstones. With small values, that saves nothing at all.
> Even if you set the grace period to 0, tombstones cannot be removed right away, because some other sstable can still hold values that should be "covered" by this tombstone.
> You can get rid of a tombstone only in two cases:
> - it's a major compaction (never happens with LCS, requires a lot of space in STCS)
> - bloom filters tell you that there are no other sstables with this row key
> The second case is not common if you usually have multiple columns in a single row that were not all written at once. There's a good chance your row is spread across multiple sstables, and new ones are generated from time to time. There's very little chance they'll all meet in one compaction at some point.
> What's funny, a bloom filter returns true if there's a tombstone for the given row in the given sstable. So you won't remove tombstones during compaction, because there's some other tombstone in another sstable for that row :/
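
The two purge conditions above can be sketched as a toy model. This is only an illustration of the reasoning in the description, not Cassandra's compaction code: the SSTable class, might_contain and purgeable_rows are hypothetical names, and the bloom filter is replaced by an exact set of row keys (no false positives).

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    name: str
    # row key -> list of (column_name, is_tombstone); a hypothetical layout
    rows: dict = field(default_factory=dict)

    def might_contain(self, row_key):
        # Stand-in for a bloom filter. It is keyed on the row key only, so it
        # answers True whether the row holds live cells or only tombstones.
        return row_key in self.rows


def purgeable_rows(compacting, others):
    """Row keys whose tombstones this compaction could drop.

    A row qualifies only if no sstable outside the compaction claims
    to contain that row key.
    """
    keys = set()
    for table in compacting:
        keys.update(table.rows)
    return {k for k in keys if not any(o.might_contain(k) for o in others)}


a = SSTable("a", {"row1": [("c1", True)], "row2": [("c1", True)]})
b = SSTable("b", {"row1": [("c2", True)]})

partial = purgeable_rows([a], [b])   # row1 is blocked by the tombstone in b
major = purgeable_rows([a, b], [])   # everything compacted together
```

In this model, compacting only `a` leaves the tombstone for row1 in place even though `b` holds nothing but another tombstone for that row, which is the situation the last paragraph complains about.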
> After a while, you end up with a lot of tombstones (the majority of your data) and can do nothing about it.
> Now imagine that Cassandra knows that we just don't care about data older than 7 days.

> Firstly, it can simply drop such columns during compactions (without converting them to tombstones or anything like that).
> Secondly, if it detects an sstable older than 7 days, it can safely remove it entirely (it cannot contain any active data).
> These two *guarantee* that your data will be removed after 14 days (2xTTL). If we do a compaction after 7 days, expired data will be removed. If we do not, the whole sstable will be removed after another 7 days.
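
A minimal sketch of the two proposed rules, assuming time is counted in whole days and that an sstable records the day it was written. The names SSTable, compact and drop_old_sstables are hypothetical and only model the proposal in this ticket; nothing here reflects how Cassandra is implemented.

```python
from dataclasses import dataclass

TTL_DAYS = 7  # the per-CF TTL from the scenario above


@dataclass
class SSTable:
    created_day: int  # day the sstable was written
    cells: list       # list of (column_name, written_day)


def compact(tables, today, ttl=TTL_DAYS):
    """Rule 1: merge sstables, dropping expired cells outright (no tombstones)."""
    live = [(name, day) for t in tables for (name, day) in t.cells
            if today - day < ttl]
    return SSTable(created_day=today, cells=live)


def drop_old_sstables(tables, today, ttl=TTL_DAYS):
    """Rule 2: discard every sstable older than the TTL without reading it.

    Every cell was written on or before created_day, so once the sstable
    itself is older than the TTL, all of its cells have expired.
    """
    return [t for t in tables if today - t.created_day < ttl]


old = SSTable(created_day=0, cells=[("a", 0)])
recent = SSTable(created_day=5, cells=[("b", 3), ("c", 5)])

merged = compact([old, recent], today=10)
kept = drop_old_sstables([old, recent], today=10)
```

On day 10, compaction keeps only column "c" (written on day 5), and the age-based rule discards the sstable written on day 0 while keeping the one written on day 5.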
> That's what I expected from CASSANDRA-3974, but it turned out to be just a trivial frontend feature.
> I suggest rethinking this mechanism. I don't believe it's a common scenario that someone who sets a TTL for a whole CF needs all these strong guarantees that data will not reappear in the future in case of some consistency issues (that's why we need this whole mess with tombstones).
> I believe the common case with a per-CF TTL is that you just want an efficient way of recovering your disk space (and improving read performance by having fewer sstables and less data in general).
> To work around this, we currently stop Cassandra periodically, simply remove sstables that are too old, and start it back up. This works OK, but does not solve the problem fully (if a tombstone is rewritten by compactions often, we will never remove it).
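
The selection step of that workaround might look like the sketch below. It only lists candidates and deletes nothing. It assumes that file modification time is an acceptable proxy for sstable age and that data files end in "-Data.db"; the function name and its parameters are hypothetical. Each sstable also consists of several companion component files, which this sketch does not handle.

```python
import time
from pathlib import Path


def old_sstable_data_files(data_dir, max_age_days, now=None):
    """List *-Data.db files under data_dir last modified more than max_age_days ago.

    Assumption: mtime approximates the age of the newest data in the file.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in Path(data_dir).rglob("*-Data.db")
                  if p.stat().st_mtime < cutoff)
```

As the description notes, a file that compaction keeps rewriting never looks old by this measure, so the tombstones inside it are never selected.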



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
