cassandra-commits mailing list archives

From "Haebin Na (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-7115) Partitioned Column Family (Table) based on Column Keys (Sorta TTLed Table)
Date Wed, 30 Apr 2014 02:34:20 GMT
Haebin Na created CASSANDRA-7115:
------------------------------------

             Summary: Partitioned Column Family (Table) based on Column Keys (Sorta TTLed Table)
                 Key: CASSANDRA-7115
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7115
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Haebin Na
            Priority: Minor


We need a better solution for expiring columns than TTLed columns.

If you set a 6-month TTL on a column in a wide row that is frequently updated (or deleted;
yes, this is an anti-pattern), the column is unlikely to actually be removed when it expires,
since the row would be highly fragmented.

To solve this problem, I suggest partitioning a column family (table) using the column
key (column1) as the partition key.

The result is like a set of column families (tables) that share the same structure, each
covering a certain range of columns. This means that a row is deterministically fragmented
by column key.

If you use a timestamp-like column key, you could easily truncate a specific partition
(a sub-table or CF covering a specific range) once it is older than a certain age, without
worrying about zombie tombstones.
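
To make the idea concrete, here is a minimal sketch in Python of how a timestamp-like
column key could map deterministically to a quarterly partition, and how to pick the
partitions that are safe to truncate. This is only an illustration: the function names
(quarter_bucket, bucket_end, expired_buckets) are hypothetical and are not part of any
Cassandra API, and quarterly bucketing with a 180-day retention is just an assumed example.

```python
from datetime import datetime, timedelta, timezone


def quarter_bucket(ts):
    """Map a timestamp-like column key to a (year, quarter) partition id."""
    return (ts.year, (ts.month - 1) // 3 + 1)


def bucket_end(bucket):
    """Return the exclusive upper bound of the range a partition covers."""
    year, quarter = bucket
    if quarter == 4:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, quarter * 3 + 1, 1, tzinfo=timezone.utc)


def expired_buckets(buckets, now, max_age):
    """Partitions whose newest possible column is already older than max_age.

    Such a partition could be truncated as a whole (hypothetically),
    instead of expiring its columns one by one via TTL.
    """
    cutoff = now - max_age
    return sorted(b for b in buckets if bucket_end(b) <= cutoff)


# Example column keys (assumed data, for illustration only).
keys = [
    datetime(2013, 2, 10, tzinfo=timezone.utc),
    datetime(2013, 8, 1, tzinfo=timezone.utc),
    datetime(2014, 4, 29, tzinfo=timezone.utc),
]
buckets = {quarter_bucket(k) for k in keys}
now = datetime(2014, 4, 30, tzinfo=timezone.utc)
droppable = expired_buckets(buckets, now, timedelta(days=180))
print(droppable)
```

A partition is treated as droppable only when its whole range lies before the cutoff,
so no live column can be lost by truncating it.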

Having many column families is not optimal, yet even with a small set, such as biyearly or
quarterly partitions, this could be a whole lot more efficient than TTLed columns.

What do you think?






--
This message was sent by Atlassian JIRA
(v6.2#6252)
