incubator-cassandra-user mailing list archives

From Charlie Mason <charlie....@gmail.com>
Subject Avoiding High Cell Tombstone Count
Date Sun, 25 May 2014 19:01:42 GMT
Hi All,

I have a table which has one column per user. It receives a lot of updates
to these columns throughout its lifetime, and the updates always hit the
same few columns. Firstly, is Cassandra storing a tombstone for each of
these old column values?

I ran a simple SELECT and got the following tracing results:

activity                                                                                  | timestamp    | source    | source_elapsed
------------------------------------------------------------------------------------------+--------------+-----------+---------------
execute_cql3_query                                                                        | 19:48:36,582 | 127.0.0.1 |              0
Parsing SELECT Account, Balance FROM AccountBalances WHERE Account = 'test9' LIMIT 10000; | 19:48:36,582 | 127.0.0.1 |             56
Preparing statement                                                                       | 19:48:36,582 | 127.0.0.1 |            181
Executing single-partition query on accountbalances                                       | 19:48:36,583 | 127.0.0.1 |            878
Acquiring sstable references                                                              | 19:48:36,583 | 127.0.0.1 |            895
Merging memtable tombstones                                                               | 19:48:36,583 | 127.0.0.1 |            918
Key cache hit for sstable 569                                                             | 19:48:36,583 | 127.0.0.1 |            997
Seeking to partition beginning in data file                                               | 19:48:36,583 | 127.0.0.1 |           1034
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones                 | 19:48:36,583 | 127.0.0.1 |           1383
Merging data from memtables and 1 sstables                                                | 19:48:36,583 | 127.0.0.1 |           1402
Read 1 live and 123780 tombstoned cells                                                   | 19:48:36,710 | 127.0.0.1 |         128631
Request complete                                                                          | 19:48:36,711 | 127.0.0.1 |         129276
Request complete | 19:48:36,711 | 127.0.0.1 |         129276


As you can see, that's an awful lot of tombstoned cells (123780 for a single
live cell), and that's after a full compaction as well. Just so you are
aware, this table is updated using a Paxos IF statement.

It still seems fairly snappy (the whole request took about 129 ms);
however, I am concerned it's only going to get worse.

Would I be better off adding a time-based key to the primary key, then
doing a separate insert and deleting the original row? If I ran the query
with a limit of one, it should always find the first row before hitting a
tombstone. Is that correct?
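To make the proposed layout concrete, here is a toy Python model of it. It
is only an illustration of the reasoning above, not Cassandra's read path:
it ignores memtables, SSTables, compaction and gc_grace_seconds, and every
name in it (Row, apply_update, slice_read, newest_first) is hypothetical.
The newest_first flag stands in, roughly, for declaring the time-based key
with a descending clustering order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    ts: int        # hypothetical time-based clustering key
    balance: int
    deleted: bool  # True once the row has been deleted (row tombstone)


def apply_update(rows: List[Row], ts: int, balance: int) -> None:
    """Model one update: mark all existing rows deleted, append the new live row."""
    for row in rows:
        row.deleted = True
    rows.append(Row(ts, balance, False))


def slice_read(rows: List[Row], limit: int,
               newest_first: bool = True) -> Tuple[List[Row], int]:
    """Scan rows in clustering order, stopping after `limit` live rows.

    Returns the live rows found and how many tombstoned rows were scanned.
    """
    live: List[Row] = []
    tombstones_scanned = 0
    for row in sorted(rows, key=lambda r: r.ts, reverse=newest_first):
        if row.deleted:
            tombstones_scanned += 1
            continue
        live.append(row)
        if len(live) >= limit:
            break
    return live, tombstones_scanned


if __name__ == "__main__":
    rows: List[Row] = []
    for t in range(1, 1001):
        apply_update(rows, ts=t, balance=t * 10)

    # Newest first: the live row is reached before any tombstone.
    print(slice_read(rows, limit=1))
    # prints ([Row(ts=1000, balance=10000, deleted=False)], 0)

    # Oldest first: every older tombstone is scanned before the live row.
    print(slice_read(rows, limit=1, newest_first=False))
    # prints ([Row(ts=1000, balance=10000, deleted=False)], 999)
```

In this model the newest-first scan touches no tombstones while the
oldest-first scan touches all 999; whether Cassandra's real read path stops
that early is exactly the question above.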

Thanks,

Charlie M
