cassandra-user mailing list archives

From John Sanda <john.sa...@gmail.com>
Subject Re: Time series data model and tombstones
Date Sat, 28 Jan 2017 20:32:43 GMT
Thanks for the response. This version of the code is using STCS.
gc_grace_seconds was set to one day and then I changed it to zero since RF
= 1. I understand that expired data will still generate tombstones and that
STCS is not the best. More recent versions of the code use DTCS, and we'll
be switching over to TWCS shortly. The suggestions raised are excellent
ones, but I tend to think of them as optimizations that might not address
my issue, which I think may be 1) a problem with my data model, 2) a problem
with the queries used, or 3) some misunderstanding of how Cassandra performs
range scans.
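A rough model of why expired data can linger: a TTL'd cell reads as a tombstone from write time + TTL onward. Compaction can drop it only once gc_grace_seconds has also passed and no SSTable left out of the compaction might hold older data for the same partition. The sketch below encodes that simplified rule. Cell, is_tombstone and can_purge are hypothetical names, not Cassandra internals, and real compaction checks more than this.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    write_time: int  # seconds since epoch when the cell was written
    ttl: int         # TTL in seconds

    @property
    def expires_at(self) -> int:
        # From this moment on the cell reads as a tombstone
        return self.write_time + self.ttl


def is_tombstone(cell: Cell, now: int) -> bool:
    return now >= cell.expires_at


def can_purge(cell: Cell, now: int, gc_grace_seconds: int,
              min_ts_outside_compaction: float) -> bool:
    """Simplified purge check (hypothetical model, not Cassandra's code).

    min_ts_outside_compaction: oldest write time in any SSTable that overlaps
    this partition but is NOT part of the compaction (float('inf') if none).
    """
    if not is_tombstone(cell, now):
        return False
    grace_elapsed = cell.expires_at + gc_grace_seconds <= now
    # If an SSTable outside the compaction could hold older data for the
    # partition, dropping the tombstone could resurrect that data.
    nothing_older_outside = cell.write_time < min_ts_outside_compaction
    return grace_elapsed and nothing_older_outside
```

With gc_grace_seconds = 0 the grace check passes as soon as the cell expires, so under this model the overlap check decides. With STCS, a large, old SSTable may stay out of the compactions of newer, smaller ones for a long time.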

I am doing append-only writes. There is no out-of-order data. There are no
deletes, just TTLs. Data is stored on disk in descending order, and queries
access recent data and never query past the TTL of seven days. Given this, I
would not expect to be reading tombstones, certainly not in the large numbers
that I am seeing.
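The reasoning in the paragraph above can be checked with a small model. Assume each cell's write time equals the time embedded in its timeuuid, and that a cell is expired once write_time + TTL <= now. Then a slice whose lower bound is no older than the TTL should never touch an expired cell. The sketch builds version-1 UUIDs (the CQL timeuuid layout) with Python's uuid module. The helper names and the sample data are hypothetical.

```python
import uuid

# 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_unix(seconds: int, node: int = 0x1234567890AB,
                       clock_seq: int = 0) -> uuid.UUID:
    """Build a version-1 UUID (CQL timeuuid layout) for a Unix time in seconds."""
    ticks = seconds * 10_000_000 + UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                 # time_low
        (ticks >> 32) & 0xFFFF,             # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,  # time_hi + version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,   # clock_seq_hi + RFC 4122 variant
        clock_seq & 0xFF,                   # clock_seq_low
        node,                               # 48-bit node id (arbitrary here)
    ))


def unix_from_timeuuid(u: uuid.UUID) -> float:
    """Recover the Unix time (seconds) embedded in a timeuuid."""
    return (u.time - UUID_EPOCH_OFFSET) / 10_000_000


def expired_in_slice(cells, now: int, window: int, ttl: int):
    """Cells matched by 'time > now - window' that have already expired."""
    lower = now - window
    result = []
    for u in cells:
        t = unix_from_timeuuid(u)
        if t > lower and t + ttl <= now:  # inside the slice, but expired
            result.append(u)
    return result


DAY = 86_400
TTL = 7 * DAY
now = 1_485_635_563  # arbitrary "now"
# Hypothetical data: one point per minute covering the last eight days
cells = [timeuuid_from_unix(now - s) for s in range(0, 8 * DAY, 60)]
print(len(expired_in_slice(cells, now, 7 * DAY, TTL)))  # 0
print(len(expired_in_slice(cells, now, 8 * DAY, TTL)))  # 1440
```

In this model, only a slice reaching past the TTL finds expired cells. If tombstones are still being read, the model's assumptions (for example, the timeuuid time matching the cell's write timestamp) are worth checking.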

On Sat, Jan 28, 2017 at 12:15 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> Since you didn't specify a compaction strategy I'm guessing you're using
> STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy
> for this type of workload.
> On Sat, Jan 28, 2017 at 8:30 AM John Sanda <john.sanda@gmail.com> wrote:
>
>> I have a time series data model that is basically:
>>
>> CREATE TABLE metrics (
>>     id text,
>>     time timeuuid,
>>     value double,
>>     PRIMARY KEY (id, time)
>> ) WITH CLUSTERING ORDER BY (time DESC);
>>
>> I do append-only writes, no deletes, and use a TTL of seven days. Data
>> points are written every second. The UI queries data for the past hour,
>> two hours, day, or week. The UI refreshes and executes queries every 30
>> seconds. In one test environment I am seeing lots of tombstone threshold
>> warnings and Cassandra has even OOME'd. Since I am storing data in
>> descending order and always query for recent data, I do not understand why
>> I am running into this problem.
>>
>> I know that it is recommended to do some date partitioning in part to
>> ensure partitions do not grow too large. I already have some changes in
>> place to partition by day. Before I make those changes I want to
>> understand why I am scanning so many tombstones so that I can be more
>> confident that the date partitioning changes will help.
>>
>> Thanks
>>
>> - John
>>
>
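Below is a minimal sketch of the partition-by-day change. It assumes a hypothetical partition key of (id, day), where day is a UTC 'YYYY-MM-DD' string. day_buckets is an illustrative helper, not part of any driver. At one point per second, the current layout can hold up to 7 × 86,400 = 604,800 live rows per id in one partition, while a day bucket holds at most 86,400. The helper lists which buckets a UI query would read, newest first to match the DESC clustering order.

```python
from datetime import datetime, timedelta, timezone
from typing import List

ROWS_PER_DAY = 86_400  # one data point per second


def day_buckets(start: datetime, end: datetime) -> List[str]:
    """Day buckets ('YYYY-MM-DD', UTC) that a query over [start, end] must read.

    Expects timezone-aware datetimes. Newest bucket first, matching
    CLUSTERING ORDER BY (time DESC).
    """
    if start > end:
        raise ValueError("start must not be after end")
    day = end.astimezone(timezone.utc).date()
    first = start.astimezone(timezone.utc).date()
    buckets = []
    while day >= first:
        buckets.append(day.isoformat())
        day -= timedelta(days=1)
    return buckets


now = datetime(2017, 1, 28, 20, 32, tzinfo=timezone.utc)
print(day_buckets(now - timedelta(hours=1), now))      # ['2017-01-28']
print(day_buckets(now - timedelta(days=1), now))       # ['2017-01-28', '2017-01-27']
print(len(day_buckets(now - timedelta(days=7), now)))  # 8
```

Under these assumptions, a one-week query reads eight partitions of at most ROWS_PER_DAY rows each rather than one ever-growing partition per id.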


-- 

- John
