cassandra-commits mailing list archives

From Björn Hegerfors (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10280) Make DTCS work well with old data
Date Mon, 21 Sep 2015 17:01:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900985#comment-14900985 ]

Björn Hegerfors commented on CASSANDRA-10280:
---------------------------------------------

Yes, I'm absolutely in favor of expressing this in terms of max window size instead of max
SSTable age. It has also become increasingly clear to me that, rather than never compacting
SSTables that are too old, we should just keep fixed-size windows around, so that if SSTables
arrive in those windows (bootstrap, repairs), compaction will still happen.

I haven't looked at the patch, but is there a clear way to express maximum window size? If
base_time_seconds=1, do you then say something like max_window_seconds=10? And in that case,
will the largest windows be 4 or 16? I guess only 4 would make sense with that name...
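
To make the question concrete, here is a minimal sketch of the "round down" reading. It assumes window sizes are the base window multiplied by powers of 4, and it treats max_window_seconds as the hypothetical option name floated above, not an existing setting. The function name is made up for illustration.

```python
def largest_window_seconds(base_time_seconds, max_window_seconds, factor=4):
    """Return the largest window size of the form base * factor**k
    that does not exceed max_window_seconds.

    Assumption: windows only ever grow by `factor` (4 here), so sizes
    between two powers are not reachable.
    """
    if base_time_seconds <= 0 or factor < 2:
        raise ValueError("base_time_seconds must be positive and factor >= 2")
    if max_window_seconds < base_time_seconds:
        raise ValueError("max_window_seconds must be at least base_time_seconds")

    size = base_time_seconds
    # Keep coalescing while the next window size still fits under the cap.
    while size * factor <= max_window_seconds:
        size *= factor
    return size


# base_time_seconds=1, max_window_seconds=10: sizes are 1, 4, 16, ...
print(largest_window_seconds(1, 10))  # 4
```

Under this reading a cap of 10 yields windows of at most 4 seconds, since 16 would exceed the stated maximum.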

I've previously suggested declaring how many times a window will be coalesced, but that might
sound really complicated to users. What I mean is a setting like "window_coalitions" or "write_amplification"
which you can set to 5 in order to get a maximum window size of 4^5=1024 times the base window.
But let's go with whatever is easiest to understand.
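
As a rough illustration of the coalescing-count idea, the sketch below computes the window sizes implied by such a setting. It assumes each coalescing step merges 4 windows into one. The function names are hypothetical, and "window_coalitions" is only the name proposed in this comment, not an existing option.

```python
def window_sizes(base_time_seconds, window_coalitions, factor=4):
    """List every window size, from the base window up to the maximum,
    when windows are coalesced `window_coalitions` times by `factor`."""
    if base_time_seconds <= 0 or window_coalitions < 0 or factor < 2:
        raise ValueError("invalid parameters")
    return [base_time_seconds * factor ** k for k in range(window_coalitions + 1)]


def max_window_seconds(base_time_seconds, window_coalitions, factor=4):
    """Largest window size: base * factor**window_coalitions."""
    return window_sizes(base_time_seconds, window_coalitions, factor)[-1]


# With a 1-second base window and 5 coalescings, the cap is 4**5 = 1024 seconds.
print(window_sizes(1, 5))        # [1, 4, 16, 64, 256, 1024]
print(max_window_seconds(1, 5))  # 1024
```

With a one-hour base window (base_time_seconds=3600), the same setting of 5 would cap windows at 3600 * 1024 = 3,686,400 seconds, a little under 43 days.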

> Make DTCS work well with old data
> ---------------------------------
>
>                 Key: CASSANDRA-10280
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10280
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 3.x, 2.1.x, 2.2.x
>
>
> Operational tasks become incredibly expensive if you keep around a long timespan of data
> with DTCS - with default settings and 1 year of data, the oldest window covers about 180 days.
> Bootstrapping a node with vnodes with this data layout will force cassandra to compact very
> many sstables in this window.
> We should probably put a cap on how big the biggest windows can get. We could probably
> default this to something sane based on max_sstable_age (ie, say we can reasonably handle
> 1000 sstables per node, then we can calculate how big the windows should be to allow that)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
