cassandra-user mailing list archives

From Jeff Jirsa <jji...@gmail.com>
Subject Re: Challenge with initial data load with TWCS
Date Sun, 29 Sep 2019 01:41:13 GMT


We used to do either:

- Use CQLSSTableWriter to build the SSTables offline, explicitly starting a new SSTable at
each window boundary, then use nodetool refresh or sstableloader to push them into the
system, or

- Use the normal write path for a single window at a time, explicitly calling flush between
windows. You can’t have current data being written while you do your historical load using
this method, because the live writes would be flushed into the same SSTables as the
historical window.
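
To make "break between windows" concrete, here is a minimal Python sketch of the
bucketing step only. It assumes a 1-day window aligned to the UTC epoch and rows
given as (timestamp, payload) pairs. The names window_start, group_by_window and
load_by_window are hypothetical, and write_rows / flush are placeholder callables
you would supply yourself (for example a driver insert loop and a call to nodetool
flush). It also assumes each write carries the source timestamp as its write
timestamp, which is not shown here.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: the table uses a 1-day TWCS window.
WINDOW = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=WINDOW):
    """Floor a timezone-aware timestamp to the start of its window."""
    periods = (ts - EPOCH) // window  # whole windows elapsed since the epoch
    return EPOCH + periods * window


def group_by_window(rows, window=WINDOW):
    """Group (timestamp, payload) rows by window, oldest window first."""
    buckets = defaultdict(list)
    for ts, payload in rows:
        buckets[window_start(ts, window)].append((ts, payload))
    return dict(sorted(buckets.items()))


def load_by_window(rows, write_rows, flush, window=WINDOW):
    """Write one window at a time, calling flush after each window.

    write_rows and flush are hypothetical callables supplied by the caller.
    """
    for _, batch in group_by_window(rows, window).items():
        write_rows(batch)
        flush()
```

The same grouping applies to the CQLSSTableWriter approach: instead of flushing,
you would close the current writer and open a new one at each window boundary.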



> On Sep 28, 2019, at 1:31 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
> 
> Hello users
> 
> TWCS works great in steady state. It creates SSTables of roughly
> fixed size if your insertion rate is fairly constant.
> 
> The big question is the initial load.
> 
> Let's say we configure TWCS with window unit = day and window size =
> 1: we would have 1 SSTable per day and, with TTL = 365 days, all data
> would expire after 1 year.
> 
> Now, since the cluster is still empty, we need to load 1 year's worth
> of data. If we use TWCS and the load takes 7 days, we would have 7
> SSTables, each holding roughly 365/7 (about 52) days of the annual
> data. Ideally we would like TWCS to split this data into 365 distinct
> SSTables.
> 
> So my question is: how do we manage this scenario? How do we perform
> an initial load for a table using TWCS and have compaction split the
> data nicely based on the source data timestamp rather than the
> insertion timestamp?
> 
> Regards
> 
> Duy Hai DOAN
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
> 
