cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Cluster sizing for huge dataset
Date Tue, 01 Oct 2019 21:48:48 GMT
The client wants to be able to access cold data (2 years old) in the
same cluster, so moving the data to another system is not possible.

However, since we're using DataStax Enterprise, we can leverage Tiered
Storage and keep old data on spinning disks to save on hardware.
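
For reference, a minimal sketch of the wiring (the tier paths, strategy name,
table name, and 90-day cutoff below are all made up; check the Tiered Storage
docs for your DSE version):

    # dse.yaml: one SSD tier and one spinning-disk tier
    tiered_storage_options:
        time_tiers:
            tiers:
                - paths:
                    - /mnt/ssd/data        # hot tier
                - paths:
                    - /mnt/spinning/data   # cold tier

    -- CQL: route SSTables older than ~90 days (7776000 s) to the cold tier
    ALTER TABLE metrics.events WITH COMPACTION = {
      'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
      'tiering_strategy': 'TimeWindowStorageStrategy',
      'config': 'time_tiers',
      'max_tier_ages': '7776000'
    };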

Regards

On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
<julien.laurenceau@pepitedata.com> wrote:
>
> Hi,
> Depending on the use case, you may also consider storage tiering with fresh data on hot-tier
(Cassandra) and older data on cold-tier (Spark/Parquet or Presto/Parquet). It would be a lot
more complex, but may fit more appropriately the budget and you may reuse some tech already
present in your environment.
> You may even do subsampling during the transformation offloading data from Cassandra
in order to keep one point out of 10 for older data if subsampling makes sense for your data
signal.
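>
> A rough Spark sketch of that offload (keyspace, table, column names, and the output
> path are made up; untested):
>
>     import org.apache.spark.sql.SparkSession
>     import org.apache.spark.sql.functions._
>
>     val spark = SparkSession.builder()
>       .appName("cold-tier-offload")
>       .config("spark.cassandra.connection.host", "10.0.0.1")
>       .getOrCreate()
>
>     // Read the hot table through the spark-cassandra-connector
>     val events = spark.read
>       .format("org.apache.spark.sql.cassandra")
>       .options(Map("keyspace" -> "metrics", "table" -> "events"))
>       .load()
>
>     // Rows older than ~2 years, keeping 1 point in 10 via a deterministic hash
>     val cold = events
>       .filter(col("ts") < date_sub(current_date(), 730))
>       .filter(pmod(hash(col("sensor_id"), col("ts")), lit(10)) === 0)
>
>     cold.write.mode("append").parquet("wasbs://cold@myaccount.blob.core.windows.net/events/")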
>
> Regards
> Julien
>
> On Mon, Sep 30, 2019 at 22:03, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>> Thanks all for your reply
>>
>> The target deployment is on Azure, so with the nice disk snapshot feature, replacing
>> a dead node is easier: no streaming from Cassandra.
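>>
>> Roughly, with the Azure CLI (resource group, disk, and VM names are made up; untested):
>>
>>     # Snapshot the dead node's data disk, clone it, attach it to the replacement VM
>>     az snapshot create -g cass-rg -n node3-data-snap --source node3-data-disk
>>     az disk create -g cass-rg -n node3-data-new --source node3-data-snap
>>     az vm disk attach -g cass-rg --vm-name cass-node3-new --name node3-data-new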
>>
>> About compaction overhead, using TWCS with a 1-day bucket and disabling read repair
>> and subrange repair should be sufficient.
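>>
>> Something like this (table name made up; these are the pre-4.0 table options):
>>
>>     ALTER TABLE metrics.events WITH compaction = {
>>       'class': 'TimeWindowCompactionStrategy',
>>       'compaction_window_unit': 'DAYS',
>>       'compaction_window_size': 1
>>     } AND read_repair_chance = 0.0
>>       AND dclocal_read_repair_chance = 0.0;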
>>
>> Now the only remaining issue is QUORUM reads, which trigger a blocking read repair
>> automagically when replica digests mismatch.
>>
>> Before 4.0 there is unfortunately no flag to turn it off.
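>>
>> For reference, 4.0 turns it into a per-table option (same made-up table name as above):
>>
>>     ALTER TABLE metrics.events WITH read_repair = 'NONE';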
>>
>> On Sep 30, 2019 at 15:47, "Eric Evans" <john.eric.evans@gmail.com> wrote:
>>
>> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>> [ ... ]
>>
>> > 2) The 2TB guidance is old and irrelevant for most people; what you really care
>> > about is how fast you can replace a failed machine.
>> >
>> > You'd likely be OK going significantly larger than that if you use a few vnodes,
>> > since that'll help you rebuild faster (you'll stream from more sources on rebuild).
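>> >
>> > For example, in cassandra.yaml (values and keyspace name illustrative):
>> >
>> >     num_tokens: 16
>> >     allocate_tokens_for_keyspace: metrics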
>> >
>> > If you don't want to use vnodes, buy big machines and run multiple Cassandra
>> > instances on each - it's not hard to run 3-4TB per instance and 12-16TB of SSD per machine.
>>
>> We do this too.  It's worth keeping in mind, though, that you'll still
>> have a 12-16TB blast radius in the event of a host failure.  As the
>> host density goes up, consider steps to make the host more robust
>> (RAID, redundant power supplies, etc.).
>>
>> --
>> Eric Evans
>> john.eric.evans@gmail.com
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

