cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Cluster sizing for huge dataset
Date Fri, 04 Oct 2019 15:41:02 GMT
The problem is that the user wants to access old data using CQL as well, not
spin up a SparkSQL session just to fetch one or two old records.

On Oct 4, 2019 12:38, "Cedrick Lunven" <cedrick.lunven@datastax.com> wrote:

> Hi,
>
> If you are using DataStax Enterprise, why not offload cold data to DSEFS
> (an HDFS implementation) in an analytics-friendly storage format like Parquet,
> and keep only the OLTP data in the Cassandra tables? The recommended size for
> DSEFS can go up to 30 TB per node.
>
> I am pretty sure you are already aware of this option, and I would be curious
> to get your thoughts about this solution and its limitations.
>
> Note: that would also probably help you with your init-load/TWCS issue.
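>
> For what it's worth, here is a minimal PySpark sketch of that kind of offload,
> assuming DSE Analytics with the spark-cassandra-connector, a hypothetical
> ks.events table with a ts timestamp column, and a dsefs:// target path (all
> names are made up for illustration):
>
>     from pyspark.sql import SparkSession, functions as F
>
>     # On DSE Analytics a session usually already exists; built here for completeness.
>     spark = SparkSession.builder.appName("cold-data-offload").getOrCreate()
>
>     # Read the source table through the spark-cassandra-connector.
>     events = (spark.read
>               .format("org.apache.spark.sql.cassandra")
>               .options(keyspace="ks", table="events")
>               .load())
>
>     # Treat anything older than two years as "cold" for this sketch.
>     cutoff = F.add_months(F.current_date(), -24).cast("timestamp")
>     cold = events.where(F.col("ts") < cutoff)
>
>     # Write the cold rows to DSEFS as Parquet, partitioned by day for pruning.
>     (cold.withColumn("day", F.to_date("ts"))
>          .write.mode("append")
>          .partitionBy("day")
>          .parquet("dsefs:///cold/events"))
>
> Once the Parquet copy is verified, the corresponding rows can be deleted from
> the Cassandra table or simply left to expire via TTL.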
>
> My2c.
> Cedrick
>
> On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> The client wants to be able to access cold data (2 years old) in the
>> same cluster, so moving data to another system is not possible.
>>
>> However, since we're using DataStax Enterprise, we can leverage Tiered
>> Storage and store old data on spinning disks to save on hardware.
>>
>> Regards
>>
>> On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
>> <julien.laurenceau@pepitedata.com> wrote:
>> >
>> > Hi,
>> > Depending on the use case, you may also consider storage tiering, with
>> > fresh data on a hot tier (Cassandra) and older data on a cold tier
>> > (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but it
>> > may fit the budget more appropriately, and you may be able to reuse some
>> > tech already present in your environment.
>> > You may even subsample during the transformation that offloads data
>> > from Cassandra, in order to keep one point out of 10 for older data, if
>> > subsampling makes sense for your data signal.
>> >
>> > Regards
>> > Julien
>> >
>> > On Mon, Sep 30, 2019 at 22:03, DuyHai Doan <doanduyhai@gmail.com> wrote:
>> >>
>> >> Thanks all for your reply
>> >>
>> >> The target deployment is on Azure, so with the nice disk snapshot
>> >> feature, replacing a dead node is easier: no streaming from Cassandra.
>> >>
>> >> As for compaction overhead, using TWCS with a 1-day bucket and removing
>> >> read repair and subrange repair should be sufficient.
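>> >>
>> >> For reference, a minimal sketch of such a table definition (hypothetical
>> >> ks.events schema, assuming the keyspace already exists), sent through the
>> >> DataStax Python driver; the compaction and read-repair options are the
>> >> relevant part:
>> >>
>> >>     from cassandra.cluster import Cluster
>> >>
>> >>     session = Cluster(["127.0.0.1"]).connect()
>> >>     session.execute("""
>> >>         CREATE TABLE IF NOT EXISTS ks.events (
>> >>             series_id text, ts timestamp, value double,
>> >>             PRIMARY KEY ((series_id), ts))
>> >>         WITH compaction = {'class': 'TimeWindowCompactionStrategy',
>> >>                            'compaction_window_unit': 'DAYS',
>> >>                            'compaction_window_size': 1}
>> >>         AND read_repair_chance = 0.0
>> >>         AND dclocal_read_repair_chance = 0.0
>> >>     """)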
>> >>
>> >> Now the only remaining issue is QUORUM reads, which trigger read repair
>> >> automagically.
>> >>
>> >> Before 4.0 there is, unfortunately, no flag to turn it off.
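>> >>
>> >> (For completeness: 4.0 drops the read_repair_chance settings and adds a
>> >> per-table read_repair option, so once on 4.0 the same hypothetical table
>> >> could opt out, e.g. continuing the driver sketch above:)
>> >>
>> >>     session.execute("ALTER TABLE ks.events WITH read_repair = 'NONE'")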
>> >>
>> >> On Sep 30, 2019 15:47, "Eric Evans" <john.eric.evans@gmail.com> wrote:
>> >>
>> >> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa <jjirsa@gmail.com> wrote:
>> >>
>> >> [ ... ]
>> >>
>> >> > 2) The 2 TB guidance is old and irrelevant for most people; what you
>> >> > really care about is how fast you can replace the failed machine.
>> >> >
>> >> > You’d likely be OK going significantly larger than that if you use a
>> >> > few vnodes, since that’ll help rebuild faster (you’ll stream from more
>> >> > sources on rebuild).
>> >> >
>> >> > If you don’t want to use vnodes, buy big machines and run multiple
>> >> > Cassandra instances on each one - it’s not hard to run 3-4 TB per
>> >> > instance and 12-16 TB of SSD per machine.
>> >>
>> >> We do this too.  It's worth keeping in mind, though, that you'll still
>> >> have a 12-16 TB blast radius in the event of a host failure.  As the
>> >> host density goes up, consider steps to make the host more robust
>> >> (RAID, redundant power supplies, etc.).
>> >>
>> >> --
>> >> Eric Evans
>> >> john.eric.evans@gmail.com
>> >>
>>
>>
>
> --
>
> Cedrick Lunven | EMEA Developer Advocate Manager
