cassandra-user mailing list archives

From Dor Laor <...@scylladb.com>
Subject Re: Dynamo autoscaling: does it beat cassandra?
Date Mon, 09 Dec 2019 22:57:23 GMT
The DynamoDB model has several key benefits over Cassandra's.
The most notable one is the tablet concept: data is partitioned into 10GB
chunks. Scaling happens when such a tablet reaches maximum capacity
and is automatically split in two. Splits can happen in parallel across
the entire data set, so there is no need to grow the number of nodes or
vnodes. As the actual hardware is multi-tenant, the average server should
have plenty of capacity to receive these streams.
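
To make the tablet idea concrete, here is a minimal Python sketch of
split-on-full behavior. It is only an illustration, not DynamoDB's actual
implementation: the Tablet class and split_full_tablets function are
hypothetical names, and only the 10GB threshold comes from the description
above.

```python
from dataclasses import dataclass

TABLET_LIMIT_GB = 10.0  # split threshold, taken from the 10GB figure above


@dataclass
class Tablet:
    size_gb: float  # hypothetical model: a tablet is just its data size


def split_full_tablets(tablets, limit_gb=TABLET_LIMIT_GB):
    """Return a new list in which every tablet at or above the limit is halved.

    Each split depends only on the tablet itself, which is why a real
    system could run many of them in parallel across the data set.
    """
    result = []
    for tablet in tablets:
        if tablet.size_gb >= limit_gb:
            half = tablet.size_gb / 2
            result.extend([Tablet(half), Tablet(half)])
        else:
            result.append(tablet)
    return result


tablets = [Tablet(10.0), Tablet(4.0), Tablet(10.0)]
after = split_full_tablets(tablets)
print(len(after), sum(t.size_gb for t in after))
```

The total data size is unchanged by the splits; only the number of
independently placeable chunks grows.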

That said, when we benchmarked DynamoDB with a pure ingest workload,
we had to slow down the pace even with reserved capacity, because we
received many 'error 500' responses, which indicate internal server
errors. Its hot partitions do not behave well either.

So I believe a 10% growth in capacity with good key distribution can be
handled well, but a 2x growth in a short time will fail. That's something
you'd expect from any database, but Dynamo has an advantage with tablets
and multitenancy, and issues with hot partitions and with the accounting
of hot keys, which get cached better in Cassandra.
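
As a rough way to check whether a workload has "good key distribution",
one can measure what fraction of requests lands on the busiest partition
key. The sketch below is an assumption-laden illustration: the function
name and the synthetic key lists are made up, and it treats each distinct
key as its own partition.

```python
from collections import Counter


def hottest_key_share(request_keys):
    """Fraction of all requests that hit the single most-requested key."""
    counts = Counter(request_keys)
    return max(counts.values()) / len(request_keys)


# Synthetic workloads (hypothetical data, for illustration only).
uniform = [f"user-{i}" for i in range(1000)]                       # every key hit once
skewed = ["user-0"] * 900 + [f"user-{i}" for i in range(1, 101)]   # one hot key

print(hottest_key_share(uniform))
print(hottest_key_share(skewed))
```

A share close to 1/number-of-keys suggests even distribution; a share
near 1 means a single hot key dominates the load.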

Dynamo allows you to detach compute from storage, which is a key benefit
in a serverless, spiky deployment.

On Mon, Dec 9, 2019 at 1:02 PM Jeff Jirsa <jjirsa@gmail.com> wrote:

> Expansion is probably much faster in 4.0 with complete sstable streaming
> (skips ser/deser), though that may have diminishing returns with vnodes
> unless you're using LCS.
>
> Dynamo on demand / autoscaling isn't magic - they're overprovisioning to
> give you the burst, then expanding on demand. That overprovisioning comes
> with a cost. Unless you're actively and regularly scaling, you're probably
> going to pay more for it.
>
> It'd be cool if someone focused on this - I think the faster streaming
> goes a long way. The way vnodes work today makes it difficult to add more
> than one at a time without violating consistency, and that's unlikely to
> change, but if each individual node is much faster, that may mask it a bit.
>
>
>
> On Mon, Dec 9, 2019 at 12:35 PM Carl Mueller
> <carl.mueller@smartthings.com.invalid> wrote:
>
>> Dynamo salespeople have been pushing autoscaling abilities that have been
>> one of the key temptations to our management to switch off of cassandra.
>>
>> Has anyone done any numbers on how well dynamo will autoscale demand
>> spikes, and how we could architect cassandra to compete with such abilities?
>>
>> We probably could overprovision and with the presumably higher cost of
>> dynamo beat it, although the sales engineers claim they are closing the
>> cost factor too. We could vertically scale to some degree, but node
>> expansion seems close.
>>
>> VNode expansion is still limited to one at a time?
>>
>> We use VNodes so we can't do netflix's cluster doubling, correct? With
>> cass 4.0's alleged segregation of the data by token we could though and
>> possibly also "prep" the node by having the necessary sstables already
>> present ahead of time?
>>
>> There's always "caching" too, but there isn't a lot of data on general
>> fronting of cassandra with caches, and the row cache continues to be mostly
>> useless?
>>
>
