cassandra-user mailing list archives

From Oskar Kjellin <oskar.kjel...@gmail.com>
Subject Re: Using Cassandra for my usecase
Date Mon, 12 Jun 2017 06:55:03 GMT
You could make the tenant a clustering column instead of part of the partition key. That way a tenant
with a lot of data does not map to a single large partition.
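
A rough way to see the difference is to count rows per partition under each key choice. This is only a sketch with made-up data: the tenant names, device names, and the (device, day) partition key are assumptions for illustration, not something from the original schema. How well this works in practice depends entirely on what you pick as the partition key.

```python
from collections import Counter

# Hypothetical sample events as (tenant, device, day); one tenant dominates.
events = (
    [("big_tenant", f"dev{i % 4}", "2017-06-12") for i in range(1000)]
    + [("small_tenant", "dev9", "2017-06-12") for _ in range(10)]
)

def partition_sizes(events, partition_key):
    """Count rows per partition for a given partition-key function."""
    return Counter(partition_key(e) for e in events)

# Tenant alone as the partition key: one partition per tenant.
by_tenant = partition_sizes(events, lambda e: (e[0],))

# Tenant as a clustering column; the partition key is (device, day) instead.
by_device_day = partition_sizes(events, lambda e: (e[1], e[2]))

largest_by_tenant = max(by_tenant.values())
largest_by_device_day = max(by_device_day.values())
print(largest_by_tenant, largest_by_device_day)
```

With this sample the largest partition shrinks because the big tenant's rows are spread over several (device, day) partitions. A partition key that does not spread the data would not help.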

On 12 Jun 2017, at 07:14, Erick Ramirez <flightctlr@gmail.com> wrote:

>> Given my use case, is Cassandra the best suited one, or is there any other database which suits my requirement better?
> 
> Probably not the right forum for that question. It's like walking into a Ford dealership and asking if the Mustang is the best car for you. 😄
> 
> In any case, you would choose Cassandra because you have a scale problem and you require:
> - high availability
> - very fast reads
> - no single point of failure
> - no downtime
> - etc.
> 
>> What would be the best way to implement multi-tenancy?
> 
> The "best" way is whatever works for your use case, based on the testing you've done. As you are already aware from the example you provided, adding a column as the tenant indicator could lead to large partitions, so you need to be careful about how you model your data.
> 
> Some implementations completely side-step this by distributing tenants across keyspaces, but that may not suit your needs.
> 
>> Given that I need to query by multiple dimensions, would denormalized tables work better, or should I be using materialized views?
> 
> With denormalised tables, your application needs to implement the logic for batching the updates together.
> 
> With materialised views, that complexity is managed for you by C*, but you need to be aware of the performance impact associated with it. For example, with RF=3 on the base table, an MV adds another RF=3 for an additional table, so each write goes to 3+3 replicas. A second MV increases that to 3+3+3, and so on.
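
The RF arithmetic above can be written down as a small helper. This is a back-of-the-envelope sketch only: the function name is made up for illustration, and it counts replica writes per insert without modelling any other cost a view may add.

```python
def replica_writes_per_insert(rf: int, num_views: int) -> int:
    """Replica-level writes for one insert: the base table plus each view,
    each replicated rf times (assumes the views use the same RF as the base)."""
    return rf * (1 + num_views)

# RF=3 with zero, one and two materialised views.
no_views = replica_writes_per_insert(3, 0)
one_view = replica_writes_per_insert(3, 1)
two_views = replica_writes_per_insert(3, 2)
print(no_views, one_view, two_views)
```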
> 
>> Anything else that I need to consider based on your experiences with Cassandra?
> 
> 
> Multi-tenancy can be difficult, particularly for complex use cases. Test, test and test. And make sure you always correctly size your cluster with enough nodes.
> 
> You need to limit the number of tables to about 200 at the most (regardless of the number of keyspaces). Having too many tables puts pressure on the heap of each node.
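
To put that limit next to the per-tenant-table idea from the original question: with around 8 denormalised tables per tenant, a budget of about 200 tables runs out quickly. A quick sketch, using only the two figures mentioned in this thread (the function name is hypothetical, and 200 is a rough guideline, not a hard limit):

```python
def max_tenants(table_limit: int = 200, tables_per_tenant: int = 8) -> int:
    """How many tenants fit if every tenant gets its own copy of each table."""
    return table_limit // tables_per_tenant

tenants = max_tenants()
print(tenants)
```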
> 
> Good luck!
> 
>> On Sun, Jun 11, 2017 at 2:07 AM, Govindarajan Srinivasaraghavan <govindraghvan@gmail.com> wrote:
>> Hi All,
>> 
>> Just to give some background, I'm working on a project where I need to store fast incoming time series data and have REST APIs to query and serve the data to users when needed. Each record is a single JSON document about 1 KB in size, and the data has to be purged after a specific time period (say a few weeks or months). The incoming rate would be approximately 100k messages per second, and the biggest challenge is that the data should be queryable by multiple dimensions with sorting, paging and data dump options.
>> 
>> I started looking into database options and felt that Cassandra might be a good choice for my use case, since the requirement needs fast writes. In order to query by multiple dimensions I had to insert the same record into multiple denormalized tables (around 8 tables). Now I need to implement multi-tenancy, and having an extra column in the partition key to query by tenant will not work, since some tenants will have huge amounts of data compared to the rest. My other option is to have the tenant identifier appended to the table names so that I can perform per-tenant queries easily.
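
For a sense of scale, the figures above (about 100k messages per second, about 1 KB each, around 8 denormalized tables) can be turned into a rough raw write volume. This is a sketch only: it assumes 1 KB = 1000 bytes and ignores replication, compression, and per-row storage overhead, so real numbers will differ.

```python
MESSAGES_PER_SECOND = 100_000   # from the description above
BYTES_PER_MESSAGE = 1_000       # ~1 KB, assuming 1 KB = 1000 bytes
DENORMALIZED_TABLES = 8         # "around 8 tables"
SECONDS_PER_DAY = 86_400

# Raw payload arriving per second, before it is copied into each table.
raw_bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE

# Raw payload written per day across all denormalized tables.
daily_bytes_all_tables = raw_bytes_per_second * SECONDS_PER_DAY * DENORMALIZED_TABLES

print(raw_bytes_per_second / 1e6, "MB/s incoming")
print(daily_bytes_all_tables / 1e12, "TB/day across all tables")
```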
>> 
>> Here are my questions for which I need some help.
>> - Given my use case, is Cassandra the best suited one, or is there any other database which suits my requirement better?
>> - What would be the best way to implement multi-tenancy?
>> - Given that I need to query by multiple dimensions, would denormalized tables work better, or should I be using materialized views?
>> - Anything else that I need to consider based on your experiences with Cassandra?
>> 
>> Thanks
> 
