cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun Chaitanya <chaitan64a...@gmail.com>
Subject Re: 10000+ CF support from Cassandra
Date Tue, 02 Jun 2015 02:19:50 GMT
Thanks Jon and Jack,

> I strongly advise against this approach.
Jon, I think so too. But so you actually foresee any problems with this
approach?
I can think of a few. [I want to evaluate if we can live with this problem]

   - No more CQL.
   - No data types, everything needs to be a blob.
   - Limited clustering Keys and default clustering order.

> First off, different workloads need different tuning.
Sorry for this naive question but how important is this tuning? Can this
have a huge impact in production?

> You might want to consider a model where you have an application layer
that maps logical tenant tables into partition keys within a single large
Casandra table, or at least a relatively small number of  Cassandra tables.
It will depend on the typical size of your tenant tables - very small ones
would make sense within a single partition, while larger ones should have
separate partitions for a tenant's data. The key here is that tables are
expensive, but partitions are cheap and scale very well with Cassandra.
We are actually trying similar approach. But we don't want to expose this
to application layer. We are attempting to hide this and provide an API.

> Finally, you said "10 clusters", but did you mean 10 nodes? You might
want to consider a model where you do indeed have multiple clusters, where
each handles a fraction of the tenants, since there is no need for separate
tenants to be on the same cluster.
I meant 10 clusters. We want to split our tables across multiple clusters
if above approach is not possible. [But it seems to be very costly]

Thanks,







On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> How big is each of the tables - are they all fairly small or fairly large?
> Small as in no more than thousands of rows or large as in tens of millions
> or hundreds of millions of rows?
>
> Small tables are are not ideal for a Cassandra cluster since the rows
> would be spread out across the nodes, even though it might make more sense
> for each small table to be on a single node.
>
> You might want to consider a model where you have an application layer
> that maps logical tenant tables into partition keys within a single large
> Casandra table, or at least a relatively small number of Cassandra tables.
> It will depend on the typical size of your tenant tables - very small ones
> would make sense within a single partition, while larger ones should have
> separate partitions for a tenant's data. The key here is that tables are
> expensive, but partitions are cheap and scale very well with Cassandra.
>
> Finally, you said "10 clusters", but did you mean 10 nodes? You might want
> to consider a model where you do indeed have multiple clusters, where each
> handles a fraction of the tenants, since there is no need for separate
> tenants to be on the same cluster.
>
>
> -- Jack Krupansky
>
> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64arun@gmail.com>
> wrote:
>
>> Good Day Everyone,
>>
>> I am very happy with the (almost) linear scalability offered by C*. We
>> had a lot of problems with RDBMS.
>>
>> But, I heard that C* has a limit on number of column families that can be
>> created in a single cluster.
>> The reason being each CF stores 1-2 MB on the JVM heap.
>>
>> In our use case, we have about 10000+ CF and we want to support
>> multi-tenancy.
>> (i.e 10000 * no of tenants)
>>
>> We are new to C* and being from RDBMS background, I would like to
>> understand how to tackle this scenario from your advice.
>>
>> Our plan is to use Off-Heap memtable approach.
>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>
>> Each node in the cluster has following configuration
>> 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
>> IMO, this should be able to support 1000 CF with no(very less) impact on
>> performance and startup time.
>>
>> We tackle multi-tenancy using different keyspaces.(Solution I found on
>> the web)
>>
>> Using this approach we can have 10 clusters doing the job. (We actually
>> are worried about the cost)
>>
>> Can you please help us evaluate this strategy? I want to hear communities
>> opinion on this.
>>
>> My major concerns being,
>>
>> 1. Is Off-Heap strategy safe and my assumption of 16 GB supporting 1000
>> CF right?
>>
>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
>> of column families increase even when we use multiple keyspace.
>>
>> 3. I understand the complexity using multi-cluster for single
>> application. The code base will get tightly coupled with infrastructure. Is
>> this the right approach?
>>
>> Any suggestion is appreciated.
>>
>> Thanks,
>> Arun
>>
>
>

Mime
View raw message