cassandra-user mailing list archives

From Arun Chaitanya <chaitan64a...@gmail.com>
Subject Re: 10000+ CF support from Cassandra
Date Thu, 28 May 2015 06:49:48 GMT
Hello Jack,

> Column families? As opposed to tables? Are you using Thrift instead of
> CQL3? You should be focusing on the latter, not the former.
We have an ORM developed in our company, which maps each DTO to a column
family. So, we have many column families. We are using CQL3.

> But either way, the general guidance is that there is no absolute limit
> of tables per se, but "low hundreds" is the recommended limit, regardless
> of how many keyspaces they may be divided between. More than that is an
> anti-pattern for Cassandra - maybe you can make it work for your
> application, but it isn't recommended.
Are you saying that most Cassandra users have no more than 200-300 column
families? Is this achieved through careful data modelling?

> A successful Cassandra deployment is critically dependent on careful data
> modeling - who is responsible for modeling each of these tables, you and a
> single, tightly-knit team with very common interests and very specific
> goals and SLAs or many different developers with different managers with
> different goals such as SLAs?
The latter.

> When you say multi-tenant, are you simply saying that each of your
> organization's customers has their data segregated, or does each customer
> have direct access to the cluster?
Every organization's data is in the same cluster. No, customers do not have
direct access to the cluster.

Thanks,
Arun

On Wed, May 27, 2015 at 7:17 PM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> Scalability of Cassandra refers primarily to number of rows and number of
> nodes - to add more data, add more nodes.
>
> Column families? As opposed to tables? Are you using Thrift instead of
> CQL3? You should be focusing on the latter, not the former.
>
> But either way, the general guidance is that there is no absolute limit of
> tables per se, but "low hundreds" is the recommended limit, regardless of
> how many keyspaces they may be divided between. More than that is an
> anti-pattern for Cassandra - maybe you can make it work for your
> application, but it isn't recommended.
>
> A successful Cassandra deployment is critically dependent on careful data
> modeling - who is responsible for modeling each of these tables, you and a
> single, tightly-knit team with very common interests and very specific
> goals and SLAs or many different developers with different managers with
> different goals such as SLAs?
>
> When you say multi-tenant, are you simply saying that each of your
> organization's customers has their data segregated, or does each customer
> have direct access to the cluster?
>
> -- Jack Krupansky
>
> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64arun@gmail.com>
> wrote:
>
>> Good Day Everyone,
>>
>> I am very happy with the (almost) linear scalability offered by C*. We
>> had a lot of problems with RDBMS.
>>
>> But I have heard that C* has a limit on the number of column families
>> that can be created in a single cluster, the reason being that each CF
>> uses 1-2 MB of JVM heap.
>>
>> In our use case, we have 10000+ CFs and we want to support
>> multi-tenancy (i.e. 10000 * number of tenants).
>>
>> We are new to C* and come from an RDBMS background, so I would like your
>> advice on how to tackle this scenario.
>>
>> Our plan is to use the off-heap memtable approach.
>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>
>> Each node in the cluster has the following configuration:
>> 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap)
>> IMO, this should be able to support 1000 CFs with little or no impact on
>> performance and startup time.
>>
>> We tackle multi-tenancy using different keyspaces (a solution I found on
>> the web).
>>
>> Using this approach we can have 10 clusters doing the job. (We actually
>> are worried about the cost)
>>
>> Can you please help us evaluate this strategy? I want to hear the
>> community's opinion on this.
>>
>> My major concerns are:
>>
>> 1. Is the off-heap strategy safe, and is my assumption that 16 GB can
>> support 1000 CFs right?
>>
>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
>> of column families still increases even when we use multiple keyspaces.
>>
>> 3. I understand the complexity of using multiple clusters for a single
>> application. The code base will get tightly coupled with infrastructure. Is
>> this the right approach?
>>
>> Any suggestion is appreciated.
>>
>> Thanks,
>> Arun
>>
>
>
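
The heap concern raised in the original message can be put into rough
numbers. The following is only a back-of-the-envelope sketch: it takes the
1-2 MB per column family figure quoted in the thread at face value (it is a
figure the poster heard, not a measurement) and uses the 8 GB JVM heap from
the proposed node layout. The function names are hypothetical and exist only
for this illustration.

```python
# Rough heap arithmetic for the figures quoted in the thread.
# Assumption: each column family costs 1-2 MB of JVM heap on a node
# (hearsay figure from the original message, not measured here).
MB_PER_CF_LOW = 1
MB_PER_CF_HIGH = 2
HEAP_MB = 8 * 1024  # the 8 GB Cassandra JVM from the proposed node layout


def heap_overhead_mb(num_cfs, mb_per_cf):
    """Heap attributed to column families alone, in MB."""
    return num_cfs * mb_per_cf


def heap_fraction(num_cfs, mb_per_cf, heap_mb=HEAP_MB):
    """Share of the JVM heap that overhead would occupy (can exceed 1.0)."""
    return heap_overhead_mb(num_cfs, mb_per_cf) / heap_mb


if __name__ == "__main__":
    # 300 ~ "low hundreds", 1000 ~ the per-cluster plan, 10000 ~ one tenant
    for num_cfs in (300, 1000, 10000):
        low = heap_fraction(num_cfs, MB_PER_CF_LOW)
        high = heap_fraction(num_cfs, MB_PER_CF_HIGH)
        print(f"{num_cfs:>6} CFs: {low:.0%} to {high:.0%} of an 8 GB heap")
```

Under these assumptions, 1000 CFs would take roughly an eighth to a quarter
of the heap before any data is held, while 10000 CFs would exceed the heap
outright, which is consistent with the "low hundreds" guidance quoted above.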
