cassandra-user mailing list archives

From jason zhao yang <zhaoyangsingap...@gmail.com>
Subject Re: Cassandra table limitation
Date Thu, 07 Apr 2016 02:26:23 GMT
Hi, thanks.

The schemas are different. Putting a tenant ID as the first partition key would
make the Spark logic more complex (filtering is needed in search-all).
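
To illustrate the trade-off, here is a minimal Python sketch (a plain in-memory
stand-in, not Cassandra or Spark code; all names and rows are hypothetical) of a
shared table whose partition key starts with the tenant ID. A read with the full
key touches one partition, while a per-tenant search-all has to visit every
partition and filter out other tenants' rows:

```python
from collections import defaultdict

# Hypothetical stand-in for a shared table: (tenant_id, bucket) -> rows.
# This only models the access pattern; it is not a Cassandra client.
shared_table = defaultdict(list)


def insert(tenant_id, bucket, row):
    """Store a row under a composite key that starts with the tenant ID."""
    shared_table[(tenant_id, bucket)].append(row)


def read_partition(tenant_id, bucket):
    """Point read: the full key is known, so no filtering is needed."""
    return list(shared_table.get((tenant_id, bucket), []))


def search_all(tenant_id):
    """Full scan: visit every partition and drop other tenants' rows."""
    return [
        row
        for (tid, _bucket), rows in shared_table.items()
        if tid == tenant_id  # the extra filter step a shared table requires
        for row in rows
    ]


insert("tenant_a", 1, {"msg": "a1"})
insert("tenant_a", 2, {"msg": "a2"})
insert("tenant_b", 1, {"msg": "b1"})

print(read_partition("tenant_a", 1))
print(search_all("tenant_a"))
```

With one keyspace per tenant, the search-all would simply read the whole table
without the tenant filter.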

> There's also the issue of lots of memtables flushing to disk during
> commit log rotations.  Can be problematic.

If this is the case, I think Cassandra cannot even handle more than 10
tables during commit log rotations.

Does the number of tables affect schema modification (create, alter)
performance?

On Thu, Apr 7, 2016 at 5:13 AM Jonathan Haddad <jon@jonhaddad.com> wrote:

> There's also the issue of lots of memtables flushing to disk during commit
> log rotations.  Can be problematic.
>
> On Wed, Apr 6, 2016 at 2:08 PM Michael Penick <michael.penick@datastax.com>
> wrote:
>
>> Are the tenants using the same schema? If so, you might consider using
>> the tenant's ID as part of the primary key for the tables they have in
>> common.
>>
>> If they're all using different, largish schemas I'm not sure that
>> Cassandra is well suited to that type of multi-tenancy. There's the
>> per-table memory overhead and there's the fact that it's difficult to tune
>> a single cluster to handle the different (probably competing) workloads
>> effectively.
>>
>> Mike
>>
>> On Tue, Apr 5, 2016 at 8:40 PM, jason zhao yang <
>> zhaoyangsingapore@gmail.com> wrote:
>>
>>> Hi Jack,
>>>
>>> Thanks for the reply.
>>>
>>> Each tenant will have around 50-100 tables for their applications:
>>> probably log collection, probably an account table. It's not fixed and
>>> depends on the tenant's needs.
>>>
>>> There will be a team in charge of helping tenants with data modeling and
>>> access patterns. Tenants will not administer the cluster directly; we will
>>> take care of that.
>>>
>>> Yes, multi-cluster is a solution. But the cost will be quite high,
>>> because each tenant's data is far less than the capacity of a 3-node
>>> cluster. So I want to put multiple tenants into one cluster.
>>>
>>>
>>>
>>> On Wed, Apr 6, 2016 at 10:41 AM Jack Krupansky <jack.krupansky@gmail.com> wrote:
>>>
>>>> What is the nature of these tenants? Are they each creating their own
>>>> data models? Is there one central authority that will approve of all data
>>>> models and who can adjust the cluster configuration to support those models?
>>>>
>>>> Generally speaking, multi-tenancy is an anti-pattern for Cassandra and
>>>> for most servers. The proper way to do multitenancy is to not do it at all,
>>>> and to use separate machines or at least separate virtual machines.
>>>>
>>>> In particular, there needs to be a central authority managing a
>>>> Cassandra cluster to assure its smooth operation. If each tenant is going
>>>> in their own direction, then nobody will be in charge and capable of
>>>> assuring that everybody is on the same page.
>>>>
>>>> Again, it depends on the nature of these tenants and how much control
>>>> the cluster administrator has over them.
>>>>
>>>> Think of a Cassandra cluster as managing the data for either a single
>>>> application or a collection of applications which share the same data. If
>>>> there are multiple applications that don't share the same data, then they
>>>> absolutely should be on separate clusters.
>>>>
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Tue, Apr 5, 2016 at 5:40 PM, Kai Wang <depend@gmail.com> wrote:
>>>>
>>>>> Once in a while the question about table count arises on this list. The
>>>>> most recent is
>>>>> https://groups.google.com/forum/#!topic/nosql-databases/IblAhiLUXdk
>>>>>
>>>>> In short, C* is not designed to scale with the table count. For one,
>>>>> each table/CF has some fixed memory footprint on *ALL* nodes. The consensus
>>>>> is you shouldn't have more than "a few hundred" tables.
>>>>>
>>>>> On Mon, Apr 4, 2016 at 10:17 AM, jason zhao yang <
>>>>> zhaoyangsingapore@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is Jason.
>>>>>>
>>>>>> Currently I am using C* 2.1.10. I want to ask: what's the optimal
>>>>>> number of tables I should create in one cluster?
>>>>>>
>>>>>> My use case is that I will prepare a keyspace for each of my tenants,
>>>>>> and every tenant will create the tables they need. Assume each tenant
>>>>>> creates 50 tables with a normal workload (half read, half write). How
>>>>>> many tenants can I support in one cluster?
>>>>>>
>>>>>> I know there are a few issues related to a large number of tables:
>>>>>> * frequent GC
>>>>>> * frequent flushes due to insufficient memory
>>>>>> * high latency when modifying a table schema
>>>>>> * a large number of tombstones when creating tables
>>>>>>
>>>>>> Are there any other issues with a large number of tables? Using a 32GB
>>>>>> instance, I can easily create 4000 tables with off-heap memtables.
>>>>>>
>>>>>> BTW, is this table limitation solved in 3.X?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>>
>>>>>
>>>>
>>

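As a rough sketch of the sizing question in this thread: if each table carries a
fixed memory footprint on every node, the table-related overhead per node grows
with tenants x tables per tenant. The per-table figure (1 MB) and the memory
budget used below are assumptions for illustration, not measured values, and the
function names are hypothetical. Real limits also depend on flush behavior, GC
and schema-change latency, which this arithmetic ignores.

```python
def table_overhead_mb(tenants, tables_per_tenant, per_table_overhead_mb=1.0):
    """Estimated per-node memory spent on fixed per-table overhead.

    per_table_overhead_mb is an ASSUMED placeholder, not a measured value.
    """
    return tenants * tables_per_tenant * per_table_overhead_mb


def max_tenants(budget_mb, tables_per_tenant, per_table_overhead_mb=1.0):
    """Upper bound on tenants if per-table overhead alone must fit the budget."""
    if tables_per_tenant <= 0 or per_table_overhead_mb <= 0:
        raise ValueError("tables_per_tenant and per_table_overhead_mb must be positive")
    return int(budget_mb // (tables_per_tenant * per_table_overhead_mb))


# 80 tenants with 50 tables each is the 4000-table case mentioned in the thread.
print(table_overhead_mb(tenants=80, tables_per_tenant=50))

# Hypothetical budget: 2048 MB set aside for per-table overhead.
print(max_tenants(budget_mb=2048, tables_per_tenant=50))
```

Plugging in a measured per-table overhead for a given version and memtable
configuration would make the estimate more meaningful.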