incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: How expensive are additional keyspaces?
Date Tue, 11 Mar 2014 15:58:59 GMT
The mathematical overhead is one thing. I would guess that if you tried some
design with 10,000 keyspaces and then ran into a bug or performance
problem, the first thing someone would say to you is "WTF, why do you have that
many keyspaces?" :) Don't let that be you.


On Tue, Mar 11, 2014 at 11:38 AM, Jeremiah D Jordan <
jeremiah.jordan@gmail.com> wrote:

> Also, in terms of overhead, on the server side the overhead is pretty much
> all at the Column Family (CF)/Table level, so 100 keyspaces with 1 CF each
> cost the same as 1 keyspace with 100 CFs.
>
> -Jeremiah
>
> On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan <jeremiah.jordan@gmail.com>
> wrote:
>
> The use of more than one keyspace is not uncommon. Using hundreds of them
> is. That being said, different keyspaces let you specify different
> replication and different authentication. If you are not going to be doing
> one of those things, then there really is no point to multiple keyspaces.
> If you do want to do one of those things, then go for it, make multiple
> keyspaces.
>
>
> -Jeremiah
>
> On Mar 11, 2014, at 10:17 AM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
> I am not sure. As stated, the only benefit of multiple keyspaces is if you
> need:
>
> 1) different replication per keyspace
> 2) different multiple data center configurations per keyspace
>
> Unless you have one of these cases you do not need to do this. I would
> always tackle this problem at the application level using something like:
>
>
> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
>
> Client issues aside, it is not a very common case and I would advise
> against uncommon setups.
>
>
>
> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kwright@nanigans.com> wrote:
>
>> Does this hold true for the native protocol?  I've noticed that you can
>> create a session object in the DataStax driver without specifying a
>> keyspace, and so long as you include the keyspace in all queries instead of
>> just the table name, it works fine.  In that case, I assume there's only one
>> connection pool for all keyspaces.
>>
>> From: Edward Capriolo <edlinuxguru@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Tuesday, March 11, 2014 at 11:05 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: How expensive are additional keyspaces?
>>
>> The biggest expense of them is that you need to be authenticated to a
>> keyspace to perform an operation. Thus connection pools are bound to
>> keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
>> if you have 100 keyspaces you need 100 connection pools, which starts to be
>> a pain very quickly.
>>
>> I suggest keeping everything in one keyspace unless you really need
>> different replication factors and/or network replication settings per
>> keyspace.
>>
>>
>> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer <elreydetodo@gmail.com> wrote:
>>
>>> Hey all -
>>>
>>> My company is working on introducing a configuration service system to
>>> provide config data to several of our applications, to be backed by
>>> Cassandra. We're already using Cassandra for other services, and at
>>> the moment our pending design just puts all the new tables (9 of them,
>>> I believe) in one of our pre-existing keyspaces.
>>>
>>> I've got a few questions about keyspaces that I'm hoping for input on.
>>> Some Google hunting didn't turn up obvious answers, at least not for
>>> recent versions of Cassandra.
>>>
>>> 1) What trade-offs are being made by using a new keyspace versus
>>> re-purposing an existing one (that is in active use by another
>>> application)? Organization is the obvious answer; I'm looking for any
>>> technical reasons.
>>>
>>> 2) Is there any per-keyspace overhead incurred by the cluster?
>>>
>>> 3) Does it impact on-disk layout at all for tables to be in a
>>> different keyspace from others? Is any sort of file fragmentation
>>> potentially introduced just by doing this in a new keyspace as opposed
>>> to an existing one?
>>>
>>> 4) Does it add any metadata overhead to the system keyspace?
>>>
>>> 5) Why might we *not* want to make a separate keyspace for this?
>>>
>>> 6) Does anyone have experience with creating additional keyspaces to
>>> the point that Cassandra can no longer handle it? Note that we're
>>> *not* planning to do this, I'm just curious.
>>>
>>> Cheers,
>>> Martin
>>>
>>
>>
>
>
>
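The application-level approach suggested in the thread (Hector's virtual keyspaces) can be illustrated with a minimal sketch. It assumes the idea is to prepend a per-application prefix to every row key so that several logical namespaces share one real keyspace. The class name `VirtualKeyspace`, the `:` separator, and the plain dict standing in for a shared table are all hypothetical and are not Hector's API.

```python
class VirtualKeyspace:
    """Application-level namespace: every row key gets a fixed prefix.

    Hypothetical illustration only; a dict stands in for one shared table.
    """

    def __init__(self, store, prefix, sep=":"):
        self._store = store          # shared backing store (one real keyspace)
        self._prefix = prefix + sep  # e.g. "tenant_a:"

    def put(self, key, value):
        # Writes land under the prefixed key in the shared store.
        self._store[self._prefix + key] = value

    def get(self, key, default=None):
        return self._store.get(self._prefix + key, default)

    def keys(self):
        # Strip the prefix so callers only see their own logical keys.
        n = len(self._prefix)
        return sorted(k[n:] for k in self._store if k.startswith(self._prefix))


# Two logical namespaces over the same physical store.
shared = {}
tenant_a = VirtualKeyspace(shared, "tenant_a")
tenant_b = VirtualKeyspace(shared, "tenant_b")
tenant_a.put("timeout", "30")
tenant_b.put("timeout", "60")
```

Both tenants write a key named `timeout`, but the shared store holds two distinct entries, so no extra keyspaces or connection pools are needed for the separation.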

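The two reasons the thread gives for separate keyspaces (different replication, different data center settings) and the keyspace-qualified query style Keith describes can be sketched as CQL strings built in Python. The helper names and the keyspace, table, and data center names (`config_service`, `analytics`, `settings`, `dc1`, `dc2`) are hypothetical; the sketch only builds statements and does not talk to a cluster.

```python
def _cql_value(value):
    # Strings are quoted as CQL literals; integers are emitted as-is.
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name, replication):
    """Build a CREATE KEYSPACE statement from a dict of replication options."""
    opts = ", ".join(f"'{k}': {_cql_value(v)}" for k, v in replication.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{opts}}};"


def select_all_cql(keyspace, table):
    """Keyspace-qualified query, usable from a session with no default keyspace."""
    return f"SELECT * FROM {keyspace}.{table};"


# One keyspace per replication policy, which is the case the thread says
# justifies a separate keyspace.
single_dc = create_keyspace_cql(
    "config_service",
    {"class": "SimpleStrategy", "replication_factor": 3},
)
multi_dc = create_keyspace_cql(
    "analytics",
    {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2},
)
query = select_all_cql("config_service", "settings")
```

If two sets of tables would end up with identical replication options, the advice in the thread is to keep them in one keyspace and qualify or prefix names instead.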