cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: implications of using more keyspaces vs single keyspace?
Date Tue, 02 Aug 2011 00:35:07 GMT
On Mon, Aug 1, 2011 at 6:08 PM, Yang <teddyyyy123@gmail.com> wrote:

> for example my data consists of "salary", "office stationery list",
>
> let's say I do use the same replicationStrategy for them, these 2 data
> sets have different key ranges, key distributions,
>
> then is it better to use separate keyspaces for each of them? or use a
> single one?
>
> the factors I can think of:
> separate: have to call set_keyspace() if your calls switch between datasets
>           potential to change to different replication factor in the future
>
> any thoughts?
>
> Thanks a lot
> Yang
>

Ah interesting question.

In the old days, operations like get() took the keyspace as the first string
argument. Now changing keyspace requires calling setKeyspace(String), which
is an extra RPC operation. If you want to interact with two keyspaces, you
either need to keep two connection pools open or make an extra RPC call every
time you want to change keyspaces. While the smaller signature for get() is
nice, having the extra RPC call is not good.
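
To make the trade-off concrete, here is a rough sketch that counts round trips
for the two approaches. It does not talk to Cassandra: FakeConnection is a
made-up stand-in for a Thrift connection, and SwitchingClient,
PerKeyspaceClient, and the keyspace names are hypothetical, chosen only for
illustration. It assumes set_keyspace() and get() each cost one RPC.

```python
class FakeConnection:
    """Hypothetical stand-in for a Thrift connection; only counts round trips."""

    def __init__(self):
        self.keyspace = None
        self.rpc_calls = 0

    def set_keyspace(self, keyspace):
        # Assumption: switching keyspace costs one RPC.
        self.rpc_calls += 1
        self.keyspace = keyspace

    def get(self, key):
        if self.keyspace is None:
            raise RuntimeError("no keyspace set on this connection")
        self.rpc_calls += 1
        return (self.keyspace, key)


class SwitchingClient:
    """One connection; calls set_keyspace() whenever the target keyspace changes."""

    def __init__(self):
        self.conn = FakeConnection()

    def get(self, keyspace, key):
        if self.conn.keyspace != keyspace:
            self.conn.set_keyspace(keyspace)
        return self.conn.get(key)

    def rpc_calls(self):
        return self.conn.rpc_calls


class PerKeyspaceClient:
    """One connection per keyspace; set_keyspace() is paid once per connection."""

    def __init__(self):
        self.conns = {}

    def get(self, keyspace, key):
        conn = self.conns.get(keyspace)
        if conn is None:
            conn = FakeConnection()
            conn.set_keyspace(keyspace)
            self.conns[keyspace] = conn
        return conn.get(key)

    def rpc_calls(self):
        return sum(c.rpc_calls for c in self.conns.values())


# Six reads that alternate between two keyspaces (the worst case for switching).
workload = [("salary", "k1"), ("stationery", "k2")] * 3

switching = SwitchingClient()
pooled = PerKeyspaceClient()
for ks, key in workload:
    switching.get(ks, key)
    pooled.get(ks, key)

print(switching.rpc_calls(), pooled.rpc_calls())  # prints "12 8"
```

With a strictly alternating workload, the single connection pays a
set_keyspace() on every read, while the per-keyspace connections pay it once
each. The cost of the second approach is the extra connections you have to
keep open and manage.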

However, as you mentioned, the replication factor can only be set at the
keyspace level, so separate keyspaces let you replicate each data set
differently. That is nice, especially if you find one column family is not as
important as another. Since a keyspace is a folder on disk, you can also mount
a keyspace on a different physical device.

I still like one column family per keyspace, but having N connection pools
for N keyspaces complicates things.
