incubator-cassandra-user mailing list archives

From: Nick Bailey <n...@datastax.com>
Subject: Re: Creating column families per client
Date: Wed, 21 Dec 2011 16:50:16 GMT
The overhead for column families was greatly reduced in 0.8 and 1.0.
It should now be possible to have hundreds or thousands of column
families. The 'memtable_total_space_in_mb' setting was introduced to
provide a global memtable threshold, so Cassandra handles flushing on
its own.

See http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
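The effect of a single global threshold can be pictured with a toy model. This is a hypothetical Python sketch, not Cassandra's code: the `GlobalMemtableBudget` class, counting sizes in plain bytes, and the "flush the largest memtable first" policy are all assumptions made for illustration. What it shows is that every column family draws on one shared budget, so adding many small column families does not add a separate memory threshold for each of them.

```python
class GlobalMemtableBudget:
    """Toy model (hypothetical): all column families share one memtable budget."""

    def __init__(self, total_space_bytes):
        if total_space_bytes < 0:
            raise ValueError("budget must be non-negative")
        self.total_space_bytes = total_space_bytes
        self.memtables = {}  # column family name -> bytes currently held in memory
        self.flushed = []    # column families flushed, in the order they were flushed

    def write(self, column_family, nbytes):
        """Buffer a write, then flush until the shared total fits the budget."""
        self.memtables[column_family] = self.memtables.get(column_family, 0) + nbytes
        while sum(self.memtables.values()) > self.total_space_bytes:
            # Assumed policy: flush whichever memtable is currently largest.
            largest = max(self.memtables, key=self.memtables.get)
            self.flushed.append(largest)
            self.memtables[largest] = 0  # stands in for writing it to disk


budget = GlobalMemtableBudget(total_space_bytes=100)
budget.write("client_a", 60)
budget.write("client_b", 30)
budget.write("client_c", 20)  # shared total reaches 110, over the budget of 100
print(budget.flushed)         # ['client_a']
```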

Another thing to consider is the lack of built-in access controls.
There is an authentication/authorization interface you can plug into,
and there are examples in the examples/ directory of the source
download.
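Whatever plug-in you use, the decision it has to make for a one-column-family-per-client layout boils down to a lookup like the one below. This is a hypothetical Python sketch of that decision only: the `POLICY` table, the client and column family names, and the `READ`/`WRITE` strings are made up, and the actual plug-in interface shown in the examples/ directory has its own classes and method names, which are not reproduced here.

```python
# Hypothetical policy table: client -> {column family -> granted permissions}.
POLICY = {
    "client_a": {"client_a_data": {"READ", "WRITE"}},
    "client_b": {"client_b_data": {"READ"}},
}


def is_allowed(client, column_family, permission):
    """Default-deny check: True only if the permission was explicitly granted."""
    return permission in POLICY.get(client, {}).get(column_family, set())


print(is_allowed("client_a", "client_a_data", "WRITE"))  # True
print(is_allowed("client_a", "client_b_data", "READ"))   # False: another client's data
```

Denying by default means a newly created column family is unreachable for every client until a grant is added for its owner.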

On Wed, Dec 21, 2011 at 10:36 AM, Ryan Lowe <ryanjlowe@gmail.com> wrote:
> What we have done to avoid creating multiple column families is to sort of
> namespace the row key. So if we have a User column family and two accounts,
> "AccountA" and "AccountB", we do the following:
>
> Column Family User:
>    "AccountA/ryan" : { first: Ryan, last: Lowe }
>    "AccountB/ryan" : { first: Ryan, last: Smith }
>
> etc.
>
> For our needs, this did the same thing as having two "User" column
> families, one for "AccountA" and one for "AccountB".
>
> Ryan
>
>
> On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti <f.baronti@list-group.com> wrote:
>>
>> Hi,
>>
>> Based on my experience with Cassandra 0.7.4, I strongly advise against
>> it: we tried creating column families dynamically, and it was a
>> nightmare.
>> First of all, the operation cannot be done concurrently, so you must
>> find a way to prevent parallel creation across the whole cluster, not
>> just on a single node.
>> The main problem, however, is timestamps. The structure of your
>> keyspace is versioned with a time-dependent id, which is assigned by the
>> host where you perform the schema update, based on that machine's local
>> clock. If you do two updates in close succession on two different nodes
>> and their clocks are not perfectly synchronized (and they never will be),
>> Cassandra may get their relative order wrong and stop working
>> altogether.
>>
>> Bottom line: don't.
>>
>> Flavio
>>
>> On 12/21/2011 14:45 PM, Rafael Almeida wrote:
>>
>>> Hello,
>>>
>>> I am evaluating Cassandra for my system. I will have several clients
>>> who won't share data with each other. My idea is to create one column
>>> family per client. When a new client comes in and adds data to the
>>> system, I'd like to create a column family dynamically. Is that
>>> reliable? Can I create a column family on a node, immediately add new
>>> data to that column family, and be confident that the data will
>>> eventually become visible to a read?
>>>
>>> Regards,
>>> Rafael
>>>
>>>
>>>
>>
>
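Ryan's row-key namespacing in the quoted thread amounts to building and splitting composite keys. A minimal Python sketch follows; the function names and the rule that account names may not contain the "/" separator are assumptions added for illustration, not part of his message.

```python
SEPARATOR = "/"  # separator as in "AccountA/ryan"


def make_row_key(account, user):
    """Build a namespaced row key such as 'AccountA/ryan'."""
    if SEPARATOR in account:
        # A separator inside the account name would make the key ambiguous.
        raise ValueError(f"account name must not contain {SEPARATOR!r}: {account!r}")
    return f"{account}{SEPARATOR}{user}"


def split_row_key(row_key):
    """Recover (account, user), splitting on the first separator only."""
    account, _, user = row_key.partition(SEPARATOR)
    return account, user


key = make_row_key("AccountA", "ryan")
print(key)                 # AccountA/ryan
print(split_row_key(key))  # ('AccountA', 'ryan')
```

Splitting on the first separator lets user names contain "/" safely, which is why the check only applies to the account part.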
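Flavio's clock-skew argument can be illustrated with a toy model. It does not reproduce Cassandra 0.7's schema versioning; it only shows that when each node stamps an update with its own clock, ordering by those stamps can invert the real order once clocks drift. The node names, update labels, and the 50 ms offset are made up.

```python
def timestamp_order(updates, clock_offsets_ms):
    """Order updates by the timestamp each node assigns from its own clock.

    updates: list of (name, node, real_time_ms), listed in true order.
    clock_offsets_ms: how far each node's clock runs ahead (+) or behind (-).
    """
    stamped = [(real + clock_offsets_ms[node], name) for name, node, real in updates]
    return [name for _, name in sorted(stamped)]


updates = [
    ("create CF client_a", "node1", 1000),  # really happens first
    ("create CF client_b", "node2", 1020),  # really happens 20 ms later
]
print(timestamp_order(updates, {"node1": 0, "node2": 0}))    # client_a, then client_b
print(timestamp_order(updates, {"node1": 0, "node2": -50}))  # client_b, then client_a
```

With node2 running 50 ms slow, its later update is stamped 970 ms and sorts before node1's update at 1000 ms.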
