cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Maximum number of columns in a table
Date Thu, 15 Sep 2016 15:48:19 GMT
"The data model is too dynamic"

--> then create a table to cope with dynamic data types. Example

CREATE TABLE dynamic_data(
     object_id uuid,
     property_name text,
     property_type text,
     bool_value boolean,
     long_value bigint,
     decimal_value double,
     text_value text,
     date_value timestamp,
     uuid_value uuid,
     ...,
    PRIMARY KEY ((object_id), property_name)
);

Consider the following object in JSON format:

{
   "id": xxxx,
    "name": "John DOE",
    "age":  32,
    "last_visited_date":  "2016-09-10 12:01:03",
}

It would result into

BEGIN UNLOGGED BATCH
INSERT INTO dynamic_data(object_id, property_name, property_type,
text_value) VALUES(xxx, 'name', 'John DOE');
INSERT INTO dynamic_data(object_id, property_name, property_type,
long_value) VALUES(xxx, 'age', 32);
INSERT INTO dynamic_data(object_id, property_name, property_type,
date_value) VALUES(xxx, 'last_visited_date', '2016-09-10 12:01:03');
APPLY BATCH;

You can use safely unlogged batch because the partition key is the same for
all rows so C* is clever enough to coalesce all the inserts into a single
mutation. There will be no overhead because of the batch.

To fetch all values of the object: SELECT * FROM dynamic_data WHERE
object_id = xxx LIMIT 1000;

To delete the whole object, use delete by partition key: DELETE FROM
dynamic_date WHERE object_id = xxx;

To delete a single property, provide also the property name: DELETE FROM
dynamic_date WHERE object_id = xxx AND property_name = 'last_visited_date';

To add a new property to an existing object, just insert:  INSERT INTO
dynamic_data(object_id,
property_name, property_type, bool_value) VALUES(xxx, 'is_married', false);

The only drawback of this data model is that it is abstract e.g. by just
looking at the schema you cannot really tell what kind of data is contains,
but it is precisely what you look for ...


On Thu, Sep 15, 2016 at 4:19 PM, Dorian Hoxha <dorian.hoxha@gmail.com>
wrote:

> Since I will only have 1 table with that many columns, and the other
> tables will be "normal" tables with max 30 columns, and the memory of 2K
> columns won't be that big, I'm gonna guess I'll be fine.
>
> The data model is too dynamic, the alternative would be to create a table
> for each user which will have even more overhead since the number of users
> is in the several thousands/millions.
>
>
> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> There is no real limit in term of number of columns in a table, I would
>> say that the impact of having a lot of columns is the amount of meta data
>> C* needs to keep in memory for encoding/decoding each row.
>>
>> Now, if you have a table with 1000+ columns, the problem is probably your
>> data model...
>>
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <dorian.hoxha@gmail.com>
>> wrote:
>>
>>> Is there alot of overhead with having a big number of columns in a table
>>> ? Not unbounded, but say, would 2000 be a problem(I think that's the
>>> maximum I'll need) ?
>>>
>>> Thank You
>>>
>>
>>
>

Mime
View raw message