cassandra-user mailing list archives

From Hannu Kröger <hkro...@gmail.com>
Subject Re: Maximum number of columns in a table
Date Thu, 15 Sep 2016 20:24:44 GMT
I do agree on that.

> On 15 Sep 2016, at 16:23, DuyHai Doan <doanduyhai@gmail.com> wrote:
> 
> I'd advise anyone against using the old native secondary index ... You'll get poor performance (that's the main reason why some people developed SASI).
> 
> On Thu, Sep 15, 2016 at 10:20 PM, Hannu Kröger <hkroger@gmail.com> wrote:
> Hi,
> 
> The ‘old-fashioned’ secondary indexes do support indexing of collection values:
> https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html
> 
> Br,
> Hannu
> 
>> On 15 Sep 2016, at 15:59, DuyHai Doan <doanduyhai@gmail.com> wrote:
>> 
>> "But the problem is I can't use secondary indexing "where int25=5", while with normal columns I can."
>> 
>> You have many objectives that contradict each other in terms of implementation.
>> 
>> Right now you're unlucky: SASI does not support indexing collections yet (it may come in the future, when? ¯\_(ツ)_/¯ )
>> 
>> If you're using DSE Search or Stratio Lucene Index, you can index map values.
>> 
>> On Thu, Sep 15, 2016 at 9:53 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
>> Yes that makes more sense. But the problem is I can't use secondary indexing "where int25=5", while with normal columns I can.
>> 
>> On Thu, Sep 15, 2016 at 8:23 PM, sfescape@gmail.com <sfescape@gmail.com> wrote:
>> I agree a single blob would also work (I do that in some cases). The reason for the map is if you need more flexible updating. I think your solution of a map per data type works well.
>> 
>> On Thu, Sep 15, 2016 at 11:10 AM DuyHai Doan <doanduyhai@gmail.com> wrote:
>> "But I need rows together to work with them (indexing etc)"
>> 
>> What do you mean by rows together? You mean that you want to fetch a single row instead of 1 row per property, right?
>> 
>> In this case, the map might be the solution:
>> 
>> CREATE TABLE generic_with_maps(
>>    object_id uuid,
>>    boolean_map map<text, boolean>,
>>    text_map map<text, text>,
>>    long_map map<text, bigint>,
>>    ...
>>    PRIMARY KEY(object_id)
>> );
>> 
>> The trick here is to store all the fields of the object in different maps, depending on the type of the field.
>> 
>> The map key is always text and it contains the name of the field.
>> 
>> Example
>> 
>> {
>>     "id": xxxx,
>>     "name": "John DOE",
>>     "age": 32,
>>     "last_visited_date": "2016-09-10 12:01:03"
>> }
>> 
>> INSERT INTO generic_with_maps(object_id, text_map, long_map, date_map)
>> VALUES(xxx, {'name': 'John DOE'}, {'age': 32}, {'last_visited_date': '2016-09-10 12:01:03'});
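
The split by field type described above can be sketched on the client side as follows. This is only an illustrative sketch in Python: the helper `split_by_type` and the `TYPE_TO_COLUMN` table are hypothetical names, not part of any Cassandra driver, and `date_map` is assumed to be one of the columns elided by `...` in the table definition.

```python
from datetime import datetime

# Hypothetical mapping from Python value type to map column name.
# bool is listed before int because bool is a subclass of int in Python.
TYPE_TO_COLUMN = [
    (bool, "boolean_map"),
    (int, "long_map"),
    (datetime, "date_map"),  # assumed column, not shown in the thread's table
    (str, "text_map"),
]


def split_by_type(obj, id_field="id"):
    """Group an object's fields into one dict per typed map column."""
    row = {"object_id": obj[id_field]}
    for name, value in obj.items():
        if name == id_field:
            continue
        for py_type, column in TYPE_TO_COLUMN:
            if isinstance(value, py_type):
                # The map key is the field name, the map value is the field value.
                row.setdefault(column, {})[name] = value
                break
        else:
            raise TypeError(f"no map column for field {name!r}")
    return row


row = split_by_type({
    "id": "xxxx",
    "name": "John DOE",
    "age": 32,
    "last_visited_date": datetime(2016, 9, 10, 12, 1, 3),
})
```

Each key of `row` other than `object_id` would then be bound to the map column of the same name in the INSERT statement.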
>> 
>> When you do a select, you'll get a SINGLE row returned. You then need to extract all the properties from the different maps, but that's not a big deal.
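
Extracting the properties from the different maps might look like the sketch below. It assumes the fetched row is available as a plain Python dict keyed by column name, and that map columns with no entries come back as `None`; the helper name `merge_maps` is hypothetical.

```python
def merge_maps(row, id_column="object_id"):
    """Flatten every map column of a fetched row into one dict of properties."""
    obj = {"id": row[id_column]}
    for column, value in row.items():
        if column == id_column or value is None:
            continue  # skip the key column and map columns with no entries
        obj.update(value)  # field names are unique across the typed maps
    return obj


# A row shaped like the single-row result discussed above (values are examples).
row = {
    "object_id": "xxxx",
    "boolean_map": None,
    "text_map": {"name": "John DOE"},
    "long_map": {"age": 32},
}
obj = merge_maps(row)
```

This relies on each field name appearing in exactly one map, which holds when the maps are filled by field type.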
>> 
>> On Thu, Sep 15, 2016 at 7:54 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
>> @DuyHai
>> Yes, that's another case, the "entity" model used in RDBMS. But I need rows together to work with them (indexing etc).
>> 
>> @sfescape
>> The map is needed when you have a dynamic schema. I don't have a dynamic schema (I may have one, and will use the map if I do). I just have thousands of schemas. One user needs 10 integers, while another user needs 20 booleans, and another needs 30 integers, or a combination of them all.
>> 
>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>> "Another possible alternative is to use a single map column"
>> 
>> --> How do you manage the different types then? Maps in Cassandra are strongly typed.
>> 
>> Unless you set the type of the map value to blob, in which case you might as well store the whole object as a single blob column.
>> 
>> On Thu, Sep 15, 2016 at 6:13 PM, sfescape@gmail.com <sfescape@gmail.com> wrote:
>> Another possible alternative is to use a single map column.
>> 
>> 
>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
>> Since I will only have 1 table with that many columns, and the other tables will be "normal" tables with max 30 columns, and the memory of 2K columns won't be that big, I'm gonna guess I'll be fine.
>> 
>> The data model is too dynamic; the alternative would be to create a table for each user, which would have even more overhead since the number of users is in the several thousands/millions.
>> 
>> 
>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>> There is no real limit in terms of the number of columns in a table. I would say that the impact of having a lot of columns is the amount of metadata C* needs to keep in memory for encoding/decoding each row.
>> 
>> Now, if you have a table with 1000+ columns, the problem is probably your data model...
>> 
>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
>> Is there a lot of overhead with having a big number of columns in a table? Not unbounded, but say, would 2000 be a problem (I think that's the maximum I'll need)?
>> 
>> Thank You

