cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Dynamic Columns in Cassandra 2.X
Date Fri, 13 Jun 2014 21:19:33 GMT
The validation type is set to bytes, and my code is type-safe, so it knows
which serializers to use. Those dynamic columns are driven off the types in
Java.
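Peter's type-safe approach (values stored under a bytes validator, with the client choosing the serializer) can be illustrated with a minimal Python sketch. It is not his Java code: the one-byte type tags and the `encode_value`/`decode_value` helpers are hypothetical, and the tag is only needed because this sketch, unlike a statically typed client, does not otherwise know which type was written.

```python
import struct

# Hypothetical one-byte tags. A bytes validator stores raw bytes and records
# nothing about the original type, so the client has to track it.
_TAG_LONG = b"L"
_TAG_DOUBLE = b"D"
_TAG_TEXT = b"T"
_TAG_BOOL = b"B"


def encode_value(value):
    """Serialize a Python value to bytes for a bytes-validated cell."""
    # bool must be checked before int, since bool is a subclass of int
    if isinstance(value, bool):
        return _TAG_BOOL + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return _TAG_LONG + struct.pack(">q", value)  # 8-byte big-endian signed
    if isinstance(value, float):
        return _TAG_DOUBLE + struct.pack(">d", value)  # 8-byte IEEE 754 double
    if isinstance(value, str):
        return _TAG_TEXT + value.encode("utf-8")
    raise TypeError("no serializer for %s" % type(value).__name__)


def decode_value(raw):
    """Pick the deserializer from the tag byte and decode the payload."""
    tag, payload = raw[:1], raw[1:]
    if tag == _TAG_BOOL:
        return payload == b"\x01"
    if tag == _TAG_LONG:
        return struct.unpack(">q", payload)[0]
    if tag == _TAG_DOUBLE:
        return struct.unpack(">d", payload)[0]
    if tag == _TAG_TEXT:
        return payload.decode("utf-8")
    raise ValueError("unknown type tag %r" % tag)
```

For example, one row could hold `encode_value(42)` under one column name and `encode_value("Ada")` under another; Cassandra sees only bytes, and only the reader's knowledge of the tags (or of the Java types, in Peter's case) gives them meaning.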

Having said that, CQL3 does have a new custom type feature, but the
documentation on how that actually works is basically non-existent. One
could also modify CQL so that insert statements give Cassandra hints
about what type a value is, but I'm not aware of anyone enhancing CQL3 to do
that.

I realize my kind of use case is a bit unique, but I do know of others that
are doing similar kinds of things.




On Fri, Jun 13, 2014 at 5:11 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> In thrift, when creating a column family, you need to define
>
> 1) the row/partition key type
> 2) the column comparator type
> 3) the validation type for the actual value (cell in CQL3 terminology)
>
> Unless you use the "dynamic composites" feature, which does not exist (and
> probably won't) in CQL3, I don't see how you can have columns of
> "different types" in the same row/partition.
>
>
> On Fri, Jun 13, 2014 at 11:06 PM, Peter Lin <woolfel@gmail.com> wrote:
>
>>
>> When I say "dynamic columns", I mean non-static columns of different types
>> within the same row. Some could be an object, others one of the defined
>> datatypes.
>>
>> With Thrift, I use the appropriate serializer to handle these dynamic
>> columns.
>>
>>
>> On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduyhai@gmail.com>
>> wrote:
>>
>>> Well, before discussing "dynamic columns", we should first define the
>>> term clearly. What exactly do people mean by "dynamic columns"? Is it the
>>> ability to add many columns "of the same type" to an existing physical
>>> row? If so, CQL3 does support it with clustering columns.
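A rough mental model of what DuyHai describes: with `PRIMARY KEY (user_id, property_name)`, each `user_id` partition holds its rows sorted by `property_name`, much like one wide Thrift row whose column names are the clustering values. The in-memory Python model below only illustrates that ordering and a clustering-range slice; `Partition`, `write` and the example table are hypothetical, and nothing here talks to Cassandra.

```python
import bisect


class Partition:
    """Toy model of one CQL3 partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []    # clustering keys, always kept sorted
        self._values = {}  # clustering key -> cell value (all the same type)

    def upsert(self, clustering_key, value):
        # Like a CQL INSERT/UPDATE: writing an existing key overwrites it.
        if clustering_key not in self._values:
            bisect.insort(self._keys, clustering_key)
        self._values[clustering_key] = value

    def slice(self, start, end):
        """Rows with start <= key < end, in clustering order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = {}  # partition key (user_id) -> Partition


def write(user_id, property_name, value):
    """Add or overwrite one "dynamic column" in a user's partition."""
    table.setdefault(user_id, Partition()).upsert(property_name, value)
```

`slice` mirrors a range restriction on the clustering column within one partition, e.g. `WHERE user_id = ? AND property_name >= ? AND property_name < ?`.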
>>>
>>>
>>> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <greenemj@gmail.com>
>>> wrote:
>>>
>>>> Yeah, I don't anticipate more than 1000 properties; well under, in fact.
>>>> I guess the trade-off of using clustering columns is that I'd have a
>>>> table that is tall and skinny, which also has its challenges w/r/t
>>>> memory.
>>>>
>>>> I'll look into your suggestion a bit more and consider some others
>>>> around a hybrid of CQL and Thrift (where necessary). But from a newb's
>>>> perspective, I sense the community is unsettled around this concept of
>>>> truly dynamic columns. Coming from an HBase background, it's a
>>>> consideration I didn't anticipate having to evaluate.
>>>>
>>>>
>>>> --
>>>> about.me <http://about.me/markgreene>
>>>>
>>>>
>>>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduyhai@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Mark
>>>>>
>>>>>  I believe that in your table you want some "common" fields that
>>>>> will be there whatever the customer is, and other fields that are
>>>>> entirely customer-dependent, is that right?
>>>>>
>>>>>  In this case, creating a table with static columns for the common
>>>>> fields and a clustering column representing all custom fields defined
>>>>> by a customer could be a solution (see here for static column:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>>>>
>>>>> CREATE TABLE user_data (
>>>>>    user_id bigint,
>>>>>    user_firstname text static,
>>>>>    user_lastname text static,
>>>>>    ...
>>>>>    custom_property_name text,
>>>>>    custom_property_value text,
>>>>>    PRIMARY KEY(user_id, custom_property_name, custom_property_value));
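In the `user_data` table above, the static columns are stored once per `user_id` partition, yet every row returned from that partition carries them. The hypothetical Python function below models the shape of the result of `SELECT * FROM user_data WHERE user_id = ?`; it illustrates the semantics only and is not driver code.

```python
def select_partition(static_cols, clustering_rows):
    """Model the rows a CQL3 SELECT returns for one user_data partition.

    static_cols: dict of static column -> value (stored once per partition)
    clustering_rows: iterable of (custom_property_name, custom_property_value)
    """
    result = []
    # Rows come back in clustering order: by name, then by value,
    # matching PRIMARY KEY(user_id, custom_property_name, custom_property_value).
    for name, value in sorted(clustering_rows):
        row = dict(static_cols)  # the same static values appear on every row
        row["custom_property_name"] = name
        row["custom_property_value"] = value
        result.append(row)
    return result
```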
>>>>>
>>>>>  Please note that with this solution you need to have "at least one"
>>>>> custom property per customer to make it work.
>>>>>
>>>>>  The only thing to take care of is the type of custom_property_value.
>>>>> You need to define it once and for all. To accommodate dynamic types,
>>>>> you can save the value either as a blob or as text (JSON) and take care
>>>>> of the serialization/deserialization yourself on the client side.
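For the text-as-JSON option, the value's type has to travel with it. One possible convention, assumed here rather than taken from the thread, is a small JSON envelope holding a type name and the value; `to_cell` and `from_cell` are hypothetical helpers. Dates are stored as ISO-8601 strings because JSON has no date type.

```python
import json
from datetime import date


def to_cell(value):
    """Encode a custom property value as JSON text for a CQL text column."""
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        kind = "bool"
    elif isinstance(value, int):
        kind = "int"
    elif isinstance(value, float):
        kind = "float"
    elif isinstance(value, str):
        kind = "text"
    elif type(value) is date:  # plain dates only; datetimes are rejected
        kind, value = "date", value.isoformat()
    else:
        raise TypeError("unsupported type: %s" % type(value).__name__)
    return json.dumps({"t": kind, "v": value}, sort_keys=True)


def from_cell(text):
    """Decode the JSON envelope back into a Python value."""
    cell = json.loads(text)
    if cell["t"] == "date":
        return date.fromisoformat(cell["v"])
    return cell["v"]  # JSON already restores bool, int, float and str
```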
>>>>>
>>>>>  As an alternative, you can save custom properties in a map, provided
>>>>> that their number is not too large. But considering the CRM business
>>>>> case, I believe it's quite rare for a user to have more than 1000 custom
>>>>> properties, isn't it?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <greenemj@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> My use case requires the support of arbitrary columns, much like a
>>>>>> CRM. My users can define 'custom' fields within the application.
>>>>>> Ideally I wouldn't have to change the schema at all, which is why I
>>>>>> like the old Thrift approach rather than the CQL approach.
>>>>>>
>>>>>> Having said all that, I'd be willing to adapt my API to make explicit
>>>>>> schema changes to Cassandra whenever my user makes a change to their
>>>>>> custom fields, if that's an accepted practice.
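If the API does issue explicit schema changes when users edit their custom fields, the user-supplied field name should not be pasted into CQL unchecked. A minimal sketch of building the statement, assuming a hypothetical whitelist of exposed types and quoting the field name as a CQL quoted identifier (wrapped in double quotes, embedded double quotes doubled); `add_custom_field_cql` is a hypothetical name.

```python
import re

# Hypothetical whitelist: only the CQL types the application chooses to expose.
ALLOWED_TYPES = {"text", "int", "bigint", "double", "boolean", "timestamp", "blob"}

# Keyspace and table names come from the application, not from end users.
_PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_identifier(name):
    """Return a CQL quoted identifier: wrap in double quotes, double any '"'."""
    return '"' + name.replace('"', '""') + '"'


def add_custom_field_cql(keyspace, table, field_name, cql_type):
    """Build an ALTER TABLE ... ADD statement for a user-defined field."""
    if not (_PLAIN_NAME.fullmatch(keyspace) and _PLAIN_NAME.fullmatch(table)):
        raise ValueError("invalid keyspace or table name")
    if cql_type not in ALLOWED_TYPES:
        raise ValueError("unsupported CQL type: %s" % cql_type)
    if not field_name:
        raise ValueError("field name must not be empty")
    return "ALTER TABLE %s.%s ADD %s %s" % (
        keyspace, table, quote_identifier(field_name), cql_type)
```

Quoted identifiers are case-sensitive in CQL, so later queries must refer to the column with the same quoting, e.g. `SELECT "Shoe Size" FROM crm.customers`. The returned string still has to be executed through a driver session.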
>>>>>>
>>>>>> Ultimately, I'm trying to figure out if the Cassandra community
>>>>>> intends to support true schemaless use cases in the future.
>>>>>>
>>>>>> --
>>>>>> about.me <http://about.me/markgreene>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduyhai@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This strikes me as bad practice in the world of multi-tenant
>>>>>>> systems. I don't want to create a table per customer. So I'm wondering
>>>>>>> if dynamically modifying the table is an accepted practice?  --> Can
>>>>>>> you give some details about your use case? How would you "alter" a
>>>>>>> table structure to adapt it to a new customer?
>>>>>>>
>>>>>>> Wouldn't it be better to model your table so that it supports the
>>>>>>> addition/removal of customers?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <greenemj@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks DuyHai,
>>>>>>>>
>>>>>>>> I have a follow-up question to #2. You mentioned that ideally I would
>>>>>>>> create a new table instead of mutating an existing one.
>>>>>>>>
>>>>>>>> This strikes me as bad practice in the world of multi-tenant
>>>>>>>> systems. I don't want to create a table per customer. So I'm wondering
>>>>>>>> if dynamically modifying the table is an accepted practice?
>>>>>>>>
>>>>>>>> --
>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduyhai@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Mark
>>>>>>>>>
>>>>>>>>>  Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>>>>>>> clustering columns. And no, using collections for storing dynamic
>>>>>>>>> data is a very bad idea if the cardinality is very high (>> 1000
>>>>>>>>> elements)
>>>>>>>>>
>>>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?  --> Less
>>>>>>>>> and less. Unless you are looking for extreme performance, you'd be
>>>>>>>>> better off choosing CQL3. The ease of programming and querying with
>>>>>>>>> CQL3 is worth the small overhead in CPU.
>>>>>>>>>
>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at
>>>>>>>>> runtime when I detect I need to do a schema mutation?  --> Ideally
>>>>>>>>> you should not alter the schema but create a new table to adapt to
>>>>>>>>> your changing requirements.
>>>>>>>>>
>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>>>> thing into the heap?  --> Of course. All collections and maps in
>>>>>>>>> Cassandra are eagerly loaded entirely in memory on the server side.
>>>>>>>>> That's why it is recommended to limit their cardinality to ~1000
>>>>>>>>> elements.
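The ~1000-element guideline above can be enforced on the client before a map write. The sketch below is an assumption about how an application might do it: `check_map_write`, `CollectionTooLarge` and the default threshold are hypothetical, and the threshold is the thread's rule of thumb, not a limit the server enforces.

```python
# Rule-of-thumb ceiling from this thread (~1000 elements), not a server limit.
MAX_MAP_ELEMENTS = 1000


class CollectionTooLarge(ValueError):
    """Raised when properties should be modeled as clustering rows instead."""


def check_map_write(properties, max_elements=MAX_MAP_ELEMENTS):
    """Reject map writes that exceed the element guideline.

    A collection is loaded into server memory in full when read, so an
    oversized map is refused here instead of being written.
    Returns the number of elements when the write is acceptable.
    """
    size = len(properties)
    if size > max_elements:
        raise CollectionTooLarge(
            "%d elements exceeds %d; use clustering rows instead"
            % (size, max_elements))
    return size
```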
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <greenemj@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>>>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>>>>>>> supported in some capacity via collections, but you can't exceed 64K
>>>>>>>>>> in size. For my requirements that would cause problems.
>>>>>>>>>>
>>>>>>>>>> So my questions are:
>>>>>>>>>>
>>>>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?
>>>>>>>>>>
>>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at
>>>>>>>>>> runtime when I detect I need to do a schema mutation?
>>>>>>>>>>
>>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>>>>> thing into the heap?
>>>>>>>>>>
>>>>>>>>>> My data model is akin to a CRM, with arbitrary column definitions
>>>>>>>>>> per customer.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
