cassandra-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Re: Dynamic Columns
Date Thu, 22 Jan 2015 05:51:17 GMT
Peter,

At least from your description, the proposed use of the clustering column
name seems at first blush to fully fit the bill. The point is not that the
resulting clustered primary key is used to reference an object, but that a
SELECT on the partition key references the entire object, which will be a
sequence of CQL3 rows in a partition, and then the clustering column key is
added when you wish to access that specific aspect of the object. What's
missing? Again, just store the partition key to reference the full object -
no pollution required!
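
As a rough sketch of those two access patterns, assuming the table t from
the quoted message below and made-up blob values for the key and the column
name:

```cql
-- Whole object: every CQL3 row in the partition (0xcafe is a hypothetical key).
SELECT * FROM t WHERE key = 0xcafe;

-- One aspect of the object: add the clustering column to the WHERE clause
-- (0x01 is a hypothetical dynamic column name).
SELECT dynamic_column_value
  FROM t
 WHERE key = 0xcafe AND dynamic_column_name = 0x01;
```

A link from another entity would only need to store the partition key value
(0xcafe here), not the clustering column.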

And please note that any number of clustering columns can be specified, so
more structured "dynamic columns" can be supported. For example, you could
have a timestamp as a separate clustering column to maintain temporal state
of the database. The partition key can also be structured from multiple
columns as a composite partition key.
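
A minimal sketch of what that could look like. The table name, column
names, and literal values are all hypothetical, chosen only for
illustration:

```cql
-- Composite partition key (tenant, object_id) plus two clustering columns.
CREATE TABLE temporal_object (
    tenant      text,
    object_id   uuid,
    attr_name   text,
    valid_from  timestamp,
    attr_value  blob,
    PRIMARY KEY ((tenant, object_id), attr_name, valid_from)
) WITH CLUSTERING ORDER BY (attr_name ASC, valid_from DESC);

-- Latest version of one attribute: newest valid_from sorts first.
SELECT valid_from, attr_value
  FROM temporal_object
 WHERE tenant = 'acme'
   AND object_id = 123e4567-e89b-12d3-a456-426655440000
   AND attr_name = 'address'
 LIMIT 1;

-- The same attribute as of a point in time.
SELECT valid_from, attr_value
  FROM temporal_object
 WHERE tenant = 'acme'
   AND object_id = 123e4567-e89b-12d3-a456-426655440000
   AND attr_name = 'address'
   AND valid_from <= '2015-01-01 00:00:00+0000'
 LIMIT 1;
```

With this layout every update is a plain INSERT with a new valid_from, so
older versions of an attribute stay in the partition.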

As for all these static columns, consider them optional and merely an
optimization. If you wish to have a 100% opaque object model, you wouldn't
have any static columns and the only non-primary key column would be the
blob value field. Every object attribute would be specified using another
clustering column name and blob value. Presto, everything you need for a
pure, opaque, fully-generalized object management system - all with just
CQL3. Maybe we should include such an example in the doc and with the
project to more strongly emphasize this capability to fully model
arbitrarily complex object structures - including temporal structures.
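
Something along these lines, as a sketch only. The table name, column names,
and values are hypothetical, and textAsBlob() stands in for whatever
serialization your helper classes already do:

```cql
-- Fully opaque model: no static columns, only name/value blobs.
CREATE TABLE opaque_object (
    key             blob,
    attribute_name  blob,
    attribute_value blob,
    PRIMARY KEY (key, attribute_name)
);

-- Each attribute is one CQL3 row in the object's partition.
INSERT INTO opaque_object (key, attribute_name, attribute_value)
VALUES (0x01, textAsBlob('color'), textAsBlob('red'));

-- A brand-new attribute needs no ALTER TABLE, just another row.
INSERT INTO opaque_object (key, attribute_name, attribute_value)
VALUES (0x01, textAsBlob('weight_kg'), textAsBlob('12.5'));
```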

Anything else missing?

As a general proposition, you can use the term "clustering column" in CQL3
wherever you might have used "dynamic column" in Thrift. The point in CQL3
is not to eliminate a useful feature, dynamic column, but to repackage the
feature to make a lot more sense for the vast majority of use cases. Maybe
there are some cases that don't exactly fit as well as desired, but feel
free to specifically identify such cases so that we can elaborate how we
think they are covered or at least covered well enough for most users.


-- Jack Krupansky

On Wed, Jan 21, 2015 at 12:19 PM, Peter Lin <woolfel@gmail.com> wrote:

>
> the example you provided does not work for my use case.
>
>   CREATE TABLE t (
>     key blob,
>     my_static_column_1 int static,
>     my_static_column_2 float static,
>     my_static_column_3 blob static,
>     ....,
>     dynamic_column_name blob,
>     dynamic_column_value blob,
>     PRIMARY KEY (key, dynamic_column_name)
>   );
>
> the dynamic column can't be part of the primary key. The temporal entity
> key can be the default UUID or the user can choose the field in their
> object. Within our framework, we have the concept of temporal links between
> one or more temporal entities. Polluting the primary key with the dynamic
> column wouldn't work.
>
> Please excuse the confusing RDB comparison. My point is that Cassandra's
> dynamic column feature is the "unique" feature that makes it better than
> traditional RDB or newSql like VoltDB for building temporal databases. With
> databases that require static schema + alter table for managing schema
> evolution, it makes it harder and results in down time.
>
> One of the challenges of data management over time is evolving the data
> model and making queries simple. If the record is 5 years old, it probably
> has a different schema than a record inserted this week. With temporal
> databases, every update is an insert, so it's a little bit more complex
> than just "use a blob". There's a whole level of complication with temporal
> data, and CQL3 custom types aren't clear to me. I've read the CQL3
> documentation on the custom types several times and it is rather poor. It
> gives me the impression there's still work needed to get custom types in
> good shape.
>
> With regard to examples others have told me, your advice is fair. A few
> minutes with google and some blogs should pop up. The reason I bring these
> things up isn't to put down CQL. It's because I care and want to help
> improve Cassandra by sharing my experience. I consistently recommend new
> users learn and understand both Thrift and CQL.
>
>
>
> On Wed, Jan 21, 2015 at 11:45 AM, Sylvain Lebresne <sylvain@datastax.com>
> wrote:
>
>> On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin <woolfel@gmail.com> wrote:
>>
>>> I don't remember other people's examples in detail due to my shitty
>>> memory, so I'd rather not misquote.
>>>
>>
>> Fair enough, but maybe you shouldn't use "people's examples you don't
>> remember" as an argument then. Those examples might be wrong or outdated
>> and that kind of stuff creates confusion for everyone.
>>
>>
>>>
>>> In my case, I mix static and dynamic columns in a single column family
>>> with primitives and objects. The objects are temporal object graphs with a
>>> known type. Doing this type of stuff is basically transparent for me, since
>>> I'm using thrift and our data modeler generates helper classes. Our tooling
>>> seamlessly converts the bytes back to the target object. We have a few
>>> standard static columns related to temporal metadata. At any time, dynamic
>>> columns can be added and they can be primitives or objects.
>>>
>>
>> I don't see anything in that that cannot be done with CQL. You can mix
>> static and dynamic columns in CQL thanks to static columns. More precisely,
>> you can do what you're describing with a table looking a bit like this:
>>   CREATE TABLE t (
>>     key blob,
>>     my_static_column_1 int static,
>>     my_static_column_2 float static,
>>     my_static_column_3 blob static,
>>     ....,
>>     dynamic_column_name blob,
>>     dynamic_column_value blob,
>>     PRIMARY KEY (key, dynamic_column_name)
>>   );
>>
>> And your helper classes will serialize your objects as they probably do
>> today (if you use a custom comparator, you can do that too). And let it be
>> clear that I'm not pretending that doing it this way is tremendously
>> simpler than thrift. But I'm saying that 1) it's possible and 2) while it's
>> not meaningfully simpler than thrift, it's not really harder either (and
>> in fact, it's actually less verbose with CQL than with raw thrift).
>>
>>
>>>
>>> For the record, doing this kind of stuff in a relational database sucks
>>> horribly.
>>>
>>
>> I don't know what that has to do with CQL to be honest. If you're doing
>> relational with CQL you're doing it wrong. And please note that I'm not
>> saying CQL is the perfect API for modeling temporal data. But I don't get
>> how thrift, which is a very crude API, is a much better API at that than CQL
>> (or, again, how it allows you to do things you can't with CQL).
>>
>> --
>> Sylvain
>>
>
>
