cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Re: Dynamic Columns
Date Wed, 21 Jan 2015 22:00:11 GMT
I apologize if I've offended you, but I clearly stated CQL3 supports
dynamic columns. How it supports dynamic columns is different. If I'm
reading you correctly, I believe we agree both thrift and CQL3 support
dynamic columns. Where we differ is that I feel the coverage for existing
thrift use cases isn't 100%. That may be right or wrong, but it is my
impression. I agree with you that CQL3 supports the majority of dynamic
column use cases, but in a slightly different way. There are cases like
mine which fit better in thrift.

Could I rip out all the stuff I did and replace it with CQL3 with a major
redesign? Yes, I could, but honestly I see some downsides with that
proposition.

1. For modeling tools like mine, an object API is a far better fit, in my
biased opinion.
2. Text-based languages like SQL and CQL could "in theory" provide similar
object safety, but it's so much work that most people don't bother. This is
from first-hand experience building 3 ORMs and using most of the open
source ORMs in the Java space. I've also used several ORMs in .NET and they
all suffer from this pain point. There's a reason why Microsoft created
LINQ.
3. The structure and syntax of SQL and all variations of SQL are not
ideally suited to complex data structures that are graphs. A temporal
entity is an object graph that may be shallow (3-8 levels) or deep (15+).
SQL is ideally suited to tables. CQL in this regard is more flexible and
supports collections, but it's still not ideal for things like insurance
policies. Look at the ACORD standard for property insurance if you want to
get a better understanding. For example, a temporal record using an ORM
could result in anything from 500 rows of data in a dozen tables for a
small entity to 50K+ rows for a large entity. The mailing list isn't the
right place to go into the theory and practice of temporal databases, but a
lot of the design choices I made are based on formal logic.



On Wed, Jan 21, 2015 at 4:06 PM, Sylvain Lebresne <sylvain@datastax.com>
wrote:

> On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin <woolfel@gmail.com> wrote:
>
>> the dynamic column can't be part of the primary key. The temporal entity
>> key can be the default UUID or the user can choose the field in their
>> object. Within our framework, we have a concept of temporal links between one
>> or more temporal entities. Polluting the primary key with the dynamic column
>> wouldn't work.
>>
>
> Not totally sure I understand. Are you talking about the underlying
> storage space used? If you are, we can discuss it (it's not too hard to
> remedy it in CQL, I was mainly trying to illustrate my point, not
> pretending this was a drop-in solution for your use case) but it's more of
> a performance discussion, and I think we've somewhat left the realm of
> "there's things CQL3 doesn't support".
>
>
>> Please excuse the confusing RDB comparison. My point is that Cassandra's
>> dynamic column feature is the "unique" feature that makes it better than
>> traditional RDB or NewSQL like VoltDB for building temporal databases.
>> Databases that require a static schema + ALTER TABLE for managing schema
>> evolution make this harder and incur downtime.
>>
>
> Here again you seem to imply that CQL doesn't support dynamic columns, or
> has somewhat inferior support, but that's just not true.
>
>
>> One of the challenges of data management over time is evolving the data
>> model and making queries simple. If the record is 5 years old, it probably
>> has a different schema than a record inserted this week. With temporal
>> databases, every update is an insert, so it's a little bit more complex
>> than just "use a blob". There's a whole level of complication with temporal
>> data, and CQL3 custom types aren't clear to me. I've read the CQL3
>> documentation on the custom types several times and it is rather poor. It
>> gives me the impression there's still work needed to get custom types in
>> good shape.
>>
>
> I'm sorry but that's a bit of hand waving. Custom types (and by that I
> mean user-provided AbstractType implementations) work in CQL *exactly*
> like in thrift: they are not in a better or worse shape than in thrift. And
> while the documentation on CQL3 is indeed poor on this part, so is the
> thrift documentation on the same subject (besides, I don't think your
> whole point is about saying that documentation could be improved). Again,
> what you can do in thrift, you can do in CQL.
>

Honestly, I haven't tried to use CQL3 user-provided types. I read the
specification several times and had a ton of questions, along with several
other people who were trying to understand what it meant. If you want people
to use it, the documentation needs to improve. I did make a good-faith effort
and spent a week trying to understand what the spec is trying to say, but
it only resulted in more questions. So yes, I am hand waving because it
left me frustrated. Having been part of the Apache community for many years,
I know that writing great docs is hard and most of us hate doing it. Just to
be clear, I'm not blaming anyone for poor docs. I'm just as guilty as
everyone else when it comes to docs.


>
>
>> I consistently recommend new users learn and understand both Thrift and
>> CQL.
>>
>
> I understand that you do this with the best of intentions and don't take
> it the wrong way but it is my opinion that you are counterproductive by
> doing so, and this for 2 reasons:
> 1) you don't only recommend that users learn both APIs, you justify that
> advice by affirming that there is a whole family of important use cases
> that thrift supports and CQL does not. Except that I claim that this
> affirmation is technically incorrect, and so far I haven't seen many
> examples proving me wrong.
>

Honestly, the only use case that matters to me is my own. I know a lot
of people who use temporal databases in the financial and insurance sectors.
They all kludge together broken designs, starting with a static schema and
altering the schema as it evolves. With dynamic columns of either flavor
(cql3 & thrift), people can avoid many of the issues. I happen to prefer
thrift for specific parts of my project and CQL3 for the rest of it. I see
nothing wrong with picking the right tool that fits each use case.
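
To make the "every update is an insert" idea with dynamic columns concrete,
here is a minimal in-memory sketch in Python. It is not Cassandra, thrift, or
CQL code, and it is not how my framework is implemented; `TemporalEntity`,
`update`, and `as_of` are hypothetical names used only for illustration, and
plain integers stand in for timestamps.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class TemporalEntity:
    """Append-only history of one entity (hypothetical illustration)."""
    entity_id: str
    # (valid_from, columns) pairs in increasing time order; `columns` is a
    # free-form name -> value map, so each version may carry different columns.
    versions: List[Tuple[int, Dict[str, Any]]] = field(default_factory=list)

    def update(self, valid_from: int, **changes: Any) -> None:
        """Store a new version instead of overwriting the previous one."""
        if self.versions and valid_from <= self.versions[-1][0]:
            raise ValueError("versions must be appended in increasing time order")
        columns = dict(self.versions[-1][1]) if self.versions else {}
        columns.update(changes)
        self.versions.append((valid_from, columns))

    def as_of(self, timestamp: int) -> Optional[Dict[str, Any]]:
        """Return the columns that were current at `timestamp`, if any."""
        times = [t for t, _ in self.versions]
        i = bisect_right(times, timestamp)
        return dict(self.versions[i - 1][1]) if i else None


policy = TemporalEntity("policy-1")
policy.update(2010, holder="A. Smith", premium=100)
# A later version adds a column the older version never had; no schema change.
policy.update(2015, premium=120, deductible=500)
```

The older version stays readable with its original set of columns, while the
newer one carries the extra column, which is the property a static schema plus
alter table makes awkward.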

Honestly I don't care who is right or wrong, I care about sharing
knowledge. When I'm wrong, I freely admit it and thank people for pointing
it out.


> 2) there is a wealth of evidence that trying to learn both thrift and CQL
> confuses the hell out of new users. Which is btw not surprising: both APIs
> present the same concepts in seemingly different ways (even though they
> are the same concepts) and even have conflicting vocabulary, so it's
> obviously confusing when you try to learn those concepts in the first
> place. Trying to learn CQL when you know thrift well is fine, and why not
> learn thrift once you know and understand CQL well, but learning both at
> once is imo bad advice. It could maybe (maybe) be justified if what you say
> about having a whole family of use cases not being doable with CQL was true,
> but it's not.
>
>>
>> For the record, doing this kind of stuff in a relational database sucks
>> horribly.
>>
>
> I don't know what that has to do with CQL to be honest. If you're doing
> relational with CQL you're doing it wrong. And please note that I'm not
> saying CQL is the perfect API for modeling temporal data. But I don't get
> how thrift, which is a very crude API, is a much better API at that than CQL
> (or, again, how it allows you to do things you can't with CQL).
>
>
I think you're reading too much into it. Since I did a horrible job
explaining it, I'll try again. My point is this. People who come from a SQL
world prefer CQL because it is conceptually similar and less scary. From my
experience, projects that need dynamic columns have a lot of subtlety and
it isn't always clear which approach is best. It may be that CQL3 dynamic
columns are perfectly fine. But here's the thing: unless someone takes the
time to learn and study the subject thoroughly, it's a blind guess. The
point isn't to use Cassandra as a relational database, even if some people
are basically doing that. I share my experience in the hopes that others
can avoid my mistakes.



> --
> Sylvain
>
>>
>>
