incubator-cassandra-dev mailing list archives

From Evan Weaver <>
Subject Re: Fixing the data model names
Date Thu, 13 Aug 2009 00:23:35 GMT
Re. Jonathan on "database": oracle/sqlserver/mysql/postgres call it a
database. I guess "schema" is ok, but it seems like a case of "why be
different" (see, I can play both sides here :-p ). I didn't know that
"row" was considered a real thing in Cassandra.

Re. Jonathan on "columns": it would make more sense if "column family"
was actually called "sparse table". But super columns break the
tabular model, so I don't think pretending to be tabular is a good
answer. Personally I prefer the terms borrowed from document databases
(I didn't realize that "attribute" was the relational-theory term).
Maybe "field" and "field set" is better.

I agree that individually, the current names are technically accurate
in their specific contexts. But taken as a whole, they make
practically no sense to someone starting out, as Ryan mentions. I'll
poke around and try to come up with some other possible term sets. The
point isn't that they are *this* specific set, just that they are
internally consistent, and analogous to things widely understood.


On Wed, Aug 12, 2009 at 6:05 PM, Ryan King<> wrote:
> I'm not going to go into my full position on this issue, because I
> agree with Evan (we developed the proposal together).
> I would like to reiterate, one of our main motivations behind renaming
> the data model is to make it easier for people to get up to speed with
> Cassandra.
> Evan and I both had problems understanding the data model and we've
> seen the same struggles over and over as we try to explain the data
> model to other engineers here at Twitter. So, after developing this
> proposal for a new naming scheme, we tested it with more engineers, to
> see if it was, in fact, easier to explain. We didn't do a rigorous
> study, but without a doubt it was clearer and easier to understand.
> And these are all people who've read the BigTable and Dynamo papers,
> most of whom have CS (bachelor's or master's) degrees and are
> generally smart.
> I'm not saying this is a definitive study, but I think we need to try
> and understand the perspective of the n00bs.
> On Wed, Aug 12, 2009 at 11:52 AM, Jonathan Ellis<> wrote:
>> My brief two cents:
>> I think terminology + api changes need to be a big improvement to be
>> worth breaking things at this point, and I don't think this proposal
>> meets that bar.  In fact I'm not sure any proposal could.
>> On the specifics:
>> * Keyspace vs Database
>> Actually the right concept from the rdb world is "schema."  (Maybe it
>> is a mysql-ism to call these "databases?")
>> I deliberately avoided that term though, possibly mistakenly.
>> * ColumnFamily vs Record collection
>> -1.  CF correctly implies "group of columns" to me without being so
>> generic it could apply to anything.
> But a CF isn't a "group of columns", it's a group of <thing without a
> name>'s, which contain columns. This naming caused me to believe that
> you have something (row/record) that spans multiple column families.
>> * Record vs Row
>> I don't really care, I guess, but row never really seemed confusing to me.
>> * Column vs Attribute
>> Definitely -1 on this too.  Both imply "a named value," but column is
>> from the database world while attribute is from OO.  The connotations
>> are wrong.  Here the baggage from a relational background is mostly
>> correct.  As Evan notes the difference is that ColumnFamilies are
>> sparse, but that is a difference between CFs and Tables not between
>> the different concepts of Columns per se.
> I think my problem with using column here is that it implies that you
> can do stuff with columns from multiple rows/records.
>> * SuperColumn vs Attribute Collection
>> SuperColumn is probably the worst name here, but calling it a
>> ColumnCollection would not be an improvement.  (I can have a
>> Collection<Column> in my own code, and do, but that is not the same
>> thing at all.)
>> So, having thought it through, I would have to say I think the
>> current names, if not perfect, are underrated.  Even if making the
>> change were free, and it's obviously not, I would prefer the existing
>> terminology.
> I think, overall, the naming is a significant barrier to entry for new
> Cassandra users. This proposal will certainly be expensive, both in
> terms of the work (which we at twitter are willing to do) and the
> disruption. However, we're still early in Cassandra's life and this
> may be our only chance to improve this situation.
> -ryan

Evan Weaver
