incubator-cassandra-dev mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: Fixing the data model names
Date Wed, 12 Aug 2009 22:05:19 GMT
I'm not going to go into my full position on this issue, because I
agree with Evan (we developed the proposal together).

I would like to reiterate that one of our main motivations behind
renaming the data model is to make it easier for people to get up to
speed with Cassandra.

Evan and I both had problems understanding the data model, and we've
seen the same struggles over and over as we try to explain the data
model to other engineers here at Twitter. So, after developing this
proposal for a new naming scheme, we tested it with more engineers to
see if it was, in fact, easier to explain. We didn't do a rigorous
study, but without a doubt it was clearer and easier to understand.
And these are all people who've read the BigTable and Dynamo papers,
most of whom have CS (bachelor's or master's) degrees and are
generally smart.

I'm not saying this is a definitive study, but I think we need to try
to understand the perspective of the n00bs.

On Wed, Aug 12, 2009 at 11:52 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> My brief two cents:
>
> I think terminology + api changes need to be a big improvement to be
> worth breaking things at this point, and I don't think this proposal
> meets that bar.  In fact I'm not sure any proposal could.
>
> On the specifics:
>
> * Keyspace vs Database
>
> Actually the right concept from the rdb world is "schema."  (Maybe it
> is a mysql-ism to call these "databases?")
>
> I deliberately avoided that term though, possibly mistakenly.
>
> * ColumnFamily vs Record collection
>
> -1.  CF correctly implies "group of columns" to me without being so
> generic it could apply to anything.

But a CF isn't a "group of columns", it's a group of <thing without a
name>'s, which contain columns. This naming caused me to believe that
you have something (row/record) that spans multiple column families.
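
To make the nesting concrete, here is a rough sketch using plain Python
dicts. It is only an illustration of the structure described above, not
Cassandra's API; the column family name, keys, column names, and the
columns_for helper are all made up.

```python
# Hypothetical illustration only: plain dicts standing in for the data model.
# Nesting: column family -> row/record key -> column name -> value

users_cf = {
    # Each entry is the "thing without a name" (a row/record),
    # and each row/record contains its own columns.
    "ryan": {
        "email": "ryan@example.com",
        "city": "San Francisco",
    },
    # Rows are sparse: this one has fewer columns than the one above.
    "evan": {
        "email": "evan@example.com",
    },
}


def columns_for(column_family, key):
    """Return the columns stored under one row/record key (empty if absent)."""
    return column_family.get(key, {})


def column_names(column_family, key):
    """Return the sorted column names present in one row/record."""
    return sorted(columns_for(column_family, key))
```

Read this way, the column family groups rows/records, and the columns
hang off each row/record rather than off the column family itself.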

> * Record vs Row
>
> I don't really care, I guess, but row never really seemed confusing to me.
>
> * Column vs Attribute
>
> Definitely -1 on this too.  Both imply "a named value" but column is
> from the database world but attribute is from OO.  The connotations
> are wrong.  Here the baggage from a relational background is mostly
> correct.  As Evan notes the difference is that ColumnFamilies are
> sparse, but that is a difference between CFs and Tables not between
> the different concepts of Columns per se.

I think my problem with using column here is that it implies that you
can do stuff with columns from multiple rows/records.

> * SuperColumn vs Attribute Collection
>
> SuperColumn is probably the worst name here, but calling it a
> ColumnCollection would not be an improvement.  (I can have a
> Collection<Column> in my own code, and do, but that is not the same
> thing at all.)
>
> So having thought it through I think I would have to say I think the
> current names, if not perfect, are underrated.  Even if making the
> change were free, and it's obviously not, I would prefer the existing
> terminology.

I think, overall, the naming is a significant barrier to entry for new
Cassandra users. This proposal will certainly be expensive, both in
terms of the work (which we at Twitter are willing to do) and the
disruption. However, we're still early in Cassandra's life and this
may be our only chance to improve this situation.

-ryan
