incubator-cassandra-dev mailing list archives

From Michael Koziarski
Subject Re: Fixing the data model names
Date Thu, 13 Aug 2009 00:52:12 GMT
> I agree that individually, the current names are technically accurate
> in their specific contexts. But taken as a whole, they make
> practically no sense to someone starting out, as Ryan mentions. I'll
> poke around try to come up with some other possible term sets. The
> point isn't that they are *this* specific set, just that they are
> internally consistent, and analogous to things widely understood.

I don't want to weigh in on the cost-side of the equation, I'm not
qualified to know how much work is involved in that.

However I think it's worth considering this from a strategic
perspective, looking at how we want the project to grow and change,
rather than just as it is right now.  The key to successful adoption
is having a successful elevator pitch.  You can start using a database
without understanding relational algebra because 'table' and 'column'
are such simple ways to reason about the tool.  As it stands,
Cassandra's pitch takes a whiteboard and 15 minutes before people get
what you're talking about.

Assuming the project gets anything like the adoption it deserves, the
users we have today will be a *tiny minority* of the users we have in
the future.  So imposing costs on the current userbase in return for
huge benefits to future users should be something we're willing to
do.  In fact it's something that has been done repeatedly over the
last few weeks.

SuperColumnFamilies have had their behaviour *completely* change with
the addition of comparators for subcolumns, the on-disk format has
changed, and the configuration file format is completely different.
All of these changes have been great and are huge positive
improvements, but have imposed significant taxes on existing users.
Given those changes went in without debate, I'm not sure what the
reluctance is for making changes to the nomenclature for the project.

Speaking as someone who's only been doing this a month, the naming is
*still* confusing, and when I talk with people who wonder what
Cassandra is all about I get blank looks when telling them what things
are called.  Step back and consider how you'd tell someone to insert a
tweet into someone's timeline, using the example from Evan's weblog
post:

  "You just take the user's key, and use that to insert into the
SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
ColumnName of a time based uuid representing now, and a value of the
new tweet's key"
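
To make the nesting in that sentence concrete, here is a rough Python
sketch of the same insert using plain nested dictionaries. It is not
the Cassandra client API, just an illustration of the shape of the
data. The helper names (new_store, insert_tweet) and the sample keys
are made up for this example, and it assumes 'user_timeline' sits
between the user's key and the time based uuid, as the sentence
describes.

```python
import uuid
from collections import defaultdict


def new_store():
    # Hypothetical in-memory stand-in for the data model, nested as:
    # SuperColumnFamily -> user key -> 'user_timeline' -> ColumnName -> value
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def insert_tweet(store, user_key, tweet_key):
    # The ColumnName is a time based (version 1) uuid representing now.
    column_name = uuid.uuid1()
    # The value stored under that name is the new tweet's key.
    store["UserAssociations"][user_key]["user_timeline"][column_name] = tweet_key
    return column_name


# Sample keys are placeholders, not taken from the weblog post.
store = new_store()
column_name = insert_tweet(store, "user-1", "tweet-42")
```

Even in this toy form it takes four levels of lookup to reach a single
value, and "column" describes a different level each time it shows up.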

Column is in the name of 3 of the 5 concepts expressed, and in each
case it means something different.  In none of the cases does it
correspond to what users coming from an RDBMS background think of as a
column.  Additionally the names SuperColumnFamily and ColumnFamily
don't convey the main difference between the two; they just make one
sound scalier than the other.

I have no idea what the alternative names should be, and am reluctant
to suggest any as I'm still a newbie here and have precisely one
rejected patch to my name, but I do think that at the very least we
should strongly consider renaming anything with 'Column' in the name.


