incubator-cassandra-dev mailing list archives

From Evan Weaver <>
Subject Re: Fixing the data model names
Date Thu, 13 Aug 2009 01:51:39 GMT
Hmm, my Ruby client internally refers to columns and subcolumns,
rather than supercolumns and columns...mainly because the subcolumn
position is optional, but the column_or_supercolumn position is not.
So there is something we agree on.

Do you think the lack of a timestamp in the supercolumn is confusing?
It's still not exactly a kind of column.
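The distinction Evan draws can be sketched as data types. This is a simplified Python model, loosely based on the Thrift structures of the time: a Column carries a name, a value, and a timestamp, while a SuperColumn carries only a name and a list of columns. The class and field names are illustrative assumptions, not the exact API of any client. The model shows both of his points. The supercolumn has no timestamp of its own. In a path, the top-level column position is required, while the subcolumn position is optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """A name/value pair; every column carries its own timestamp."""
    name: bytes
    value: bytes
    timestamp: int


@dataclass
class SuperColumn:
    """A named container of columns; note there is no timestamp field."""
    name: bytes
    columns: List[Column] = field(default_factory=list)


@dataclass
class ColumnPath:
    """Illustrative address of a value: the top-level position is required,
    the nested (subcolumn) position is optional."""
    column_family: str
    column: bytes                      # a column, or a supercolumn name in a super CF
    subcolumn: Optional[bytes] = None  # only meaningful for super column families
```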


On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis<> wrote:
> I agree with the proposition that the SuperColumn name is weak.
> (Although not, as I mentioned, Column or ColumnFamily.)  And I could
> go with schema over keyspace.
> One option to deal with SC would be to excise the term SC (and SCF
> from the config) and instead just have Columns, which may or may not
> have SubColumns.  You would define this as
> <ColumnFamily withSubColumns="true" .../>
> "Insert a subcolumn named A into the Column named B" fits pretty well
> with how I think of things working.  And now you just have Rows and
> Columns!  Just like an RDB! :P
> -Jonathan
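Jonathan's `withSubColumns` proposal can be modeled as a single Column concept whose values may nest one level deeper. The sketch below is hypothetical. `ColumnFamily`, `with_subcolumns`, and `insert` are illustrative names standing in for the proposed config attribute, not an existing Cassandra or client API.

```python
from typing import Any, Dict, Optional


class ColumnFamily:
    """Hypothetical model of the proposal: rows hold Columns, and a family
    declared with_subcolumns=True stores SubColumns inside each Column."""

    def __init__(self, name: str, with_subcolumns: bool = False) -> None:
        self.name = name
        self.with_subcolumns = with_subcolumns
        # row key -> column name -> value, or -> subcolumn name -> value
        self.rows: Dict[str, Dict[str, Any]] = {}

    def insert(self, key: str, column: str, value: str,
               subcolumn: Optional[str] = None) -> None:
        row = self.rows.setdefault(key, {})
        if self.with_subcolumns:
            if subcolumn is None:
                raise ValueError("this family requires a subcolumn name")
            # "Insert a subcolumn named A into the Column named B"
            subcolumns = row.setdefault(column, {})
            subcolumns[subcolumn] = value
        else:
            if subcolumn is not None:
                raise ValueError("this family has no subcolumns")
            row[column] = value
```

With this model, `cf.insert("some_key", "B", "value", subcolumn="A")` reads directly as "insert a subcolumn named A into the Column named B". A family without subcolumns then behaves like an ordinary row of columns.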
> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver<> wrote:
>> Points taken, and I agree, except in my experience the current names
>> are not Pretty Good but rather Pretty Weird; the primary issues being
>> column family and super column.
>> If we go by the shorter-is-better principle, we might get:
>> Cluster
>> Schema
>> Row set
>> Row w/key
>> Field set
>> Field
>> "You take the user's key, and use that to insert into the Row Set
>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>> time-based UUID representing now, and with a value of the new tweet's
>> key."
>> But let me study for a while and come up with a more researched proposal.
>> Evan
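Evan's tweet example addresses a value through four nested names: row set, row key, field set, and field. The following is a minimal in-memory sketch that assumes a plain nested dictionary in place of a real cluster. The `store`, `add_to_timeline`, and `timeline` names are hypothetical. The sketch shows why a time-based UUID works as the field name: sorting on its embedded timestamp recovers insertion order.

```python
import uuid
from collections import defaultdict
from typing import List

# Hypothetical stand-in for the cluster:
# row set -> row key -> field set -> field name -> value
store = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def add_to_timeline(user_key: str, tweet_key: str) -> uuid.UUID:
    now = uuid.uuid1()  # version-1 (time-based) UUID representing "now"
    store["user_associations"][user_key]["user_timeline"][now] = tweet_key
    return now


def timeline(user_key: str) -> List[str]:
    fields = store["user_associations"][user_key]["user_timeline"]
    # uuid1 embeds a 60-bit timestamp; sorting on it gives time order
    return [fields[u] for u in sorted(fields, key=lambda u: u.time)]
```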
>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis<> wrote:
>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski<> wrote:
>>>> However I think it's worth considering this from a strategic
>>>> perspective, looking at how we want the project to grow and change,
>>>> rather than just at how it is right now.  The key to successful adoption
>>>> is having a successful elevator pitch: you can start using a database
>>>> without understanding relational algebra because 'table' and 'column'
>>>> are such simple ways to reason about the tool.  As it stands,
>>>> Cassandra takes a whiteboard and 15 minutes before people get what
>>>> you're talking about.
>>> If you want to explain it as "sort of like a relational db" then
>>> table -> CF
>>> column -> column
>>> key -> key
>>> row -> row
>>> That's the simple case, then all you have is "supercolumns can contain
>>> a list of simple columns."
>>> That really doesn't seem so hard to me.  I have explained this to *managers*.
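Jonathan's relational analogy holds for a standard column family if you view it as a sparse table. The sketch below is illustrative only. The `users` data and the `as_table` helper are made up, and absent columns are rendered as `None` the way a table would show NULL. Unlike a table, the rows share no fixed column list.

```python
from typing import Dict, List, Optional

# Hypothetical standard column family: row key -> column name -> value.
users: Dict[str, Dict[str, str]] = {
    "user1": {"name": "User One", "email": "user1@example.com"},
    "user2": {"name": "User Two"},
}


def as_table(cf: Dict[str, Dict[str, str]]) -> List[List[Optional[str]]]:
    """Render a CF as relational rows, filling absent columns with None."""
    columns = sorted({c for row in cf.values() for c in row})
    header: List[Optional[str]] = ["key"] + columns
    body: List[List[Optional[str]]] = [
        [key] + [row.get(c) for c in columns] for key, row in cf.items()
    ]
    return [header] + body
```

A supercolumn family adds one more level: each value in a row is itself a small map of simple columns.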
>>>> Assuming the project gets anything like the adoption it deserves, the
>>>> users we have today will be a *tiny minority* of the users we have in
>>>> the future.  So imposing costs on the current userbase that will give
>>>> huge benefits to future users should be something we're willing to
>>>> do.  In fact it's something that has been done repeatedly over the
>>>> last few weeks.
>>> I agree.  But as I said before I just don't see this as being an improvement.
>>>> Given those changes went in without debate, I'm not sure what the
>>>> reluctance is for making changes to the nomenclature for the project.
>>> As above.
>>>> Speaking as someone who's only been doing this a month, the naming is
>>>> *still* confusing, and when I talk with people who wonder what
>>>> Cassandra is all about, I get blank looks when telling them what things
>>>> are called.  If you step back and want to tell someone how you'd
>>>> insert a tweet into someone's timeline, using the example from Evan's weblog post:
>>>>  "You just take the user's key, and use that to insert into the
>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
>>>> ColumnName of a time based uuid representing now, and a value of the
>>>> new tweet's key"
>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>> case it means something different.
>>> When you're inserting something nested 3 levels deep a certain amount
>>> of verbosity is unavoidable.  With Evan's nomenclature,
>>> "You take the user's record ID, and use that to insert into the Record
>>> Collection 'user associations' at Attribute Collection
>>> 'user_timeline,' an Attribute named with a time based uuid
>>> representing now, and with a value of the new tweet's key."
>>> I think that is a negative improvement.  Yay, now we are talking about
>>> Attribute Collections and Attributes instead of SuperColumns and
>>> Columns.  The same objections ("one object's name contains the
>>> other's!") apply, plus the new one of sounding so generic that it could
>>> apply to practically any system.
>>> -Jonathan
>> --
>> Evan Weaver

Evan Weaver
