incubator-cassandra-dev mailing list archives

From Arin Sarkissian <>
Subject Re: Fixing the data model names
Date Thu, 13 Aug 2009 05:17:15 GMT
Row? What are you guys referring to as a row?

No, this isn't a joke.


On Wed, Aug 12, 2009 at 9:39 PM, Evan Weaver<> wrote:
> PS. How's Avro these days? Or could we patch Thrift? Haven't looked at
> the internals but assume they're scary.
> On Thu, Aug 13, 2009 at 12:23 AM, Evan Weaver<> wrote:
>> Incidentally, is there any specific reason the collation has to be
>> pre-defined at the CF? What if any column could be an optional
>> supercolumn with a collation set at runtime? Then all CFs would be the
>> same.
>> Evan
>> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis<> wrote:
>>> If thrift were sane it would look something like
>>> struct Column {
>>>  byte[] name,
>>>  optional list<Column> subcolumns,
>>>  optional int64 timestamp,
>>>  optional byte[] value
>>> }
>>> "you can either have the subcolumns, or the timestamp and value" seems
>>> reasonable to me.
>>> of course in the real world, thrift can't do recursive structures, so
>>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>>> So... maybe not really an improvement after all. :)
>>> (Why am I not surprised to find out that protocol buffers does support
>>> this?  Sigh.)
>>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver<> wrote:
>>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>>> rather than supercolumns and columns...mainly because the subcolumn
>>>> position is optional, but the column_or_supercolumn position is not.
>>>> So there is something we agree on.
>>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>>> It's still not exactly a kind of column.
>>>> Evan
>>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis<>
>>>>> I agree with the proposition that the SuperColumn name is weak.
>>>>> (Although not, as I mentioned, Column or ColumnFamily.)  And I could
>>>>> go with schema over keyspace.
>>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>>> from the config) and instead just have Columns, which may or may not
>>>>> have SubColumns.  You would define this as
>>>>> <ColumnFamily withSubColumns="true" .../>
>>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>>> with how I think of things working.  And now you just have Rows and
>>>>> Columns!  Just like a RDB! :P
>>>>> -Jonathan
>>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver<>
>>>>>> Points taken, and I agree, except in my experience the current names
>>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>>> column family and super column.
>>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>> Cluster
>>>>>> Schema
>>>>>> Row set
>>>>>> Row w/key
>>>>>> Field set
>>>>>> Field
>>>>>> "You take the user's key, and use that to insert into the Row Set
>>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with
>>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>>> key."
>>>>>> But let me study for a while and come up with a more researched proposal.
>>>>>> Evan
>>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis<>
>>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski<>
>>>>>>>> However I think it's worth considering this from a strategic
>>>>>>>> perspective, looking at how we want the project to grow and
>>>>>>>> rather than just as it is right now.  The key to successful
>>>>>>>> is having a successful elevator pitch,  you can start using a database
>>>>>>>> without understanding relational-algebra because 'table' and 'column'
>>>>>>>> are such simple ways to reason about the tool.  As it stands
>>>>>>>> cassandra's takes a whiteboard and 15 minutes, before people get what
>>>>>>>> you're talking about.
>>>>>>> If you want to explain it as "sort of like a relational db" then
>>>>>>> table -> CF
>>>>>>> column -> column
>>>>>>> key -> key
>>>>>>> row -> row
>>>>>>> That's the simple case, then all you have is "supercolumns can
>>>>>>> a list of simple columns."
>>>>>>> That really doesn't seem so hard to me.  I have explained this to *managers*.
>>>>>>>> Assuming the project gets anything like the adoption it deserves,
>>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>>> the future.  So imposing costs on the current userbase which will give
>>>>>>>> huge benefits to future users, should be something we're willing to
>>>>>>>> do.  In fact it's something that has been done repeatedly over the
>>>>>>>> last few weeks.
>>>>>>> I agree.  But as I said before I just don't see this as being an improvement.
>>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>> As above.
>>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>>> *still* confusing, and when I talk with people who wonder
>>>>>>>> cassandra is all about I get blank looks when telling them what things
>>>>>>>> are called.  If you step back and want to tell someone how
>>>>>>>> insert a tweet into someone's timeline using evan's weblog
>>>>>>>>  "You just take the user's key, and use that to insert into
>>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline',
>>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>>> new tweet's key"
>>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>>> cases it's different.
>>>>>>> When you're inserting something nested 3 levels deep a certain
>>>>>>> of verbosity is unavoidable.  With Evan's nomenclature,
>>>>>>> "You take the user's record ID, and use that to insert into the
>>>>>>> Collection 'user associations' at Attribute Collection
>>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>> I think that is a negative improvement.  Yay, now we are talking
>>>>>>> Attribute Collections and Attributes instead of SuperColumns
>>>>>>> Columns.  The same objections ("one object's name contains the
>>>>>>> other's!") apply, plus the new one of sounding so generic that it could
>>>>>>> apply to practically any system.
>>>>>>> -Jonathan
>>>>>> --
>>>>>> Evan Weaver
>>>> --
>>>> Evan Weaver
>> --
>> Evan Weaver
> --
> Evan Weaver
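
For readers following the thread, the recursive Column struct quoted above ("you can either have the subcolumns, or the timestamp and value") can be sketched in Python roughly as follows. This is only an illustration of the proposal under discussion, not Cassandra's or Thrift's actual API: the class layout, the validation rule, and the insert_subcolumn helper are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """Hypothetical model of the recursive Column proposed in the thread."""
    name: bytes
    subcolumns: Optional[List["Column"]] = None
    timestamp: Optional[int] = None
    value: Optional[bytes] = None

    def __post_init__(self) -> None:
        # Assumed rule: a column has either subcolumns, or a timestamp
        # and a value, but never a mix of the two.
        has_subcolumns = self.subcolumns is not None
        has_any_leaf_field = self.timestamp is not None or self.value is not None
        has_full_leaf = self.timestamp is not None and self.value is not None
        if has_subcolumns and has_any_leaf_field:
            raise ValueError("column cannot have both subcolumns and a value")
        if not has_subcolumns and not has_full_leaf:
            raise ValueError("column needs subcolumns, or a timestamp and value")


def insert_subcolumn(column: Column, subcolumn: Column) -> None:
    """'Insert a subcolumn named A into the Column named B.'"""
    if column.subcolumns is None:
        raise ValueError("target column holds a value, not subcolumns")
    column.subcolumns.append(subcolumn)


# The tweet example from the thread: a 'user_timeline' column that holds
# subcolumns, each pointing at a tweet key (names here are placeholders).
timeline = Column(name=b"user_timeline", subcolumns=[])
insert_subcolumn(timeline, Column(name=b"time-uuid", timestamp=1, value=b"tweet-key"))
```

Under this shape a "supercolumn" is simply a Column whose subcolumns field is set, which is the simplification the thread is weighing against Thrift's lack of recursive structures.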
