incubator-cassandra-dev mailing list archives

From Jonathan Ellis <>
Subject Re: Fixing the data model names
Date Thu, 13 Aug 2009 13:15:15 GMT
A row is the data associated with a key in a given CF.
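
As a rough illustration of that definition, a column family can be pictured as a map from key to row, where a row is itself a map of columns. This is only a sketch using plain Python dicts: the type aliases, the get_row helper, and the sample data are all hypothetical and are not part of Cassandra's API.

```python
# Illustrative only: plain dicts standing in for a column family (CF).
# None of these names come from Cassandra's actual API.
from typing import Dict, Tuple

Column = Tuple[bytes, int]       # (value, timestamp)
Row = Dict[str, Column]          # column name -> column
ColumnFamily = Dict[str, Row]    # row key -> row


def get_row(cf: ColumnFamily, key: str) -> Row:
    """Return the data associated with `key` in this CF (empty if none)."""
    return cf.get(key, {})


# Hypothetical sample data: one CF holding a single row keyed by "evan".
users: ColumnFamily = {
    "evan": {"email": (b"evan@example.com", 1250168400)},
}

existing_row = get_row(users, "evan")
missing_row = get_row(users, "nobody")
```

Under this picture, the same key looked up in a different CF would give a different row, which is why the definition says "in a given CF".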

On Thu, Aug 13, 2009 at 12:17 AM, Arin Sarkissian<> wrote:
> Row? What are you guys referring to as a row?
> no - this isn't a joke
> Arin
> On Wed, Aug 12, 2009 at 9:39 PM, Evan Weaver<> wrote:
>> PS. How's Avro these days? Or could we patch Thrift? Haven't looked at
>> the internals but assume they're scary.
>> On Thu, Aug 13, 2009 at 12:23 AM, Evan Weaver<> wrote:
>>> Incidentally, is there any specific reason the collation has to be
>>> pre-defined at the CF? What if any column could be an optional
>>> supercolumn with a collation set at runtime? Then all CFs would be the
>>> same.
>>> Evan
>>> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis<> wrote:
>>>> If thrift were sane it would look something like
>>>> struct Column {
>>>>  byte[] name,
>>>>  optional list<Column> subcolumns,
>>>>  optional int64 timestamp,
>>>>  optional byte[] value
>>>> }
>>>> "you can either have the subcolumns, or the timestamp and value" seems
>>>> reasonable to me.
>>>> of course in the real world, thrift can't do recursive structures, so
>>>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>>>> So... maybe not really an improvement after all. :)
>>>> (Why am I not surprised to find out that protocol buffers does support
>>>> this?  Sigh.)
>>>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver<> wrote:
>>>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>>>> rather than supercolumns and columns...mainly because the subcolumn
>>>>> position is optional, but the column_or_supercolumn position is not.
>>>>> So there is something we agree on.
>>>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>>>> It's still not exactly a kind of column.
>>>>> Evan
>>>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis<> wrote:
>>>>>> I agree with the proposition that the SuperColumn name is weak.
>>>>>> (Although not, as I mentioned, Column or ColumnFamily.)  And I could
>>>>>> go with schema over keyspace.
>>>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>>>> from the config) and instead just have Columns, which may or may not
>>>>>> have SubColumns.  You would define this as
>>>>>> <ColumnFamily withSubColumns="true" .../>
>>>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>>>> with how I think of things working.  And now you just have Rows and
>>>>>> Columns!  Just like a RDB! :P
>>>>>> -Jonathan
>>>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver<> wrote:
>>>>>>> Points taken, and I agree, except in my experience the current names
>>>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>>>> column family and super column.
>>>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>>> Cluster
>>>>>>> Schema
>>>>>>> Row set
>>>>>>> Row w/key
>>>>>>> Field set
>>>>>>> Field
>>>>>>> "You take the user's key, and use that to insert into the Row set
>>>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>>>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>>>> key."
>>>>>>> But let me study for a while and come up with a more researched
>>>>>>> Evan
>>>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis<> wrote:
>>>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski<> wrote:
>>>>>>>>> However I think it's worth considering this from a strategic
>>>>>>>>> perspective, looking at how we want the project to grow and change,
>>>>>>>>> rather than just as it is right now.  The key to successful
>>>>>>>>> is having a successful elevator pitch,  you can start using a database
>>>>>>>>> without understanding relational-algebra because 'table' and 'column'
>>>>>>>>> are such simple ways to reason about the tool.  As it
>>>>>>>>> cassandra's takes a whiteboard and 15 minutes, before people get what
>>>>>>>>> you're talking about.
>>>>>>>> If you want to explain it as "sort of like a relational db"
>>>>>>>> table -> CF
>>>>>>>> column -> column
>>>>>>>> key -> key
>>>>>>>> row -> row
>>>>>>>> That's the simple case, then all you have is "supercolumns can contain
>>>>>>>> a list of simple columns."
>>>>>>>> That really doesn't seem so hard to me.  I have explained this to
>>>>>>>> *managers*.
>>>>>>>>> Assuming the project gets anything like the adoption it deserves, the
>>>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>>>> the future.  So imposing costs on the current userbase which will give
>>>>>>>>> huge benefits to future users, should be something we're willing to
>>>>>>>>> do.  In fact it's something that has been done repeatedly over the
>>>>>>>>> last few weeks.
>>>>>>>> I agree.  But as I said before I just don't see this as being an
>>>>>>>> improvement.
>>>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>>> As above.
>>>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>>>> *still* confusing, and when I talk with people who wonder what
>>>>>>>>> cassandra is all about I get blank looks when telling them what things
>>>>>>>>> are called.  If you step back and want to tell someone how you'd
>>>>>>>>> insert a tweet into someone's timeline using evan's weblog
>>>>>>>>>  "You just take the user's key, and use that to insert into the
>>>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline',
>>>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>>>> new tweet's key"
>>>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>>>> case it's different.
>>>>>>>> When you're inserting something nested 3 levels deep a certain amount
>>>>>>>> of verbosity is unavoidable.  With Evan's nomenclature,
>>>>>>>> "You take the user's record ID, and use that to insert into the Record
>>>>>>>> Collection 'user associations' at Attribute Collection
>>>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>>> I think that is a negative improvement.  Yay, now we are talking about
>>>>>>>> Attribute Collections and Attributes instead of SuperColumns and
>>>>>>>> Columns.  The same objections ("one object's name contains the
>>>>>>>> other's!") apply, plus the new one of sounding so generic that it could
>>>>>>>> apply to practically any system.
>>>>>>>> -Jonathan
>>>>>>> --
>>>>>>> Evan Weaver
>>>>> --
>>>>> Evan Weaver
>>> --
>>> Evan Weaver
>> --
>> Evan Weaver
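
For readers following the thread, the recursive Column struct quoted above ("you can either have the subcolumns, or the timestamp and value") can be sketched in Python, which has no trouble with self-referential types. This is a hypothetical model for illustration only: the class, its validation rule, and the sample names are assumptions drawn from the quoted struct, not Cassandra's or Thrift's actual definitions.

```python
# Hypothetical model of the recursive Column struct quoted in the thread.
# It is not the Thrift or Cassandra definition, only an illustration.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: bytes
    subcolumns: Optional[List["Column"]] = None
    timestamp: Optional[int] = None
    value: Optional[bytes] = None

    def __post_init__(self) -> None:
        has_subcolumns = self.subcolumns is not None
        has_any_leaf_field = self.timestamp is not None or self.value is not None
        has_full_leaf = self.timestamp is not None and self.value is not None
        # Enforce "either the subcolumns, or the timestamp and value".
        if has_subcolumns and has_any_leaf_field:
            raise ValueError("a column with subcolumns cannot carry timestamp/value")
        if not has_subcolumns and not has_full_leaf:
            raise ValueError("a plain column needs both timestamp and value")


# A container column (what the thread calls a SuperColumn) holding one
# ordinary column. The names and values here are made up.
timeline = Column(
    name=b"user_timeline",
    subcolumns=[Column(name=b"time-uuid-1", timestamp=1, value=b"tweet-key")],
)
```

Making both shapes one type is what lets "Insert a subcolumn named A into the Column named B" work without a separate SuperColumn concept; the cost is a runtime check that a wire format with two distinct structs would get for free.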
