incubator-cassandra-dev mailing list archives

From Arin Sarkissian <a...@rspot.net>
Subject Re: Fixing the data model names
Date Wed, 12 Aug 2009 05:40:24 GMT
Mark, I can work on that with you.
We should do this regardless of naming changes etc.
I'll even volunteer to do a PHP app based on the data model we mock up.

If you wanna coordinate some work on this, you can reach me at:
email: arin@rspot.net (or arin@digg.com)
IM/Twitter/IRC/just_about_everything_online: phatduckk

- Arin


On Tue, Aug 11, 2009 at 10:36 PM, Mark McBride<mark.mcbride@gmail.com> wrote:
> It seems to me that what would be most helpful, regardless of changes,
> is having a document that describes the data model in more detail than
> the current data model wiki page.  I can take a stab at creating a new
> page that includes examples if that would be useful.
>
> On Tue, Aug 11, 2009 at 10:34 PM, Arin Sarkissian<arin@rspot.net> wrote:
>> I agree that the names are pretty horrible for a newbie...
>>
>> I'll echo the concerns that the RDBMS vernacular messes with a
>> newcomer's head. I feel like the words "Row" and "Column" are way too
>> loaded since most people have an RDBMS background... BUT
>>
>> In the BigTable paper we've got the term "Column Family". This term is
>> also used in HBase and Hypertable. Since the term's out there in the
>> wild I wouldn't feel comfortable ditching it and making something up
>> to fill its spot. That would lead to a scenario where folks with
>> experience with HBase, Hypertable and Bigtable get confused (or think
>> the naming is dumb) but would lessen the confusion for RDBMS peeps.
>> Doesn't sound like the right tradeoff: 4 sets of folks have something
>> new to digest instead of 1.
>>
>> The "bad" terms are "column" and "row". That's where the real issues
>> arise... but given the fact that I believe we should keep "column
>> family" I have no idea what we'd call the things inside the CF. It
>> would be odd as hell to have a CF contain "records" etc. Does that
>> mean we should keep it called "column"? IMO w/o an awesome
>> alternative, yes.
>>
>> The word "row" should go away tho...
>> When I first started using Cassandra I thought that a key pointed to
>> a row and that row had one of each column family. This isn't the case,
>> but the RDBMS terms + SQL-ish thinking caused me and many others to
>> assume as much. Took us a while to figure that out...
>>
>> But realistically how much of this confusion could be avoided with a
>> legit example? Once you see a good example you start getting it. A lot
>> of people have been pointed towards the ThriftInterface page on the
>> wiki which clears up next to nothing:
>> http://wiki.apache.org/cassandra/ThriftInterface . There's stuff like
>> "edges", "base_attributes" etc. It's next door to nonsensical.
>>
>> What if we had a real example that people could relate to... a model of a
>> blog or something along those lines, & update the
>> http://wiki.apache.org/cassandra/ThriftInterface page to show how each
>> of the API methods would be used to accomplish basic tasks... ex: get
>> all comments for a blog entry, list entries in time order, list
>> entries tagged "bar", find all entries with "foo" in the body (kinda
>> like the Facebook mail search example).
>>
>> -Arin
>>
>>
>>
>> On Tue, Aug 11, 2009 at 10:09 PM, Curt Micol<asenchi@gmail.com> wrote:
>>> Hello,
>>>
>>> I am hardly a developer, so this isn't directly addressed to me, but
>>> if I may comment on a couple of things from an outsider's
>>> (non-developer, new to this scale of database) perspective.
>>>
>>> On Wed, Aug 12, 2009 at 12:38 AM, Eric Evans<eevans@rackspace.com> wrote:
>>>> On Tue, 2009-08-11 at 10:37 -0700, Evan Weaver wrote:
>>>>> In my experience, the naming of the data model has been a huge barrier
>>>>> to entry for users of Cassandra. This goes both for people familiar
>>>>> with SQL, and for people familiar with BigTable. I would like to
>>>>> change this before 0.4, since the 0.3 to 0.4 transition is the Great
>>>>> API Breakening.
>>>
>>> I agree that there is a barrier, specifically because most people have
>>> no experience with this type of data structure and as you mention are
>>> coming from SQL.  Clearer names along with more documentation/examples
>>> will help grow the user base of Cassandra quite a bit.
>>>
>>>>> So technically this is not a bikeshed, because I'm happy to do all the
>>>>> work. I'll even submit a patch for Digg's Python client. Since there
>>>>> are no production deployments of ASF, and only a couple
>>>>> well-maintained clients, now is the time to break the world. A few
>>>>> hours of work now will pay off richly in terms of community
>>>>> involvement and reduced noob-explanation-time.
>>>
>>> I would offer my services here also if a change were accepted.
>>>
>>> And while I don't know what the exact names should be (nor am I
>>> qualified tbh), I think they should be clearer than they are. At this
>>> point they seem to be a mixture of RDBMS and Document DB terms.  The
>>> change to 'keyspace' from 'table' I think was a first step in this
>>> process, but it should be taken further and all names normalized
>>> across the board to properly represent their relationship with each
>>> other. At least that's my very humble opinion.
>>>
>>> In response to Mr. Evan's comment regarding the Bigtable paper, does
>>> the Cassandra community want this to be a requirement for using the
>>> software? I would think not.  Sure, most early adopters are coming
>>> from that paper, but it shouldn't be a prerequisite for using the
>>> database, only for developing it.
>>>
>>> Again, my opinion carries little weight, but +1 from this user.
>>>
>>> Thanks for everyone's hard work, I am really excited to see how this
>>> project continues to progress.
>>>
>>> --
>>> # Curt Micol
>>>
>>
>
