incubator-cassandra-dev mailing list archives

From Arin Sarkissian <a...@rspot.net>
Subject Re: Fixing the data model names
Date Wed, 12 Aug 2009 05:34:55 GMT
I agree that the names are pretty horrible for a newbie...

I'll echo the concerns that the RDBMS vernacular messes with a
newcomer's head. I feel like the words "Row" and "Column" are way too
loaded since most people have an RDBMS background... BUT

In the BigTable paper we've got the term "Column Family". This term is
also used in HBase and Hypertable. Since the term's out there in the
wild I wouldn't feel comfortable ditching it and making something up
to fill its spot. That would lead to a scenario where folks with
experience with HBase, Hypertable and Bigtable get confused (or think
the naming is dumb) but would lessen the confusion for RDBMS peeps.
Doesn't sound like the right tradeoff: 4 sets of folks have something
new to digest instead of 1.

The "bad" terms are "column" and "row". That's where the real issues
arise... but given that I believe we should keep "column family", I
have no idea what we'd call the things inside the CF. It would be odd
as hell to have a CF contain "records" etc. Does that mean we should
keep calling it "column"? IMO w/o an awesome alternative, yes.

The word "row" should go away tho...
When I first started using Cassandra I thought that a key pointed to
a row and that row had one of each column family. This isn't the case,
but the RDBMS terms + SQL-ish thinking caused me and many others to
assume as much. Took us a while to figure that out...
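
To make that concrete, here is one way to picture it, as a rough sketch
in plain Python dicts rather than the real API. It assumes each column
family behaves like its own map from key to columns, so a key that
exists in one CF need not exist in another. The names ("Users",
"Followers", "alice", "bob", get_columns) are made up for illustration.

```python
# Hypothetical keyspace: every column family is its own map of
# key -> {column name: value}. Nothing forces a key to appear in all CFs.
keyspace = {
    "Users": {
        "alice": {"email": "alice@example.com", "name": "Alice"},
        "bob": {"name": "Bob"},
    },
    "Followers": {
        "alice": {"bob": "2009-08-11"},
    },
}


def get_columns(keyspace, column_family, key):
    """Return the columns stored under `key` in one column family.

    Returns an empty dict when the key has no columns in that CF.
    """
    return keyspace[column_family].get(key, {})


if __name__ == "__main__":
    print(get_columns(keyspace, "Users", "bob"))
    print(get_columns(keyspace, "Followers", "bob"))
```

Under this picture "bob" has columns in Users and nothing at all in
Followers, which is not what the "one row with one of each column
family" assumption would suggest.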

But realistically how much of this confusion could be avoided with a
legit example? Once you see a good example you start getting it. A lot
of people have been pointed towards the ThriftInterface page on the
wiki, which clears up next to nothing:
http://wiki.apache.org/cassandra/ThriftInterface . There's stuff like
"edges", "base_attributes" etc. It's next door to nonsensical.

What if we had a real example that people could relate to... model a
blog or something along those lines & update the
http://wiki.apache.org/cassandra/ThriftInterface page to show how each
of the API methods would be used to accomplish basic tasks... ex: get
all comments for a blog entry, list entries in time order, list
entries tagged "bar", find all entries with "foo" in the body (kinda
like the Facebook mail search example).
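
As a rough idea of what such a blog example might look like, here is a
sketch in plain Python dicts. It is not the Thrift API and not a
proposed schema. The column family names (blog_entries, comments,
tagged_entries), the "__all__" key and the helper functions are all
assumptions made up for illustration. It covers three of the tasks
above and leaves out the body search.

```python
# Hypothetical column families, each a map of key -> {column name: value}.

# blog_entries: key = entry slug, columns = fields of the entry.
blog_entries = {
    "first-post": {"title": "First post", "body": "hello foo"},
    "second-post": {"title": "Second post", "body": "more text"},
}

# comments: key = entry slug, column name = comment timestamp, value = text.
comments = {
    "first-post": {
        "2009-08-02T08:00:00": "Nice!",
        "2009-08-01T12:00:00": "Welcome",
    },
}

# tagged_entries: key = tag, column name = post timestamp, value = entry slug.
# The made-up "__all__" key indexes every entry, tagged or not.
tagged_entries = {
    "bar": {"2009-08-05T09:30:00": "second-post"},
    "__all__": {
        "2009-08-05T09:30:00": "second-post",
        "2009-08-01T10:00:00": "first-post",
    },
}


def values_in_column_order(column_family, key):
    """Return the values under `key`, ordered by column name."""
    columns = column_family.get(key, {})
    return [columns[name] for name in sorted(columns)]


def comments_for(entry_key):
    """All comments for a blog entry, oldest first."""
    return values_in_column_order(comments, entry_key)


def entries_tagged(tag):
    """Slugs of entries carrying `tag`, oldest first."""
    return values_in_column_order(tagged_entries, tag)


def entries_in_time_order():
    """Slugs of every entry, oldest first."""
    return entries_tagged("__all__")


if __name__ == "__main__":
    print(comments_for("first-post"))
    print(entries_in_time_order())
    print(entries_tagged("bar"))
```

The ISO-style timestamps are used as column names so that sorting the
names as strings also sorts them by time.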

-Arin



On Tue, Aug 11, 2009 at 10:09 PM, Curt Micol<asenchi@gmail.com> wrote:
> Hello,
>
> I am hardly a developer, so this isn't directly addressed to me, but
> if I may comment on a couple of things from an outsider's
> (non-developer, new to this scale of database) perspective.
>
> On Wed, Aug 12, 2009 at 12:38 AM, Eric Evans<eevans@rackspace.com> wrote:
>> On Tue, 2009-08-11 at 10:37 -0700, Evan Weaver wrote:
>>> In my experience, the naming of the data model has been a huge barrier
>>> to entry for users of Cassandra. This goes both for people familiar
>>> with SQL, and for people familiar with BigTable. I would like to
>>> change this before 0.4, since the 0.3 to 0.4 transition is the Great
>>> API Breakening.
>
> I agree that there is a barrier, specifically because most people have
> no experience with this type of data structure and as you mention are
> coming from SQL.  Clearer names along with more documentation/examples
> will help grow the user base of Cassandra quite a bit.
>
>>> So technically this is not a bikeshed, because I'm happy to do all the
>>> work. I'll even submit a patch for Digg's Python client. Since there
>>> are no production deployments of ASF, and only a couple
>>> well-maintained clients, now is the time to break the world. A few
>>> hours of work now will pay off richly in terms of community
>>> involvement and reduced noob-explanation-time.
>
> I would offer my services here also if a change were accepted.
>
> And while I don't know what the exact names should be (nor am I
> qualified tbh), I think they should be clearer than they are. At this
> point they seem to be a mixture of RDBMS and Document DB terms.  The
> change to 'keyspace' from 'table' I think was a first step in this
> process, but it should be taken further and all names normalized
> across the board to properly represent their relationship with each
> other. At least that's my very humble opinion.
>
> In response to Mr. Evan's comment regarding the Bigtable paper, does
> the Cassandra community want this to be a requirement for using the
> software? I would think not.  Sure, most early adopters are coming
> from that paper, but it shouldn't be a source of entry to use the
> database, but rather to develop it.
>
> Again, my opinion carries little weight, but +1 from this user.
>
> Thanks for everyone's hard work, I am really excited to see how this
> project continues to progress.
>
> --
> # Curt Micol
>
