From: Evan Weaver
Date: Tue, 18 Aug 2009 10:36:12 -0400
Subject: Re: Cassandra data model misconceptions, and their sources
To: cassandra-dev@incubator.apache.org, asenchi@asenchi.com

Did you read the previous thread about this?

http://markmail.org/thread/qbocotgkan4mg73w

I don't think your proposals are too good... I have a new proposal based
on feedback in the previous thread, that I will send soon. But I wanted
some comments on the misconceptions themselves.

Evan

On Tue, Aug 18, 2009 at 1:33 AM, Curt Micol wrote:
> I've been thinking about this for a number of days, and again, while I am not a
> developer I thought I might toss in a proposal if that's okay.
>
> Since putting together a schema diagram and having a number of people review
> it, I think a change is warranted. Too many people are coming from the RDBMS
> world and the terms used by Cassandra are conflicting with those terms they
> are already familiar with.
>
> The TLDR version is as follows:
>
> Object (Column)
> ObjectFamily (ColumnFamily)
> Directory (Row)
> ObjectContainer (SuperColumn)
> Namespace (Keyspace)
>
> The long version...
>
> Object (Column)
> As Evan has stated repeatedly, column is a bit misleading especially when
> compared to other types of database systems.
> I think this is probably the most important change to the data model names,
> and exactly where I started, since this is the 'core' of Cassandra. Object
> gives the impression that this is a piece of data; it's relatively
> structured, but the name gives no impression of how strict that structure is.
> 'Objects' have names that have values and timestamps. Simple and to the
> point. 'Object' doesn't come with the preconceived notions that 'column'
> comes with, and leaves room for Cassandra to define what an 'object' is
> without any conflict with preexisting data structures.
>
> By changing this, we can move up the ladder to other data types and
> easily rename them to something that 'contains objects' or 'accesses objects'.
> This allows us to describe the data model in the name structure without
> having to get too deep into the definition.
>
> Directory (Row)
> 'row' is currently unnamed, but still a structure that exists in the model.
> It's not specifically data itself, but more of a mapping of how to get to
> objects (using keys). 'Directory' fills this void quite well. It is easily
> explained as a path to get to data and not data itself.
>
> ObjectFamily (ColumnFamily)
> There's no argument that the one direct link to the BigTable paper is 'column
> families'. It's perhaps the only structure that is virtually the same in both
> pieces of software. Considering this, I think we need to avoid too drastic a
> change. With that said, I think a change is necessary due to the differences
> in columns between the two databases. 'object family' is descriptive of the
> relation between objects and removes any reference to tabular structures while
> keeping a loose relationship to 'column family' in the BigTable paper.
>
> ObjectContainer (SuperColumn)
> I could see this being shortened to 'container' in everyday conversation.
> However, 'objectcontainer' fits nicely with the rest of the data model names
> and is descriptive of its purpose and use. Ultimately a 'supercolumn' is
> nothing more than a named container of columns (and I've seen on at least 3
> different occasions the word container used to describe supercolumns).
> 'supercolumn' had no real connection to what exactly it was defining, but with
> 'object container' we have a clear understanding that we are naming the
> structure that holds objects. Or as I explained it to a friend, we are naming
> the 'jar' and not the 'honey'. :)
>
> Namespace (Keyspace)
> This one I go back and forth on. I know it's been changed from 'Table' to
> 'keyspace' and Evan proposed 'database', but I think that 'namespace' is
> really what it is we are talking about. Wikipedia has this as the first line
> to describe 'namespace':
>
> A namespace is an abstract container or environment created to hold a
> logical grouping of unique identifiers or symbols (i.e., names).
>
> Originally I thought 'objectspace' would fit better, but I think 'namespace'
> comes with a better history and is clearer about what this structure really
> is, especially when you relate the name namespace to how it is used in Ruby,
> Python and Java. Ultimately though, I think I prefer 'keyspace' over 'table'
> or 'database'.
>
> The only issue I see with all of these names is the potential conflict with
> programming languages and their objects. I know next to nothing about Java so
> I don't know if there would be a conflict here. I've run the following Google
> search 'reserved words in *' where '*' is Ruby, Python, Java and C++ and
> received no mention of 'object' being a reserved word in any of those
> languages.
>
> I also grep'd through current source code and there don't seem to be any
> real conflicts that couldn't be named something else so as not to conflict
> with this naming structure.
>
> In the end, I think it's a good idea to look at this and work out a solution.
> Documentation and tutorials are going to help, but I think people are so
> entrenched in the RDBMS world that there is somewhat of a barrier to
> understanding Cassandra's data model.
>
> Thanks for your time,
>
> --
> # Curt Micol
>

-- 
Evan Weaver
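
For reference, the nesting described in the quoted proposal can be sketched
in a few lines of Python. This is only an illustration of the proposed names,
not Cassandra's actual implementation or API. The class layout, the put/get
helpers, and the sample data ("Blog", "Posts", "curt", "first-post") are all
assumptions made up for this sketch; only the five names and their
relationships come from the message above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Object:
    """Proposed name for a Column: a name with a value and a timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class ObjectContainer:
    """Proposed name for a SuperColumn: a named container of Objects."""
    name: str
    objects: Dict[str, Object] = field(default_factory=dict)


@dataclass
class ObjectFamily:
    """Proposed name for a ColumnFamily.

    Maps each Directory (row) key to the containers stored under it.
    """
    name: str
    directories: Dict[str, Dict[str, ObjectContainer]] = field(default_factory=dict)

    def put(self, key: str, container: str, obj: Object) -> None:
        # The Directory is a path to data, not data itself, so in this
        # sketch it is just a dict keyed by container name.
        directory = self.directories.setdefault(key, {})
        holder = directory.setdefault(container, ObjectContainer(container))
        holder.objects[obj.name] = obj

    def get(self, key: str, container: str, name: str) -> Optional[Object]:
        holder = self.directories.get(key, {}).get(container)
        return holder.objects.get(name) if holder is not None else None


@dataclass
class Namespace:
    """Proposed name for a Keyspace: a named group of ObjectFamilies."""
    name: str
    families: Dict[str, ObjectFamily] = field(default_factory=dict)


# Hypothetical example data, not taken from Cassandra itself.
blog = Namespace("Blog")
posts = blog.families.setdefault("Posts", ObjectFamily("Posts"))
posts.put("curt", "first-post", Object("title", "Hello", timestamp=1))
found = posts.get("curt", "first-post", "title")
```

Read top to bottom, a lookup walks Namespace, ObjectFamily, Directory key,
ObjectContainer, then Object, which is the "jar, not the honey" layering the
proposal is trying to name.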