Date: Sun, 9 May 2010 12:20:32 -0500
Subject: Re: Is SuperColumn necessary?
From: Jonathan Shook
To: user@cassandra.apache.org
Cc: dev@cassandra.apache.org

I'm not sure this is much of an improvement. It does illustrate, however,
the desire to couch the concepts in terms that each of us is already
comfortable with. Nearly every set of terms that comes from an existing
system will carry baggage that doesn't map appropriately.

Not that "sparse multidimensional arrays" is an unfamiliar construct. It's
more that "sparse" may or may not apply, depending on which part of your
data you are describing. "Multidimensional" implies uniformity of
structure, which is not to be taken for granted. Arrays are just one way
to think of the structures. The structures also serve well as maps and
sets (which can be modeled using arrays as well). There are certain
semantics of sets, lists, and maps that people have wired into their
brains, and reducing it all to "arrays" is likely to create more confusion.

I think if we want to borrow terms from another system, it shouldn't be a
computing system, or at least it should be so different or fundamental
that the terms have to be re-understood free of baggage.

On Sun, May 9, 2010 at 1:30 AM, David Boxenhorn wrote:
> Guys, this is beginning to sound like MUMPS!
> http://en.wikipedia.org/wiki/MUMPS
>
> In MUMPS, all variables are sparse, multidimensional arrays, which can be
> stored to disk.
>
> It is an arcane, and archaic, language (does anyone but me remember it?),
> but it has been used successfully for years. Maybe we can learn something
> from it.
>
> I like the terminology of sparse multidimensional arrays very much - it
> really clarifies my thinking. A column family would just be a variable.
>
> On Fri, May 7, 2010 at 7:06 PM, Ed Anuff wrote:
>>
>> On Thu, May 6, 2010 at 11:10 PM, Mike Malone wrote:
>>>
>>> The upshot is, the Cassandra data model would go from being "it's a
>>> nested dictionary, just kidding no it's not!" to being "it's a nested
>>> dictionary, for serious." Again, these are all just ideas... but I
>>> think this simplified data model would allow you to express pretty
>>> much any query in a graph of simple primitives like Predicates,
>>> Filters, Aggregations, Transformations, etc. The indexes would allow
>>> you to cheat when evaluating certain types of queries - if you get a
>>> SlicePredicate on an indexed "thingy" you don't have to enumerate the
>>> entire set of "sub-thingies" for example.
>>>
>>
>> This would be my dream implementation. I'm working on an application
>> that needs that sort of capability. SuperColumns lead you to thinking
>> that should be done in the cassandra tier but then fall short, so my
>> thought was that I was just going to do everything that was in
>> Cassandra as regular columnfamilies and columns using composite keys
>> and composite column names ala the code I shared above, and then
>> implement the n-level hierarchy in the app tier. It looks like your
>> suggestion is to take it in the other direction and make it part of
>> the fundamental data model, which would be very useful if it could be
>> made to work without big tradeoffs.
>>
>
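Ed's plan of flattening an n-level hierarchy into composite column names and resolving it in the app tier can be illustrated with a minimal Python sketch. It is not the code he mentions sharing (that code is not part of this message). `Row`, `insert`, and `slice_prefix` are hypothetical names, and an in-memory sorted list stands in for a row whose columns are kept in sorted order. No Cassandra client API or Cassandra comparator is modeled; the sketch only assumes that composite names sort component by component.

```python
from bisect import bisect_left, insort
from typing import Any, Dict, List, Tuple

# Hypothetical composite column name, e.g. ("address", "home", "city").
Name = Tuple[str, ...]


class Row:
    """Hypothetical in-memory stand-in for one row of a column family.

    Column names are kept in sorted order. Composite names are Python tuples,
    which compare component by component, so every subtree of the hierarchy
    occupies one contiguous range of the sorted names.
    """

    def __init__(self) -> None:
        self._names: List[Name] = []        # sorted composite names
        self._values: Dict[Name, Any] = {}  # name -> column value

    def insert(self, name: Name, value: Any) -> None:
        # Add the name to the sorted list only once; later writes overwrite.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix: Name) -> List[Tuple[Name, Any]]:
        """Return all columns whose name starts with `prefix`, in sorted order.

        This plays the role a SuperColumn lookup would, but at any depth.
        """
        start = bisect_left(self._names, prefix)  # first name >= prefix
        result = []
        for name in self._names[start:]:
            if name[: len(prefix)] != prefix:
                break  # left the contiguous range for this prefix
            result.append((name, self._values[name]))
        return result


if __name__ == "__main__":
    row = Row()
    row.insert(("address", "home", "city"), "Austin")
    row.insert(("address", "work", "city"), "Dallas")
    row.insert(("address", "home", "zip"), "78701")
    row.insert(("phone", "mobile"), "555-0100")
    # Two levels deep: only the "city" and "zip" columns under ("address", "home").
    print(row.slice_prefix(("address", "home")))
    # One level deep: every address column, home and work.
    print(row.slice_prefix(("address",)))
```

The design choice here is that a hierarchy lookup becomes a range scan over sorted names, which is roughly what a SlicePredicate over a sorted row gives you; the cost is that the application, not the data model, has to know how the composite names are built.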