incubator-cassandra-dev mailing list archives

From Jonathan Shook <jsh...@gmail.com>
Subject Re: Is SuperColumn necessary?
Date Sun, 09 May 2010 17:20:32 GMT
I'm not sure this is much of an improvement. It does illustrate,
however, the desire to couch the concepts in terms that each of us is
already comfortable with. Nearly every set of terms that comes from an
existing system will have baggage which doesn't map appropriately. Not
that "sparse multidimensional arrays" is an unfamiliar construct.
It's more that "sparse" may or may not apply depending on the part of
your data you are describing. "Multidimensional" implies uniformity of
structure, which is not to be taken for granted. Arrays are just one
way to think of the structures. They also serve well as maps and sets
(which can be modeled using arrays as well). There are certain
semantics of sets, lists, and maps which people have wired into their
brains, and reducing it all to "arrays" is likely to create more
confusion.
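
As a rough illustration of that point, here is a minimal Python sketch in
which one row of name/value columns is read as a map, a set, or a list.
The helper names (as_map, as_set, as_list) and the sample rows are
hypothetical and are not part of any Cassandra API; a plain dict stands in
for a row of columns.

```python
# A row is modeled as a plain dict of column name -> column value.
# This is an assumption for illustration, not Cassandra's storage format.

def as_map(row):
    """Map view: column names are keys, column values are values."""
    return dict(row)

def as_set(row):
    """Set view: only the column names matter, values are ignored."""
    return set(row)

def as_list(row):
    """List view: values ordered by sorted column name."""
    return [row[name] for name in sorted(row)]

profile = {"city": "somewhere", "email": "user@example.com"}  # used as a map
tags = {"blue": "", "green": ""}                              # used as a set
events = {"0002": "logout", "0001": "login"}                  # used as a list

print(as_map(profile)["email"])
print(sorted(as_set(tags)))
print(as_list(events))
```

The same container backs all three views, but each view carries different
expectations about ordering, duplicates, and lookup.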

I think if we want to borrow terms from another system, it shouldn't
be a computing system, or at least should be so different or
fundamental that the terms have to be re-understood free of baggage.

On Sun, May 9, 2010 at 1:30 AM, David Boxenhorn <david@lookin2.com> wrote:
> Guys, this is beginning to sound like MUMPS!
> http://en.wikipedia.org/wiki/MUMPS
>
> In MUMPS, all variables are sparse, multidimensional arrays, which can be
> stored to disk.
>
> It is an arcane, and archaic, language (does anyone but me remember it?),
> but it has been used successfully for years. Maybe we can learn something
> from it.
>
> I like the terminology of sparse multidimensional arrays very much - it
> really clarifies my thinking. A column family would just be a variable.
>
> On Fri, May 7, 2010 at 7:06 PM, Ed Anuff <ed@anuff.com> wrote:
>>
>> On Thu, May 6, 2010 at 11:10 PM, Mike Malone <mike@simplegeo.com> wrote:
>>>
>>> The upshot is, the Cassandra data model would go from being "it's a
>>> nested dictionary, just kidding no it's not!" to being "it's a nested
>>> dictionary, for serious." Again, these are all just ideas... but I
>>> think this simplified data model would allow you to express pretty
>>> much any query in a graph of simple primitives like Predicates,
>>> Filters, Aggregations, Transformations, etc. The indexes would allow
>>> you to cheat when evaluating certain types of queries - if you get a
>>> SlicePredicate on an indexed "thingy" you don't have to enumerate the
>>> entire set of "sub-thingies" for example.
>>>
>>
>> This would be my dream implementation. I'm working on an application that
>> needs that sort of capability.  SuperColumns lead you to thinking that
>> should be done in the Cassandra tier but then fall short, so my thought was
>> that I was just going to do everything that was in Cassandra as regular
>> column families and columns using composite keys and composite column names
>> a la the code I shared above, and then implement the n-level hierarchy in the
>> app tier.  It looks like your suggestion is to take it in the other
>> direction and make it part of the fundamental data model, which would be
>> very useful if it could be made to work without big tradeoffs.
>>
>>
>
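
The composite-column-name approach described in the quoted text can be
sketched roughly as follows. This is a minimal Python illustration and not
the code referred to in the thread: the separator SEP, the Row class, and
the helper names are all hypothetical, and it assumes the separator never
appears inside a path component. Column names are kept sorted, so a prefix
lookup plays the role of a slice over one level of the hierarchy.

```python
from bisect import bisect_left

SEP = "\x00"  # hypothetical separator; assumed absent from path components

def composite(*parts):
    """Join path components into one flat column name."""
    return SEP.join(parts)

class Row:
    """One row of a regular column family, with names kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so prefix scans are contiguous.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice_prefix(self, *parts):
        """Return (path, value) pairs for every column under the given path."""
        prefix = composite(*parts) + SEP
        i = bisect_left(self._names, prefix)
        found = []
        while i < len(self._names) and self._names[i].startswith(prefix):
            name = self._names[i]
            found.append((tuple(name.split(SEP)), self._values[name]))
            i += 1
        return found

row = Row()
row.insert(composite("address", "home", "city"), "A")
row.insert(composite("address", "home", "zip"), "B")
row.insert(composite("address", "work", "city"), "C")
row.insert(composite("name"), "D")

home = row.slice_prefix("address", "home")
print(home)
```

The hierarchy lives entirely in the application tier here: the store only
sees flat, sorted names, and depth is limited only by how many components
the application chooses to join.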
