incubator-cassandra-dev mailing list archives

From Mike Malone <>
Subject Re: Is SuperColumn necessary?
Date Fri, 07 May 2010 06:10:05 GMT
On Thu, May 6, 2010 at 5:38 PM, Vijay <> wrote:

> I would rather be interested in Tree type structure where supercolumns have
> supercolumns in it..... you dont need to compare all the columns to find a
> set of columns and will also reduce the bytes transfered for separator, at
> least string concatenation (Or something like that) for read and write
> column name generation. it is more logically stored and structured by this
> way.... and also we can make caching work better by selectively caching the
> tree (User defined if you will)....
> But nothing wrong in supporting both :)

I'm 99% sure we're talking about the same thing and we don't need to support
both. How names/values are separated is pretty irrelevant. It has to happen
somewhere. I agree that it'd be nice if it happened on the server, but doing
it in the client makes it easier to explore ideas.

On Thu, May 6, 2010 at 5:27 PM, philip andrew <> wrote:

> Please create a new term word if the existing terms are misleading, if its
> not a file system then its not good to call it a file system.

While it's seriously bikesheddy, I guess you're right.

Let's call them "thingies" for now, then. So you can have a top-level
"thingy" and it can have an arbitrarily nested tree of sub-"thingies." Each
"thingy" has a "thingy type" [1]. You can also tell Cassandra if you want a
particular level of "thingy" to be indexed. At one (or maybe more) levels
you can tell Cassandra you want your "thingies" to be split onto separate
nodes in your cluster. At one (or maybe more) levels you could also tell
Cassandra that you want your "thingies" split into separate files [2].

The upshot is, the Cassandra data model would go from being "it's a nested
dictionary, just kidding no it's not!" to being "it's a nested dictionary,
for serious." Again, these are all just ideas... but I think this simplified
data model would allow you to express pretty much any query in a graph of
simple primitives like Predicates, Filters, Aggregations, Transformations,
etc. The indexes would allow you to cheat when evaluating certain types of
queries - if you get a SlicePredicate on an indexed "thingy" you don't have
to enumerate the entire set of "sub-thingies" for example.
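
A minimal sketch of how an index lets a slice skip the full enumeration.
Everything here is hypothetical: the `Thingy` class, its `indexed` flag, and
the inclusive `[start, end]` bounds are assumptions made for illustration,
and the "index" is nothing more than the child names kept in sorted order.

```python
import bisect


class Thingy:
    """A hypothetical nested container: name -> sub-thingy or leaf value."""

    def __init__(self, children=None, indexed=False):
        self.children = dict(children or {})
        self.indexed = indexed
        # The "index" here is just the child names kept in sorted order.
        self._sorted_names = sorted(self.children) if indexed else None

    def slice(self, start, end):
        """Return (name, child) pairs with start <= name <= end, in name order."""
        if self.indexed:
            # Indexed: binary-search both bounds, touch only matching names.
            lo = bisect.bisect_left(self._sorted_names, start)
            hi = bisect.bisect_right(self._sorted_names, end)
            names = self._sorted_names[lo:hi]
        else:
            # Unindexed: every sub-thingy is enumerated and compared.
            names = sorted(n for n in self.children if start <= n <= end)
        return [(n, self.children[n]) for n in names]


users = {
    "alice": {"dob": "1980-01-01"},
    "mike": {"dob": "1982-02-02"},
    "zed": {"dob": "1990-03-03"},
}
indexed = Thingy(users, indexed=True)
unindexed = Thingy(users, indexed=False)
```

Both variants return the same answer for the same bounds; only the amount of
work differs. Note that `"mike"` sorts after `"m"`, so a slice from `"a"` to
`"m"` does not include it.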

So, you'd query your "thingies" by building out a predicate,
transformations, filters, etc., serializing the graph of primitives, and
sending it over the wire to Cassandra. Cassandra would rebuild the graph and
run it over your dataset.
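
The round trip described above might look roughly like the following. This
is a sketch under stated assumptions, not a proposed wire format: JSON stands
in for whatever serialization would really be used, the `Each` combinator and
the `to_wire`/`from_wire`/`run` names are invented for illustration, and
slice bounds are treated as inclusive.

```python
import json


class SlicePredicate:
    """Keep the children whose names fall in [start, end]."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def run(self, data):
        return {k: v for k, v in data.items() if self.start <= k <= self.end}

    def to_wire(self):
        return {"op": "slice", "start": self.start, "end": self.end}


class NamePredicate:
    """Keep only the explicitly named children."""

    def __init__(self, names):
        self.names = list(names)

    def run(self, data):
        return {k: data[k] for k in self.names if k in data}

    def to_wire(self):
        return {"op": "names", "names": self.names}


class Each:
    """Hypothetical combinator: apply `inner` to every child `outer` selects."""

    def __init__(self, outer, inner):
        self.outer, self.inner = outer, inner

    def run(self, data):
        return {k: self.inner.run(v) for k, v in self.outer.run(data).items()}

    def to_wire(self):
        return {"op": "each",
                "outer": self.outer.to_wire(),
                "inner": self.inner.to_wire()}


def from_wire(node):
    """Rebuild the graph of primitives on the (pretend) server side."""
    op = node["op"]
    if op == "slice":
        return SlicePredicate(node["start"], node["end"])
    if op == "names":
        return NamePredicate(node["names"])
    if op == "each":
        return Each(from_wire(node["outer"]), from_wire(node["inner"]))
    raise ValueError(f"unknown op: {op!r}")


# Client side: build the graph and serialize it.
query = Each(SlicePredicate("a", "m"), NamePredicate(["username", "dob"]))
wire_bytes = json.dumps(query.to_wire()).encode("utf-8")

# Server side: rebuild the graph and run it over the dataset.
users = {
    "alice": {"username": "alice", "dob": "1980-01-01", "email": "a@example.com"},
    "zed": {"username": "zed", "dob": "1990-03-03", "email": "z@example.com"},
}
result = from_wire(json.loads(wire_bytes)).run(users)
```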

So instead of:

    slice_predicate=SlicePredicate(column_names=['username', 'dob']),
    range=KeyRange(start_key='a', end_key='m'),

You'd do something like:

                SlicePredicate(start="a", end="m"),
                NamePredicate(names=["username", "dob"])

Which seems complicated, but it's basically just

    [(user['username'], user['dob'])
     for user in Cassandra['AwesomeApp']['user'].slice('a', 'm')]

and could probably be expressed that way in a client library.
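
As a rough illustration of that client-library sugar, the expression can be
made to run against plain nested dicts. The `Node` wrapper below is
hypothetical and evaluates everything locally; a real client would
presumably translate the same expression into a query and send it to the
server. Slice bounds are assumed to be inclusive.

```python
class Node:
    """Hypothetical client-side view over one level of nested data."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, name):
        child = self._data[name]
        # Wrap nested dicts so lookups chain; return leaf values unchanged.
        return Node(child) if isinstance(child, dict) else child

    def slice(self, start, end):
        """Yield children whose names fall in [start, end], in name order."""
        for name in sorted(self._data):
            if start <= name <= end:
                yield self[name]


Cassandra = Node({
    "AwesomeApp": {
        "user": {
            "alice": {"username": "alice", "dob": "1980-01-01"},
            "bob": {"username": "bob", "dob": "1985-05-05"},
            "zed": {"username": "zed", "dob": "1990-03-03"},
        }
    }
})

rows = [(user["username"], user["dob"])
        for user in Cassandra["AwesomeApp"]["user"].slice("a", "m")]
```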

I think batch_mutate is awesome the way it is and should be the only way to
insert/update data. I'd rename it mutate. So our interface becomes:

  Cassandra.query(query, consistency_level)
  Cassandra.mutate(mutation, consistency_level)
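
To make the two-call shape concrete, here is a single-process stand-in. It is
only a sketch: `InMemoryCassandra`, the nested-dict mutation format (with
`None` meaning "delete this name"), and the use of a plain callable as the
query are all assumptions, and `consistency_level` is accepted but ignored
because there is only one copy of the data.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ONE = 1
    QUORUM = 2


class InMemoryCassandra:
    """Hypothetical stand-in exposing only query() and mutate()."""

    def __init__(self):
        self._root = {}

    def mutate(self, mutation, consistency_level):
        # `mutation` is a nested dict; a value of None deletes that name.
        # consistency_level is ignored in this single-process sketch.
        self._merge(self._root, mutation)

    def query(self, query, consistency_level):
        # `query` is any callable that takes the root dict.
        return query(self._root)

    def _merge(self, target, changes):
        for name, value in changes.items():
            if value is None:
                target.pop(name, None)
            elif isinstance(value, dict):
                # Replace a leaf (or missing entry) with a container first.
                if not isinstance(target.get(name), dict):
                    target[name] = {}
                self._merge(target[name], value)
            else:
                target[name] = value


db = InMemoryCassandra()
db.mutate(
    {"AwesomeApp": {"user": {"alice": {"username": "alice",
                                       "dob": "1980-01-01"}}}},
    ConsistencyLevel.QUORUM,
)
# One mutation can mix an insert and a delete.
db.mutate(
    {"AwesomeApp": {"user": {"alice": {"dob": None,
                                       "email": "a@example.com"}}}},
    ConsistencyLevel.QUORUM,
)
alice = db.query(lambda root: root["AwesomeApp"]["user"]["alice"],
                 ConsistencyLevel.ONE)
```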


Anyway, I was trying to avoid writing all of this out in prose and to mock
some of it up in code instead. I guess this works too. Either way, I do
think something like this would simplify the codebase, simplify the data
model, simplify the interface, make the entire system more flexible, and be
generally awesome.


[1] These can be subclasses of Thingy in Java... or maybe they'd implement
IThingy. But either way they'd handle serialization and probably implement
compareTo to define natural ordering. So you'd have classes like
ASCIIThingy, UTF8Thingy, and LongThingy (ahem) - these would replace

[2] I think there's another simplification here. Splitting into separate
files is really very similar to splitting onto separate nodes. There might
be a way around some of the row size limitations with this sort of concept.
And we may be able to get better utilization of multiple disks by giving
each disk (or data directory) a subset of the node's token range. Caveat:
thought not fully baked.
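
Returning to the type classes in [1]: the point of letting each "thingy type"
own both serialization and ordering can be sketched as follows. Footnote [1]
talks about Java classes and compareTo; this is a Python analogue in which
the `sort_key` method and the `of` constructors are hypothetical names, and
an 8-byte big-endian signed encoding is assumed for `LongThingy`.

```python
import functools


@functools.total_ordering
class Thingy:
    """Hypothetical base: subclasses define the encoding and the ordering."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def sort_key(self):
        raise NotImplementedError

    def __eq__(self, other):
        return self.sort_key() == other.sort_key()

    def __lt__(self, other):
        return self.sort_key() < other.sort_key()


class ASCIIThingy(Thingy):
    @classmethod
    def of(cls, text: str):
        return cls(text.encode("ascii"))

    def sort_key(self):
        return self.raw  # plain bytewise order


class LongThingy(Thingy):
    @classmethod
    def of(cls, number: int):
        return cls(number.to_bytes(8, "big", signed=True))

    def sort_key(self):
        return int.from_bytes(self.raw, "big", signed=True)  # numeric order


as_text = sorted([ASCIIThingy.of("10"), ASCIIThingy.of("9")])
as_long = sorted([LongThingy.of(10), LongThingy.of(9)])
```

The same two names sort differently depending on the type: bytewise, "10"
comes before "9", while numerically 9 comes before 10.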
