cassandra-user mailing list archives

From Shaun Cutts <sh...@cuttshome.net>
Subject Re: Do supercolumns have a purpose?
Date Tue, 08 Feb 2011 01:53:48 GMT

I'm a newbie here, but, with apologies for my presumptuousness, I think you should deprecate
SuperColumns. They are already distracting you, and as the years go by the cost of supporting
them as you add more and more functionality is only likely to get worse. It would be better
to concentrate on making the "core" column families better (and I'm sure we can all think
of lots of things we'd like).

Just dropping SuperColumns would be bad for your reputation -- and for users like David who
are currently using them. But if you mark them clearly as deprecated and explain why and what
to do instead (perhaps putting a bit of effort into migration tools... or even a "virtual"
layer supporting arbitrary hierarchical data), then you can drop them in a few years (when
you get to 1.0, say), without people feeling betrayed.

-- Shaun

On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

> "My main point was to say that I think it is better to create tickets for what you
want, rather than for something else completely different that would, as a by-product, give
you what you want."
> 
> Then let me say what I want: I want supercolumn families to have any feature that regular
column families have. 
> 
> My data model is full of supercolumns. I used them, even though I knew I didn't *have
to*, "because they were there", which implied to me that I was supposed to use them for some
good reason. Now I suspect that they will gradually become less and less functional, as features
are added to regular column families and not supported for supercolumn families. 
> 
> 
> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mike@simplegeo.com> wrote:
> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com> wrote:
> The advantage would be to enable secondary indexes on supercolumn families.
> 
> Then I suggest opening a ticket for adding secondary indexes to supercolumn families
and voting on it. This will be 1 or 2 orders of magnitude less work than getting rid of super
columns internally, and probably a much better solution anyway.
> 
> I realize that this is largely subjective, and on such matters code speaks louder than
words, but I don't think I agree with you on the issue of which alternative is less work,
or even which is a better solution.
> 
> You are right, I probably put too much emphasis on that sentence. My main point was to
say that I think it is better to create tickets for what you want, rather than for something
else completely different that would, as a by-product, give you what you want.
> Then I suspect that *if* the only goal is to get secondary indexes on super columns,
then there is a good chance this would be less work than getting rid of super columns. But
to be fair, secondary indexes on super columns may not make too much sense without #598, which
itself would require quite some work, so clearly I spoke a bit quickly.
>  
> If the goal is to have a hierarchical model, limiting the depth to two seems arbitrary.
Why not go all the way and allow an arbitrarily deep hierarchy?
> 
> If a more sophisticated hierarchical model is deemed unnecessary, or impractical, allowing
a depth of two seems inconsistent and unnecessary. It's pretty trivial to overlay a hierarchical
model on top of the map-of-sorted-maps model that Cassandra implements. Ed Anuff has implemented
a custom comparator that does the job [1]. Google's Megastore has a similar architecture and
goes even further [2].
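
To illustrate the overlay idea, here is a minimal sketch in Python. It is not Ed Anuff's comparator and uses no Cassandra API; the `Row` class and its `insert` and `slice` methods are hypothetical names made up for this example. It assumes each row is a sorted map of column names and that a hierarchy path is encoded as a tuple of strings, so that tuple ordering keeps every subtree contiguous.

```python
import bisect


class Row:
    """Hypothetical row: a sorted map from composite column names to values.

    A column name is a tuple of strings, e.g. ("inbox", "msg1", "subject").
    Tuples compare element by element, so all names sharing a prefix are
    adjacent in sorted order, whatever the depth of the hierarchy.
    """

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._values = {}  # column name -> value

    def insert(self, path, value):
        name = tuple(path)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, prefix):
        """Return (name, value) pairs under `prefix`, in sorted order."""
        prefix = tuple(prefix)
        # First position whose name is >= prefix; the subtree starts here.
        start = bisect.bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # left the contiguous run of matching names
            result.append((name, self._values[name]))
        return result


row = Row()
row.insert(("inbox", "msg1", "subject"), "hello")
row.insert(("inbox", "msg1", "body"), "first message")
row.insert(("inbox", "msg2", "subject"), "again")
row.insert(("sent", "msg9", "subject"), "reply")

# A depth-two slice behaves like reading one "supercolumn";
# a depth-one slice reads a whole branch.
msg1 = row.slice(("inbox", "msg1"))
inbox = row.slice(("inbox",))
```

Nothing in the sketch limits the depth to two: a prefix of any length selects the corresponding subtree, which is the point made above.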
> 
> It seems to me that super columns are a historical artifact from Cassandra's early life
as Facebook's inbox storage system. They needed posting lists of messages, sharded by user.
So that's what they built. In my dealings with the Cassandra code, super columns end up making
a mess all over the place when algorithms need to be special cased and branch based on the
column/supercolumn distinction.
> 
> I won't even mention what it does to the thrift interface.
> 
> Actually, I agree with you, more than you know. If I were to start coding Cassandra now,
I wouldn't include super columns (and I would probably not go for an unlimited-depth hierarchical
model either). But they are there, and I'm not sure getting rid of them fully (meaning, including
in thrift) is an option (it would be a big compatibility breakage). And (even though I certainly
thought about this more than once :)) I'm slightly less enthusiastic about keeping them in
thrift but encoding them as regular column families internally: it would still be a lot of work,
and we would still probably end up with nasty tricks to stick to the thrift api. 
>  
> --
> Sylvain
> 
> 
> Mike
> 
> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
> 
> 

