cassandra-user mailing list archives

From David Boxenhorn <>
Subject Re: Do supercolumns have a purpose?
Date Tue, 08 Feb 2011 10:03:44 GMT
Shaun, I agree with you, but marking them as deprecated is not good enough
for me. I can't easily stop using supercolumns. I need an upgrade path.

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <> wrote:

> I'm a newbie here, but, with apologies for my presumptuousness, I think you
> should deprecate SuperColumns. They are already distracting you, and as the
> years go by the cost of supporting them as you add more and more
> functionality is only likely to get worse. It would be better to concentrate
> on making the "core" column families better (and I'm sure we can all think
> of lots of things we'd like).
> Just dropping SuperColumns would be bad for your reputation -- and for
> users like David who are currently using them. But if you mark them clearly
> as deprecated and explain why and what to do instead (perhaps putting a bit
> of effort into migration tools... or even a "virtual" layer supporting
> arbitrary hierarchical data), then you can drop them in a few years (when
> you get to 1.0, say), without people feeling betrayed.
> -- Shaun
> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
> "My main point was to say that I think it is better to create tickets
> for what you want, rather than for something else completely different that
> would, as a by-product, give you what you want."
> Then let me say what I want: I want supercolumn families to have any
> feature that regular column families have.
> My data model is full of supercolumns. I used them, even though I knew I
> didn't *have to*, "because they were there", which implied to me that I was
> supposed to use them for some good reason. Now I suspect that they will
> gradually become less and less functional, as features are added to regular
> column families and not supported for supercolumn families.
> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <> wrote:
>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <> wrote:
>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <> wrote:
>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <> wrote:
>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>> families.
>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>> magnitude less work than getting rid of super columns internally, and
>>>> probably a much better solution anyway.
>>> I realize that this is largely subjective, and on such matters code
>>> speaks louder than words, but I don't think I agree with you on the issue of
>>> which alternative is less work, or even which is a better solution.
>> You are right, I probably put too much emphasis in that sentence. My main
>> point was to say that I think it is better to create tickets for what you
>> want, rather than for something else completely different that would, as a
>> by-product, give you what you want.
>> Then I suspect that *if* the only goal is to get secondary indexes on
>> super columns, then there is a good chance this would be less work than
>> getting rid of super columns. But to be fair, secondary indexes on super
>> columns may not make too much sense without #598, which itself would require
>> quite some work, so clearly I spoke a bit quickly.
>>> If the goal is to have a hierarchical model, limiting the depth to two
>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>> hierarchy?
>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>> impractical, allowing a depth of two seems inconsistent and
>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>> has a similar architecture and goes even further [2].
>>> It seems to me that super columns are a historical artifact from
>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>> posting lists of messages, sharded by user. So that's what they built. In my
>>> dealings with the Cassandra code, super columns end up making a mess all
>>> over the place when algorithms need to be special cased and branch based on
>>> the column/supercolumn distinction.
>>> I won't even mention what it does to the thrift interface.
>> Actually, I agree with you, more than you know. If I were to start coding
>> Cassandra now, I wouldn't include super columns (and I would probably not go
>> for an unlimited-depth hierarchical model either). But they are there and I'm
>> not sure getting rid of them fully (meaning, including in thrift) is an option
>> (it would be a big compatibility breakage). And (even though I certainly
>> thought about this more than once :)) I'm slightly less enthusiastic about
>> keeping them in thrift but encoding them in regular column families
>> internally: it would still be a lot of work but we would still probably end
>> up with nasty tricks to stick to the thrift api.
>> --
>> Sylvain
>>> Mike
>>> [1]
>>> [2]
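
To illustrate Mike's point that a hierarchy can be overlaid on the
map-of-sorted-maps model, here is a minimal Python sketch. It is not Ed
Anuff's comparator and uses no Cassandra API: the `Row` class, its methods,
and the sample column names are all hypothetical. It assumes each column name
is a tuple of strings (a path), so that ordinary tuple ordering keeps every
subtree contiguous and a "supercolumn" read becomes a prefix slice.

```python
from bisect import bisect_left, insort


class Row:
    """Hypothetical row: a sorted map from composite column name to value.

    Column names are tuples of strings, e.g. ("inbox", "msg1", "subject").
    Tuple ordering sorts all names sharing a prefix next to each other,
    so any depth of hierarchy fits in one flat sorted map.
    """

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Only add the name to the sorted index the first time it is seen.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix):
        """Return (name, value) pairs whose name starts with prefix, in order."""
        # A prefix sorts before every longer name that extends it.
        start = bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # left the contiguous run of matching names
            result.append((name, self._values[name]))
        return result


row = Row()
row.insert(("inbox", "msg2", "subject"), "lunch?")
row.insert(("inbox", "msg1", "subject"), "hello")
row.insert(("inbox", "msg1", "from"), "david")
row.insert(("sent", "msg9", "subject"), "re: hello")

# Depth-two read, comparable to fetching one supercolumn's subcolumns.
for name, value in row.slice_prefix(("inbox", "msg1")):
    print(name, value)
```

Slicing on `("inbox",)` instead returns everything under that branch, and
nothing in the sketch limits the depth to two levels.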
