cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: Do supercolumns have a purpose?
Date Thu, 10 Feb 2011 07:32:20 GMT
Mike, my problem is that I have a database and codebase that already use
supercolumns. If I had to do it over, it wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mike@simplegeo.com> wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <david@lookin2.com> wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, but it will never be
> eliminated. Luckily, because this is an open source, liberally licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike
>
> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <shaun@cuttshome.net> wrote:
>>
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>>
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>>
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that I think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew I
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>
>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>
>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>>>
>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com> wrote:
>>>>>>
>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>> families.
>>>>>>>
>>>>>>
>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>> probably a much better solution anyway.
>>>>>>
>>>>>
>>>>> I realize that this is largely subjective, and on such matters code
>>>>> speaks louder than words, but I don't think I agree with you on the issue
>>>>> of which alternative is less work, or even which is a better solution.
>>>>>
>>>>
>>>> You are right, I probably put too much emphasis in that sentence. My main
>>>> point was to say that I think it is better to create tickets for what you
>>>> want, rather than for something else completely different that would, as a
>>>> by-product, give you what you want.
>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>> super columns, then there is a good chance this would be less work than
>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>> columns may not make too much sense without #598, which itself would require
>>>> quite some work, so clearly I spoke a bit quickly.
>>>>
>>>>
>>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>> hierarchy?
>>>>>
>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>> has a similar architecture and goes even further [2].
>>>>>
>>>>> It seems to me that super columns are a historical artifact from
>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>>> over the place when algorithms need to be special cased and branch based on
>>>>> the column/supercolumn distinction.
>>>>>
>>>>> I won't even mention what it does to the thrift interface.
>>>>>
>>>>
>>>> Actually, I agree with you, more than you know. If I were to start
>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>> not go for a depth unlimited hierarchical model either). But it's there and
>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>> option (it would be a big compatibility breakage). And (even though I
>>>> certainly thought about this more than once :)) I'm slightly
>>>> less enthusiastic about keeping them in thrift but encoding them in regular
>>>> column families internally: it would still be a lot of work but we would still
>>>> probably end up with nasty tricks to stick to the thrift api.
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>
>>>>> Mike
>>>>>
>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
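
The idea discussed in this thread, overlaying a two-level hierarchy on the
map-of-sorted-maps model, can be illustrated with a small in-memory sketch.
This is not Cassandra code and not Ed Anuff's comparator. It assumes a
supercolumn row can be modelled as a dict of dicts and a regular row as a dict
keyed by (supercolumn, column) tuples; the function names flatten_super_row,
unflatten_row and slice_supercolumn are hypothetical. Python's component-wise
tuple ordering stands in for a composite column comparator.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-ins for one row of each kind of column family.
SuperRow = Dict[str, Dict[str, bytes]]   # supercolumn -> column -> value
FlatRow = Dict[Tuple[str, str], bytes]   # (supercolumn, column) -> value


def flatten_super_row(super_row: SuperRow) -> FlatRow:
    """Encode each (supercolumn, column) pair as one composite column name."""
    return {
        (sc_name, col_name): value
        for sc_name, columns in super_row.items()
        for col_name, value in columns.items()
    }


def unflatten_row(flat_row: FlatRow) -> SuperRow:
    """Rebuild the two-level view, as a client library layer might."""
    super_row: SuperRow = {}
    # Tuples sort component by component, so columns stay grouped by supercolumn.
    for (sc_name, col_name), value in sorted(flat_row.items()):
        super_row.setdefault(sc_name, {})[col_name] = value
    return super_row


def slice_supercolumn(flat_row: FlatRow, sc_name: str) -> Dict[str, bytes]:
    """Return the columns of one supercolumn by matching the first component."""
    return {
        col_name: value
        for (sc, col_name), value in sorted(flat_row.items())
        if sc == sc_name
    }


if __name__ == "__main__":
    row = {
        "address": {"city": b"Tel Aviv", "zip": b"12345"},
        "name": {"first": b"David"},
    }
    flat = flatten_super_row(row)
    print(sorted(flat))
    print(unflatten_row(flat) == row)
    print(slice_supercolumn(flat, "address"))
```

One limitation of this encoding: a supercolumn with no subcolumns leaves no
trace in the flat form, so a real migration would need to decide how to handle
that case, along with deletes of whole supercolumns.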
