cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: Do supercolumns have a purpose?
Date Sun, 13 Feb 2011 08:09:16 GMT
I agree, that is the way to go. Then each piece of new functionality will
not have to be implemented twice.
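
To make the "compound column names" idea quoted below concrete, here is a
minimal sketch in plain Python. It is not Cassandra code and uses no Cassandra
API: the function names (flatten, slice_super, unflatten) and the sample data
are hypothetical. It assumes one super column row can be modeled as a dict of
dicts, and shows that storing each subcolumn under a (super, sub) name keeps
every super column's contents contiguous in sort order, so a range scan can
recover them.

```python
from bisect import bisect_left


def flatten(super_row):
    """Map {super: {sub: value}} to a list of ((super, sub), value) sorted by name."""
    return sorted(
        ((sup, sub), value)
        for sup, subs in super_row.items()
        for sub, value in subs.items()
    )


def slice_super(flat_row, sup):
    """Return {sub: value} for one super column using a contiguous range scan."""
    names = [name for name, _ in flat_row]
    # (sup,) sorts before every (sup, sub), so this finds the start of the range.
    start = bisect_left(names, (sup,))
    result = {}
    for (s, sub), value in flat_row[start:]:
        if s != sup:
            break  # left the range of names that share this first component
        result[sub] = value
    return result


def unflatten(flat_row):
    """Rebuild the nested {super: {sub: value}} view from compound names."""
    nested = {}
    for (sup, sub), value in flat_row:
        nested.setdefault(sup, {})[sub] = value
    return nested


# Hypothetical sample row with two super columns.
row = {"post2": {"title": "b", "body": "y"}, "post1": {"title": "a"}}
flat = flatten(row)
```

With this layout, slice_super(flat, "post2") reads one super column and
unflatten(flat) restores the original nested view, which is roughly what a
client library shim would have to do.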

On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood <stuhood@gmail.com> wrote:

> I would like to continue to support super columns, but to slowly convert
> them into "compound column names", since that is all they really are.
>
>
> On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio <frank@isidorey.com> wrote:
>
>> I've found super column families quite useful when using
>> RandomPartitioner on a low-maintenance cluster (as opposed to
>> Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
>> try doing that with one regular column family and secondary indexes (you
>> could obviously sort on the client side, but that is tedious and not logical
>> for older data).
>>
>> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn <david@lookin2.com> wrote:
>>
>>> Mike, my problem is that I have a database and codebase that already
>>> uses supercolumns. If I had to do it over, it wouldn't use them, for the
>>> reasons you point out. In fact, I have a feeling that over time supercolumns
>>> will become deprecated de facto, if not de jure. That's why I would like to
>>> see them represented internally as regular columns, with an upgrade path for
>>> backward compatibility.
>>>
>>> I would love to do it myself! (I haven't looked at the code base, but I
>>> don't understand why it should be so hard.) But my employer has other
>>> ideas...
>>>
>>>
>>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mike@simplegeo.com> wrote:
>>>
>>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <david@lookin2.com> wrote:
>>>>
>>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>>> path.
>>>>>
>>>>
>>>> David,
>>>>
>>>> Cassandra is open source and community developed. The right thing to do
>>>> is what's best for the community, which sometimes conflicts with what's best
>>>> for individual users. Such strife should be minimized, but it will never be
>>>> eliminated. Luckily, because this is an open source, liberally licensed
>>>> project, if you feel strongly about something you should feel free to add
>>>> whatever features you want yourself. I'm sure other people in your situation
>>>> will thank you for it.
>>>>
>>>> At a minimum I think it would behoove you to re-read some of the
>>>> comments here re: why super columns aren't really needed and take another
>>>> look at your data model and code. I would actually be quite surprised to
>>>> find a use of super columns that could not be trivially converted to normal
>>>> columns. In fact, it should be possible to do at the framework/client
>>>> library layer - you probably wouldn't even need to change any application
>>>> code.
>>>>
>>>> Mike
>>>>
>>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <shaun@cuttshome.net> wrote:
>>>>>
>>>>>>
>>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I
>>>>>> think you should deprecate SuperColumns. They are already distracting
>>>>>> you, and as the years go by the cost of supporting them as you add more
>>>>>> and more functionality is only likely to get worse. It would be better
>>>>>> to concentrate on making the "core" column families better (and I'm
>>>>>> sure we can all think of lots of things we'd like).
>>>>>>
>>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>>> users like David who are currently using them. But if you mark them
>>>>>> clearly as deprecated and explain why and what to do instead (perhaps
>>>>>> putting a bit of effort into migration tools... or even a "virtual"
>>>>>> layer supporting arbitrary hierarchical data), then you can drop them
>>>>>> in a few years (when you get to 1.0, say), without people feeling
>>>>>> betrayed.
>>>>>>
>>>>>> -- Shaun
>>>>>>
>>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>>>
>>>>>> "My main point was to say that I think it is better to create
>>>>>> tickets for what you want, rather than for something else completely
>>>>>> different that would, as a by-product, give you what you want."
>>>>>>
>>>>>> Then let me say what I want: I want supercolumn families to have any
>>>>>> feature that regular column families have.
>>>>>>
>>>>>> My data model is full of supercolumns. I used them, even though I knew
>>>>>> I didn't *have to*, "because they were there", which implied to me that
>>>>>> I was supposed to use them for some good reason. Now I suspect that they
>>>>>> will gradually become less and less functional, as features are added to
>>>>>> regular column families and not supported for supercolumn families.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <
>>>>>> sylvain@datastax.com> wrote:
>>>>>>
>>>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <
>>>>>>>> sylvain@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>>>> families.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>>>>> probably a much better solution anyway.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>>>> speaks louder than words, but I don't think I agree with you on the
>>>>>>>> issue of which alternative is less work, or even which is a better
>>>>>>>> solution.
>>>>>>>>
>>>>>>>
>>>>>>> You are right, I probably put too much emphasis on that sentence. My
>>>>>>> main point was to say that I think it is better to create tickets for
>>>>>>> what you want, rather than for something else completely different that
>>>>>>> would, as a by-product, give you what you want.
>>>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>>>> super columns, then there is a good chance this would be less work than
>>>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>>>> columns may not make too much sense without #598, which itself would
>>>>>>> require quite some work, so clearly I spoke a bit quickly.
>>>>>>>
>>>>>>>
>>>>>>>> If the goal is to have a hierarchical model, limiting the depth to
>>>>>>>> two seems arbitrary. Why not go all the way and allow an arbitrarily
>>>>>>>> deep hierarchy?
>>>>>>>>
>>>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top
>>>>>>>> of the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>>>>> implemented a custom comparator that does the job [1]. Google's
>>>>>>>> Megastore has a similar architecture and goes even further [2].
>>>>>>>>
>>>>>>>> It seems to me that super columns are a historical artifact from
>>>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>>>> posting lists of messages, sharded by user. So that's what they built.
>>>>>>>> In my dealings with the Cassandra code, super columns end up making a
>>>>>>>> mess all over the place when algorithms need to be special cased and
>>>>>>>> branch based on the column/supercolumn distinction.
>>>>>>>>
>>>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>>>
>>>>>>>
>>>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>>>> coding Cassandra now, I wouldn't include super columns (and I would
>>>>>>> probably not go for an unlimited-depth hierarchical model either). But
>>>>>>> they're there and I'm not sure getting rid of them fully (meaning,
>>>>>>> including in thrift) is an option (it would be a big compatibility
>>>>>>> breakage). And (even though I certainly thought about this more than
>>>>>>> once :)) I'm slightly less enthusiastic about keeping them in thrift but
>>>>>>> encoding them in regular column families internally: it would still be a
>>>>>>> lot of work but we would still probably end up with nasty tricks to
>>>>>>> stick to the thrift api.
>>>>>>>
>>>>>>> --
>>>>>>> Sylvain
>>>>>>>
>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Frank LoVecchio
>> Senior Software Engineer | Isidorey, LLC
>> Google Voice +1.720.295.9179
>> isidorey.com | facebook.com/franklovecchio | franklovecchio.com |
>> rodsandricers.com
>>
>>
>
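
Mike Malone's remark in the thread above, that a hierarchy can be overlaid on a
map of sorted maps, can be illustrated with a small sketch. This is plain
Python, not the custom comparator from [1] and not any Cassandra API. The
PathRow class and its put/subtree methods are hypothetical names, and the
sketch assumes column names are tuples of strings compared lexicographically
component by component. Under that ordering, every name sharing a path prefix
sits in one contiguous range, whatever the depth.

```python
from bisect import bisect_left, insort


class PathRow:
    """One row whose column names are path tuples kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted list of path tuples
        self._values = {}  # path tuple -> value

    def put(self, path, value):
        """Insert or overwrite the column stored at the given path."""
        path = tuple(path)
        if path not in self._values:
            insort(self._names, path)
        self._values[path] = value

    def subtree(self, prefix):
        """Return [(path, value)] for every column at or below the prefix."""
        prefix = tuple(prefix)
        # The prefix itself sorts before all of its descendants.
        start = bisect_left(self._names, prefix)
        found = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # past the contiguous range for this prefix
            found.append((name, self._values[name]))
        return found


# Hypothetical sample data with paths of different depths.
row = PathRow()
row.put(("users", "alice", "email"), "a@example.com")
row.put(("users", "alice", "inbox", "msg1"), "hi")
row.put(("users", "bob", "email"), "b@example.com")
row.put(("groups", "admins"), "alice")
```

Here row.subtree(("users", "alice")) returns both of alice's columns although
they sit at different depths, which is the sense in which a fixed depth of two
is not needed for a hierarchical view.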
