Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 40816 invoked from network); 10 Feb 2011 07:32:53 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 10 Feb 2011 07:32:53 -0000 Received: (qmail 26499 invoked by uid 500); 10 Feb 2011 07:32:51 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 26306 invoked by uid 500); 10 Feb 2011 07:32:49 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 26298 invoked by uid 99); 10 Feb 2011 07:32:48 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Feb 2011 07:32:48 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [209.85.210.172] (HELO mail-iy0-f172.google.com) (209.85.210.172) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Feb 2011 07:32:42 +0000 Received: by iym1 with SMTP id 1so1080111iym.31 for ; Wed, 09 Feb 2011 23:32:20 -0800 (PST) MIME-Version: 1.0 Received: by 10.231.205.205 with SMTP id fr13mr21637669ibb.120.1297323140369; Wed, 09 Feb 2011 23:32:20 -0800 (PST) Received: by 10.231.15.72 with HTTP; Wed, 9 Feb 2011 23:32:20 -0800 (PST) X-Originating-IP: [80.179.102.198] In-Reply-To: References: <6E82BB36-A9AC-4AC3-911D-14D6F39A0A6E@cuttshome.net> Date: Thu, 10 Feb 2011 09:32:20 +0200 Message-ID: Subject: Re: Do supercolumns have a purpose? 
From: David Boxenhorn To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=90e6ba4fc4b6f0cfc0049be8944b X-Virus-Checked: Checked by ClamAV on apache.org --90e6ba4fc4b6f0cfc0049be8944b Content-Type: text/plain; charset=ISO-8859-1 Mike, my problem is that I have a database and codebase that already use supercolumns. If I had to do it over, it wouldn't use them, for the reasons you point out. In fact, I have a feeling that over time supercolumns will become deprecated de facto, if not de jure. That's why I would like to see them represented internally as regular columns, with an upgrade path for backward compatibility. I would love to do it myself! (I haven't looked at the code base, but I don't understand why it should be so hard.) But my employer has other ideas... On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone wrote: > On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote: > >> Shaun, I agree with you, but marking them as deprecated is not good enough >> for me. I can't easily stop using supercolumns. I need an upgrade path. >> > > David, > > Cassandra is open source and community developed. The right thing to do is > what's best for the community, which sometimes conflicts with what's best > for individual users. Such strife should be minimized, it will never be > eliminated. Luckily, because this is an open source, liberal licensed > project, if you feel strongly about something you should feel free to add > whatever features you want yourself. I'm sure other people in your situation > will thank you for it. > > At a minimum I think it would behoove you to re-read some of the comments > here re: why super columns aren't really needed and take another look at > your data model and code. I would actually be quite surprised to find a use > of super columns that could not be trivially converted to normal columns.
In > fact, it should be possible to do at the framework/client library layer - > you probably wouldn't even need to change any application code. > > Mike > > On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote: >> >>> >>> I'm a newbie here, but, with apologies for my presumptuousness, I think >>> you should deprecate SuperColumns. They are already distracting you, and as >>> the years go by the cost of supporting them as you add more and more >>> functionality is only likely to get worse. It would be better to concentrate >>> on making the "core" column families better (and I'm sure we can all think >>> of lots of things we'd like). >>> >>> Just dropping SuperColumns would be bad for your reputation -- and for >>> users like David who are currently using them. But if you mark them clearly >>> as deprecated and explain why and what to do instead (perhaps putting a bit >>> of effort into migration tools... or even a "virtual" layer supporting >>> arbitrary hierarchical data), then you can drop them in a few years (when >>> you get to 1.0, say), without people feeling betrayed. >>> >>> -- Shaun >>> >>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote: >>> >>> "My main point was to say that it's think it is better to create tickets >>> for what you want, rather than for something else completely different that >>> would, as a by-product, give you what you want." >>> >>> Then let me say what I want: I want supercolumn families to have any >>> feature that regular column families have. >>> >>> My data model is full of supercolumns. I used them, even though I knew it >>> didn't *have to*, "because they were there", which implied to me that I was >>> supposed to use them for some good reason. Now I suspect that they will >>> gradually become less and less functional, as features are added to regular >>> column families and not supported for supercolumn families. 
>>> >>> >>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne wrote: >>> >>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote: >>>> >>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne >>>> > wrote: >>>>> >>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: >>>>>> >>>>>>> The advantage would be to enable secondary indexes on supercolumn >>>>>>> families. >>>>>>> >>>>>> >>>>>> Then I suggest opening a ticket for adding secondary indexes to >>>>>> supercolumn families and voting on it. This will be 1 or 2 order of >>>>>> magnitude less work than getting rid of super column internally, and >>>>>> probably a much better solution anyway. >>>>>> >>>>> >>>>> I realize that this is largely subjective, and on such matters code >>>>> speaks louder than words, but I don't think I agree with you on the issue of >>>>> which alternative is less work, or even which is a better solution. >>>>> >>>> >>>> You are right, I put probably too much emphase in that sentence. My main >>>> point was to say that it's think it is better to create tickets for what you >>>> want, rather than for something else completely different that would, as a >>>> by-product, give you what you want. >>>> Then I suspect that *if* the only goal is to get secondary indexes on >>>> super columns, then there is a good chance this would be less work than >>>> getting rid of super columns. But to be fair, secondary indexes on super >>>> columns may not make too much sense without #598, which itself would require >>>> quite some work, so clearly I spoke a bit quickly. >>>> >>>> >>>>> If the goal is to have a hierarchical model, limiting the depth to two >>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep >>>>> hierarchy? >>>>> >>>>> If a more sophisticated hierarchical model is deemed unnecessary, or >>>>> impractical, allowing a depth of two seems inconsistent and >>>>> unnecessary. 
It's pretty trivial to overlay a hierarchical model on top of >>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has >>>>> implemented a custom comparator that does the job [1]. Google's Megastore >>>>> has a similar architecture and goes even further [2]. >>>>> >>>>> It seems to me that super columns are a historical artifact from >>>>> Cassandra's early life as Facebook's inbox storage system. They needed >>>>> posting lists of messages, sharded by user. So that's what they built. In my >>>>> dealings with the Cassandra code, super columns end up making a mess all >>>>> over the place when algorithms need to be special cased and branch based on >>>>> the column/supercolumn distinction. >>>>> >>>>> I won't even mention what it does to the thrift interface. >>>>> >>>> >>>> Actually, I agree with you, more than you know. If I were to start >>>> coding Cassandra now, I wouldn't include super columns (and I would probably >>>> not go for a depth unlimited hierarchical model either). But it's there and >>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an >>>> option (it would be a big compatibility breakage). And (even though I >>>> certainly though about this more than once :)) I'm slightly >>>> less enthusiastic about keeping them in thrift but encoding them in regular >>>> column family internally: it would still be a lot of work but we would still >>>> probably end up with nasty tricks to stick to the thrift api. >>>> >>>> -- >>>> Sylvain >>>> >>>> >>>>> Mike >>>>> >>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html >>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf >>>>> >>>> >>>> >>> >>> >> > --90e6ba4fc4b6f0cfc0049be8944b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
--90e6ba4fc4b6f0cfc0049be8944b--