From: Stu Hood
To: user@cassandra.apache.org
Date: Fri, 11 Feb 2011 23:41:23 -0800
Subject: Re: Do supercolumns have a purpose?

I would like to continue to support super columns, but to slowly convert them
into "compound column names", since that is really all they are.

On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio wrote:

> I've found super column families quite useful when using
> RandomPartitioner on a low-maintenance cluster (as opposed to
> ByteOrderedPartitioner), e.g. returning ordered data from a TimeUUID
> comparator type; try doing that with one regular column family and secondary
> indexes (you could obviously sort on the client side, but that is tedious
> and not logical for older data).
>
> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn wrote:
>
>> Mike, my problem is that I have a database and codebase that already use
>> supercolumns. If I had to do it over, I wouldn't use them, for the reasons
>> you point out. In fact, I have a feeling that over time supercolumns will
>> become deprecated de facto, if not de jure. That's why I would like to see
>> them represented internally as regular columns, with an upgrade path for
>> backward compatibility.
>>
>> I would love to do it myself! (I haven't looked at the code base, but I
>> don't understand why it should be so hard.)
>> But my employer has other ideas...
>>
>>
>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone wrote:
>>
>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote:
>>>
>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>> path.
>>>>
>>>
>>> David,
>>>
>>> Cassandra is open source and community developed. The right thing to do
>>> is what's best for the community, which sometimes conflicts with what's best
>>> for individual users. Such strife should be minimized, but it will never be
>>> eliminated. Luckily, because this is an open source, liberally licensed
>>> project, if you feel strongly about something you should feel free to add
>>> whatever features you want yourself. I'm sure other people in your situation
>>> will thank you for it.
>>>
>>> At a minimum I think it would behoove you to re-read some of the comments
>>> here re: why super columns aren't really needed and take another look at
>>> your data model and code. I would actually be quite surprised to find a use
>>> of super columns that could not be trivially converted to normal columns. In
>>> fact, it should be possible to do at the framework/client library layer -
>>> you probably wouldn't even need to change any application code.
>>>
>>> Mike
>>>
>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote:
>>>>
>>>>>
>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>>>> you should deprecate SuperColumns. They are already distracting you, and as
>>>>> the years go by the cost of supporting them as you add more and more
>>>>> functionality is only likely to get worse. It would be better to concentrate
>>>>> on making the "core" column families better (and I'm sure we can all think
>>>>> of lots of things we'd like).
>>>>>
>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>> users like David who are currently using them.
>>>>> But if you mark them clearly as deprecated and explain why and what to do
>>>>> instead (perhaps putting a bit of effort into migration tools... or even a
>>>>> "virtual" layer supporting arbitrary hierarchical data), then you can drop
>>>>> them in a few years (when you get to 1.0, say), without people feeling
>>>>> betrayed.
>>>>>
>>>>> -- Shaun
>>>>>
>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>>
>>>>> "My main point was to say that I think it is better to create tickets
>>>>> for what you want, rather than for something else completely different
>>>>> that would, as a by-product, give you what you want."
>>>>>
>>>>> Then let me say what I want: I want supercolumn families to have any
>>>>> feature that regular column families have.
>>>>>
>>>>> My data model is full of supercolumns. I used them, even though I knew
>>>>> I didn't *have to*, "because they were there", which implied to me that I
>>>>> was supposed to use them for some good reason. Now I suspect that they will
>>>>> gradually become less and less functional, as features are added to regular
>>>>> column families and not supported for supercolumn families.
>>>>>
>>>>>
>>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>>>
>>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote:
>>>>>>
>>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>>>>>>>
>>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>>> families.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>>>> probably a much better solution anyway.
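To make the "encode super columns as regular columns internally" option concrete (which is also what the "compound column names" proposed at the top of this reply amount to), here is a minimal sketch. It is not Cassandra code: plain Python dicts and a sorted list stand in for rows, tuple ordering stands in for the column comparator, and the names flatten_super_row and get_super_column are hypothetical. A sub column col inside super column sc becomes one regular column named (sc, col), and reading a super column becomes a slice over the (sc,) prefix.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical stand-ins, not Cassandra types:
# a super column family row is {super_column: {sub_column: value}},
# a regular row is a list of (column_name, value) pairs sorted by name.
SuperRow = Dict[str, Dict[str, str]]
FlatRow = List[Tuple[Tuple[str, str], str]]


def flatten_super_row(row: SuperRow) -> FlatRow:
    """Turn every (super column, sub column) pair into one compound column name."""
    flat = [((super_column, sub_column), value)
            for super_column, sub_columns in row.items()
            for sub_column, value in sub_columns.items()]
    # Tuples compare component by component, so sorting keeps all sub columns
    # of one super column contiguous and ordered by sub column name.
    flat.sort(key=lambda column: column[0])
    return flat


def get_super_column(flat: FlatRow, super_column: str) -> Dict[str, str]:
    """Emulate reading one super column with a slice over its compound-name prefix."""
    names = [name for name, _value in flat]
    # (super_column,) sorts before every (super_column, x), so this finds the
    # first column belonging to the requested super column, if any.
    start = bisect_left(names, (super_column,))
    result: Dict[str, str] = {}
    for (current, sub_column), value in flat[start:]:
        if current != super_column:
            break  # left this super column's contiguous range
        result[sub_column] = value
    return result


row = {"inbox": {"m2": "hello", "m1": "hi"}, "archive": {"m0": "old"}}
flat = flatten_super_row(row)
print(flat)                             # columns sorted by (super column, sub column)
print(get_super_column(flat, "inbox"))  # the "inbox" super column, rebuilt from a slice
```

Because the sub columns of one super column stay contiguous, a single prefix slice is enough to rebuild it. A real client library or server-side encoding would additionally need a byte-level format for the compound name that its comparator orders the same way the tuples are ordered here.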
>>>>>>>>
>>>>>>>
>>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>>>>> which alternative is less work, or even which is a better solution.
>>>>>>>
>>>>>>
>>>>>> You are right, I probably put too much emphasis on that sentence. My
>>>>>> main point was to say that I think it is better to create tickets for
>>>>>> what you want, rather than for something else completely different that
>>>>>> would, as a by-product, give you what you want.
>>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>>> super columns, then there is a good chance this would be less work than
>>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>>> columns may not make too much sense without #598, which itself would require
>>>>>> quite some work, so clearly I spoke a bit quickly.
>>>>>>
>>>>>>
>>>>>>> If the goal is to have a hierarchical model, limiting the depth to
>>>>>>> two seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>>>> hierarchy?
>>>>>>>
>>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>>>> has a similar architecture and goes even further [2].
>>>>>>>
>>>>>>> It seems to me that super columns are a historical artifact from
>>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>>> posting lists of messages, sharded by user. So that's what they built.
>>>>>>> In my dealings with the Cassandra code, super columns end up making a mess
>>>>>>> all over the place when algorithms need to be special cased and branch
>>>>>>> based on the column/supercolumn distinction.
>>>>>>>
>>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>>
>>>>>>
>>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>>>> not go for an unlimited-depth hierarchical model either). But it's there and
>>>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>>>> option (it would be a big compatibility breakage). And (even though I
>>>>>> certainly thought about this more than once :)) I'm slightly
>>>>>> less enthusiastic about keeping them in thrift but encoding them as regular
>>>>>> columns internally: it would still be a lot of work, and we would still
>>>>>> probably end up with nasty tricks to stick to the thrift api.
>>>>>>
>>>>>> --
>>>>>> Sylvain
>>>>>>
>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Frank LoVecchio
> Senior Software Engineer | Isidorey, LLC
> Google Voice +1.720.295.9179
> isidorey.com | facebook.com/franklovecchio | franklovecchio.com |
> rodsandricers.com
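Frank's use case (time-ordered data on a RandomPartitioner cluster) does not depend on super columns as such. With RandomPartitioner, rows are placed by an MD5 hash of the key, so ordering has to come from the columns inside a row, which are kept sorted by the column family's comparator. The sketch below is plain Python, not Cassandra's API: it files each value under a compound column name (bucket, time_uuid) and sorts roughly the way a comparator ordering first by bucket and then by the UUID's embedded timestamp would. The bucket strings, the helpers make_time_uuid, insert_column and slice_bucket, and the fixed node id are all illustrative assumptions.

```python
import uuid
from typing import List, Tuple

# Hypothetical column layout: ((bucket, time_uuid), value)
Column = Tuple[Tuple[str, uuid.UUID], str]


def make_time_uuid(timestamp: int, clock_seq: int = 0, node: int = 0x123456789ABC) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID from an explicit 60-bit timestamp."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi_and_version, version 1
        0x80 | ((clock_seq >> 8) & 0x3F),       # clock_seq_hi with the RFC 4122 variant bits
        clock_seq & 0xFF,                       # clock_seq_low
        node,                                   # 48-bit node id
    ))


def comparator_key(column: Column) -> Tuple[str, int, bytes]:
    """Order by bucket, then by the UUID's embedded timestamp, then by raw bytes."""
    (bucket, time_uuid), _value = column
    return (bucket, time_uuid.time, time_uuid.bytes)


def insert_column(row: List[Column], bucket: str, time_uuid: uuid.UUID, value: str) -> None:
    """Add a column and keep the row sorted, standing in for the server's comparator."""
    row.append(((bucket, time_uuid), value))
    row.sort(key=comparator_key)


def slice_bucket(row: List[Column], bucket: str) -> List[str]:
    """Return one bucket's values; the row is sorted, so they come out in time order."""
    return [value for (current, _time_uuid), value in row if current == bucket]


row: List[Column] = []
insert_column(row, "2011-02-10", make_time_uuid(300), "c")
insert_column(row, "2011-02-11", make_time_uuid(100), "d")
insert_column(row, "2011-02-10", make_time_uuid(200), "b")
insert_column(row, "2011-02-10", make_time_uuid(100), "a")
print(slice_bucket(row, "2011-02-10"))  # one bucket, oldest first
```

Here the bucket plays the role the super column name played, and the time-based UUID plays the role of the sub column name, so the same time-ordered read is a slice over one bucket of a single regular row.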