Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <AANLkTinZL6A7hxueGoXNWepWPLkELNiu0gYsT_7NK822@mail.gmail.com>
References: <AANLkTins4=jbpwFZn-kfaQXOU8fHeseg6XJ2wdB-8r3W@mail.gmail.com>
	<AANLkTinvah0VEeCYQFoXZXFcbLkJcwxo+32Z8mZBAsRM@mail.gmail.com>
	<AANLkTi=rFmgrMKdJf1jKTOkEqJpCSPyiOXWhg7quGtp_@mail.gmail.com>
	<AANLkTi=OagFC1M104Fq8Od09-+WMAndPm1m=nvhNPBOq@mail.gmail.com>
	<AANLkTi=dfYnSp9cpqoXXvS1bd9zFufQ3FL5suHF6KJnO@mail.gmail.com>
	<AANLkTinBxPTHRvzt4XaEXsLNg2CHMSuuD0fK7=RxdgLd@mail.gmail.com>
	<AANLkTi=17tZBKArQ9KidK=GYse1BJsqVQ2Df2eoYY=Vg@mail.gmail.com>
	<AANLkTinx+rBwhTd_CEzLngcBrrq803U-bPQtterwTYas@mail.gmail.com>
	<AANLkTimHqXyKs-PLLeYmsykXQfZqrx8QMXpCMo6k27cZ@mail.gmail.com>
	<6E82BB36-A9AC-4AC3-911D-14D6F39A0A6E@cuttshome.net>
	<AANLkTi=sVrM_xKJ-CTfpBxWAfrRTc70Ebh8if=JPUWdP@mail.gmail.com>
	<AANLkTinZL6A7hxueGoXNWepWPLkELNiu0gYsT_7NK822@mail.gmail.com>
Date: Thu, 10 Feb 2011 09:32:20 +0200
Message-ID: <AANLkTikA-Zx73CDtG_nafiNR3BCeLbKg95w5sWuFh1Eq@mail.gmail.com>
Subject: Re: Do supercolumns have a purpose?
From: David Boxenhorn <david@lookin2.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=90e6ba4fc4b6f0cfc0049be8944b

--90e6ba4fc4b6f0cfc0049be8944b
Content-Type: text/plain; charset=ISO-8859-1

Mike, my problem is that I have a database and codebase that already use
supercolumns. If I had to do it over, I wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...
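
To make "represented internally as regular columns" concrete, here is a
minimal sketch of the idea, assuming a row can be modeled as a map sorted by
column name. The function names (flatten, unflatten, slice_super) and the
tuple-based composite names are hypothetical illustrations, not Cassandra's
actual storage format or API.

```python
import bisect


def flatten(super_row):
    """Turn {supercolumn: {subcolumn: value}} into {(supercolumn, subcolumn): value}."""
    return {(s, c): v for s, subs in super_row.items() for c, v in subs.items()}


def unflatten(flat_row):
    """Rebuild the two-level view from composite column names."""
    out = {}
    for (s, c), v in sorted(flat_row.items()):
        out.setdefault(s, {})[c] = v
    return out


def slice_super(flat_row, s):
    """Read one 'supercolumn' as a range scan over the sorted composite names."""
    names = sorted(flat_row)
    # (s,) sorts before every (s, subcolumn), so this finds the start of the group.
    start = bisect.bisect_left(names, (s,))
    result = {}
    for name in names[start:]:
        if name[0] != s:
            break  # left the contiguous group for this supercolumn
        result[name[1]] = flat_row[name]
    return result


row = {"address": {"city": "X", "zip": "Y"}, "name": {"first": "Z"}}
flat = flatten(row)
print(slice_super(flat, "address"))
```

Because the composite names are compared component by component, all the
subcolumns of one supercolumn stay adjacent, so the old two-level reads map
onto ordinary slices. One limitation of this sketch: a supercolumn with no
subcolumns has no representation in the flat form.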


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <mike@simplegeo.com> wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <david@lookin2.com> wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, but it will never be
> eliminated. Luckily, because this is an open source, liberally licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike
>
> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <shaun@cuttshome.net> wrote:
>>
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>>
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>>
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that I think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew I
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>
>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mike@simplegeo.com> wrote:
>>>>
>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>>>>
>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com> wrote:
>>>>>>
>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>> families.
>>>>>>>
>>>>>>
>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>> probably a much better solution anyway.
>>>>>>
>>>>>
>>>>> I realize that this is largely subjective, and on such matters code
>>>>> speaks louder than words, but I don't think I agree with you on the issue of
>>>>> which alternative is less work, or even which is a better solution.
>>>>>
>>>>
>>>> You are right, I probably put too much emphasis on that sentence. My main
>>>> point was to say that I think it is better to create tickets for what you
>>>> want, rather than for something else completely different that would, as a
>>>> by-product, give you what you want.
>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>> super columns, then there is a good chance this would be less work than
>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>> columns may not make too much sense without #598, which itself would require
>>>> quite some work, so clearly I spoke a bit quickly.
>>>>
>>>>
>>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>> hierarchy?
>>>>>
>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>> has a similar architecture and goes even further [2].
>>>>>
>>>>> It seems to me that super columns are a historical artifact from
>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>> posting lists of messages, sharded by user. So that's what they built. In my
>>>>> dealings with the Cassandra code, super columns end up making a mess all
>>>>> over the place when algorithms need to be special cased and branch based on
>>>>> the column/supercolumn distinction.
>>>>>
>>>>> I won't even mention what it does to the thrift interface.
>>>>>
>>>>
>>>> Actually, I agree with you, more than you know. If I were to start
>>>> coding Cassandra now, I wouldn't include super columns (and I would probably
>>>> not go for an unlimited-depth hierarchical model either). But it's there and
>>>> I'm not sure getting rid of them fully (meaning, including in thrift) is an
>>>> option (it would be a big compatibility breakage). And (even though I
>>>> certainly thought about this more than once :)) I'm slightly
>>>> less enthusiastic about keeping them in thrift but encoding them as regular
>>>> column families internally: it would still be a lot of work, and we would
>>>> probably still end up with nasty tricks to stick to the thrift api.
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>
>>>>> Mike
>>>>>
>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>
>>>>
>>>>
>>>
>>>
>>
>

--90e6ba4fc4b6f0cfc0049be8944b--