cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Do supercolumns have a purpose?
Date Fri, 04 Feb 2011 08:58:56 GMT
On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <mike@simplegeo.com> wrote:

> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <david@lookin2.com> wrote:
>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>>
>>
>> Then I suggest opening a ticket for adding secondary indexes to
>> supercolumn families and voting on it. This will be 1 or 2 orders of
>> magnitude less work than getting rid of super columns internally, and
>> probably a much better solution anyway.
>>
>
> I realize that this is largely subjective, and on such matters code speaks
> louder than words, but I don't think I agree with you on the issue of which
> alternative is less work, or even which is a better solution.
>

You are right, I probably put too much emphasis in that sentence. My main
point was that I think it is better to create tickets for what you want,
rather than for something completely different that would, as a
by-product, give you what you want.
I still suspect that *if* the only goal is to get secondary indexes on super
columns, there is a good chance this would be less work than getting rid of
super columns. But to be fair, secondary indexes on super columns may not
make much sense without #598, which itself would require quite some work,
so clearly I spoke a bit quickly.


> If the goal is to have a hierarchical model, limiting the depth to two
> seems arbitrary. Why not go all the way and allow an arbitrarily deep
> hierarchy?
>
> If a more sophisticated hierarchical model is deemed unnecessary, or
> impractical, allowing a depth of two seems inconsistent and
> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
> implemented a custom comparator that does the job [1]. Google's Megastore
> has a similar architecture and goes even further [2].
>
> It seems to me that super columns are a historical artifact from
> Cassandra's early life as Facebook's inbox storage system. They needed
> posting lists of messages, sharded by user. So that's what they built. In my
> dealings with the Cassandra code, super columns end up making a mess all
> over the place when algorithms need to be special cased and branch based on
> the column/supercolumn distinction.
>
> I won't even mention what it does to the thrift interface.
>

Actually, I agree with you, more than you know. If I were to start coding
Cassandra now, I wouldn't include super columns (and I would probably not go
for an unlimited-depth hierarchical model either). But they are there, and
I'm not sure getting rid of them fully (meaning, including in thrift) is an
option (it would be a big compatibility breakage). And (even though I
certainly thought about this more than once :)) I'm slightly less
enthusiastic about keeping them in thrift but encoding them in regular
column families internally: it would still be a lot of work, and we would
probably still end up with nasty tricks to stick to the thrift api.
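
To make "encoding them in regular column family internally" a bit more
concrete, here is a minimal sketch of the general idea, assuming a row is
just a sorted map and that a super column becomes a prefix of a composite
column name. The Row class and its methods are hypothetical names made up
for illustration; this is not Cassandra code and not a proposed design.

```python
import bisect


class Row:
    """Hypothetical row: a sorted map of (super_name, sub_name) -> value."""

    def __init__(self):
        self._names = []   # composite column names, kept sorted
        self._values = {}  # composite column name -> value

    def insert(self, super_name, sub_name, value):
        key = (super_name, sub_name)
        if key not in self._values:
            # Keep the composite names in comparator (tuple) order.
            bisect.insort(self._names, key)
        self._values[key] = value

    def get_super_column(self, super_name):
        """Emulate a super column read as a range scan over one prefix."""
        # (super_name,) sorts before every (super_name, sub_name) pair.
        start = bisect.bisect_left(self._names, (super_name,))
        result = []
        for key in self._names[start:]:
            if key[0] != super_name:
                break
            result.append((key[1], self._values[key]))
        return result


row = Row()
row.insert("inbox", "msg2", "hello")
row.insert("inbox", "msg1", "hi")
row.insert("sent", "msg9", "bye")
inbox = row.get_super_column("inbox")
```

Reading a whole "super column" turns into a slice over a contiguous range of
regular columns, which is why the flat model can emulate the two-level one.
The sketch says nothing about how the thrift structures would be mapped onto
it, which is where the nasty tricks would likely show up.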

--
Sylvain


> Mike
>
> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>
