incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Do supercolumns have a purpose?
Date Thu, 03 Feb 2011 13:32:57 GMT
On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <david@lookin2.com> wrote:

> Thanks Sylvain!
>
> Can I vote for internally implementing supercolumn families as regular
> column families? (With a smooth upgrade process that doesn't require
> shutting down a live cluster.)
>

I forgot to add that I don't know if this makes a lot of sense. That would be
a fairly major refactor (and so error-prone), you'd still have to deal with the
point I mentioned in my previous mail (for range deletes you would have to
change the on-disk format, for instance), and all this for no actual
benefit, and even some downsides (encoded supercolumns will take more space
on disk and in memory). Super columns are there and work fairly well, so
what would be the point?
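
To make the space remark concrete, here is a rough sketch that counts only
the bytes spent on column names under each scheme. It assumes the extra space
comes from repeating the super column name in every concatenated column name;
the "@" separator is taken from the example earlier in the thread, and the
function names and sample data are hypothetical. It ignores timestamps and
every other per-column overhead, so it is a toy model and not Cassandra's
actual on-disk format.

```python
SEP = "@"  # separator from the thread's example; any delimiter would do


def native_name_bytes(super_name, column_names):
    """Name bytes if the super column name is stored once for its subcolumns."""
    return len(super_name.encode()) + sum(len(c.encode()) for c in column_names)


def concatenated_name_bytes(super_name, column_names):
    """Name bytes if every column name carries the super column name as a prefix."""
    return sum(len(f"{super_name}{SEP}{c}".encode()) for c in column_names)


# Hypothetical sample data: one super column with three subcolumns.
columns = ["name", "email", "city"]
native = native_name_bytes("user:12345", columns)
concatenated = concatenated_name_bytes("user:12345", columns)
print(native, concatenated)
```

In this model the gap grows with the number of subcolumns per super column
and with the length of the super column name.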

I'm only saying that, 'in theory', super columns are not the super shiny
magical feature that gives you stuff you can't hope to have with only regular
column families. That doesn't make them any less nice.

That being said, you are free to create whatever ticket you want and vote
for it. Don't expect too much support though :)


> What if supercolumn families were supported as regular column families + an
> index (on what used to be supercolumn keys)? Would that solve some problems?
>

You'd still have to remember, for each CF, whether it has this index on what
used to be supercolumn keys, and handle those CFs differently. I'm really not
convinced this would make the code cleaner than it is now. And making the code
cleaner is really the only reason I can think of for wanting to get rid of
super columns internally, so ...


>
>
> On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>
>> > Is there any advantage to using supercolumns
>> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
>> > columns with concatenated keys
>> > (columnFamilyName[superColumnName@columnName[val]])?
>> >
>> > When I designed my data model, I used supercolumns wherever I needed two
>> > levels of key depth - just because they were there, and I figured that
>> > they must be there for a reason.
>> >
>> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
>> > subcolumns (is that right?), which seems to me like a very serious
>> > limitation of supercolumn families.
>> >
>> > It raises the question: Is there anything that supercolumn families are
>> > good for?
>>
>> There are a bunch of queries that you cannot do (or can do only less
>> conveniently) if you encode super columns using regular columns with
>> concatenated keys:
>>
>> 1) If you use regular columns with concatenated keys, the count argument
>> counts simple columns. With super columns it counts super columns. It means
>> that you can't do "give me the first 10 super columns of this row".
>>
>> 2) If you need to get x super columns by name, you'll have to issue x
>> get_slice queries (one for each super column). On the client side it sucks.
>> Internally in Cassandra we could do it reasonably well though.
>>
>> 3) You cannot remove entire super columns since there is no support for
>> range deletions.
>>
>> Moreover, the encoding with concatenated keys uses more disk space (and
>> less disk used for the same information means fewer things to read, so it
>> may have a slight impact on read performance too; the impact is probably
>> really slight for most usage, but it is there nevertheless).
>>
>> > And here's a related question: Why can't Cassandra implement supercolumn
>> > families as regular column families, internally, and give you that
>> > functionality?
>>
>> For 1) and 2) above, I think we could deal with those internally fairly
>> easily and rather well (which means it wouldn't be much worse
>> performance-wise than with the actual implementation of super columns, not
>> that it would be better). For 3), range deletes are harder and would
>> require more significant changes (that doesn't mean that Cassandra will
>> never have them). Even without that, there would still be the lost disk
>> space.
>>
>> --
>> Sylvain
>>
>>
>
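
Points 1) to 3) in the quoted mail can be sketched with a toy in-memory
model. This is only an illustration under stated assumptions: a row is a
plain dict, the "@" separator is the one from the example in the thread, and
every function name below is hypothetical. None of it is Cassandra's API or
storage format.

```python
from itertools import groupby

SEP = "@"  # separator from the thread's example

# The same row stored two ways (toy model only).
super_row = {
    "sc1": {"a": 1, "b": 2},
    "sc2": {"a": 3},
    "sc3": {"a": 4, "b": 5, "c": 6},
}
flat_row = {
    f"{sc}{SEP}{col}": val
    for sc, cols in super_row.items()
    for col, val in cols.items()
}


def slice_names(row, count):
    """Return the first `count` top-level names of a row, in sorted order."""
    return sorted(row)[:count]


def first_super_columns_flat(row, count):
    """Workaround for 1): read all column names and group them by prefix."""
    groups = groupby(sorted(row), key=lambda name: name.split(SEP, 1)[0])
    return [prefix for prefix, _ in groups][:count]


def get_super_flat(row, sc):
    """Point 2): one prefix scan per requested super column."""
    return {n.split(SEP, 1)[1]: v for n, v in row.items() if n.startswith(sc + SEP)}


def remove_super_flat(row, sc):
    """Point 3): with no range delete, every column is deleted one by one."""
    doomed = [n for n in row if n.startswith(sc + SEP)]
    for name in doomed:
        del row[name]
    return len(doomed)


print(slice_names(super_row, 2))             # count applies to super columns
print(slice_names(flat_row, 2))              # count applies to simple columns
print(first_super_columns_flat(flat_row, 2))
print(get_super_flat(flat_row, "sc3"))
print(remove_super_flat(flat_row, "sc3"))    # number of individual deletes
```

With the flat encoding, a count of 2 returns two simple columns that both
belong to sc1, which is why "the first N super columns" needs the client-side
grouping step.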
