cassandra-user mailing list archives

From Terje Marthinussen <>
Subject Re: What is the future of supercolumns ?
Date Sat, 07 Jan 2012 02:35:12 GMT
Please realize that I do not make any decisions here and I am not part of the core Cassandra
developer team.

What has been said before is that they will most likely go away and, at least under the hood,
be replaced by composite columns.
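
To illustrate what "replaced by composite columns" could look like, here is a minimal Python
sketch. It is a toy in-memory model, not Cassandra code: the row contents and the helper names
`flatten` and `slice_by_prefix` are made up for illustration. Each (supercolumn, subcolumn)
pair becomes one flat column whose name is a two-part tuple, and "read one supercolumn"
becomes a slice over a sorted range of names.

```python
from bisect import bisect_left

# Hypothetical row: supercolumn name -> {subcolumn name -> value}.
row_with_supercolumns = {
    "address": {"city": "Tokyo", "zip": "100-0001"},
    "name": {"first": "Ada", "last": "Lovelace"},
}


def flatten(row):
    """Turn nested supercolumns into a sorted list of ((sc, sub), value) columns."""
    return sorted(
        ((sc, sub), value)
        for sc, subs in row.items()
        for sub, value in subs.items()
    )


def slice_by_prefix(columns, sc):
    """Return every column whose composite name starts with `sc`."""
    names = [name for name, _ in columns]
    # (sc,) sorts before any (sc, sub), so this finds the first candidate.
    start = bisect_left(names, (sc,))
    result = []
    for name, value in columns[start:]:
        if name[0] != sc:
            break  # left the contiguous range that shares this prefix
        result.append((name, value))
    return result


flat_columns = flatten(row_with_supercolumns)
name_columns = slice_by_prefix(flat_columns, "name")
```

Because the flat columns are kept sorted by composite name, all columns of one former
supercolumn sit next to each other, which is what makes the prefix slice possible.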

Jonathan has, however, stated that he would like the supercolumn API/abstraction to remain,
at least for backwards compatibility.

Please understand that under the hood, supercolumns are merely groups of columns serialized
as a single block of data. 
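
A rough Python sketch of "a group of columns serialized as a single block" follows. The
length-prefixed layout and the function names are assumptions chosen for illustration; this
is not Cassandra's actual on-disk or wire format.

```python
import struct


def _pack_bytes(data):
    """Prefix a byte string with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(data)) + data


def _read_bytes(block, offset):
    """Read one length-prefixed byte string; return it and the next offset."""
    (length,) = struct.unpack_from(">I", block, offset)
    start = offset + 4
    return block[start:start + length], start + length


def pack_supercolumn(name, subcolumns):
    """Serialize a named group of columns into one contiguous byte block."""
    parts = [_pack_bytes(name.encode("utf-8")), struct.pack(">I", len(subcolumns))]
    for sub_name, value in sorted(subcolumns.items()):
        parts.append(_pack_bytes(sub_name.encode("utf-8")))
        parts.append(_pack_bytes(value))
    return b"".join(parts)


def unpack_supercolumn(block):
    """Decode a block produced by pack_supercolumn back into (name, subcolumns)."""
    name, offset = _read_bytes(block, 0)
    (count,) = struct.unpack_from(">I", block, offset)
    offset += 4
    subcolumns = {}
    for _ in range(count):
        sub_name, offset = _read_bytes(block, offset)
        value, offset = _read_bytes(block, offset)
        subcolumns[sub_name.decode("utf-8")] = value
    return name.decode("utf-8"), subcolumns


block = pack_supercolumn("address", {"city": b"Tokyo", "zip": b"100-0001"})
decoded = unpack_supercolumn(block)
```

In this toy layout the subcolumns are only reachable by walking the block from the start,
so the group is naturally written and read as one unit.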

The fact that there is a specialized and hardcoded way to serialize these column groups into
supercolumns is a problem, however, and it should probably go away to make room for a more
generic implementation that allows more flexible data structures and needs less code specific
to one special data structure.

Today there is a lot of extra code to deal with the slight differences in serialization and
features between supercolumns and columns, and hopefully most of that could go away if things
were structured a bit differently.

I also hope that we keep APIs that allow simple access to groups of key/value pairs, since
working with just columns can add a lot of application code that should not be needed.
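
As a small example of the kind of convenience such an API gives, here is a hypothetical Python
helper that regroups flat columns into groups of key/value pairs. The `group_columns` name and
the ((group, key), value) column shape are assumptions for illustration, not an existing
client API.

```python
from itertools import groupby


def group_columns(columns):
    """Regroup sorted ((group, key), value) columns into {group: {key: value}}."""
    return {
        group: {name[1]: value for name, value in members}
        for group, members in groupby(columns, key=lambda col: col[0][0])
    }


# Flat columns must already be sorted by name, as groupby only merges neighbours.
flat = [
    (("address", "city"), "Tokyo"),
    (("address", "zip"), "100-0001"),
    (("name", "first"), "Ada"),
]
grouped = group_columns(flat)
```

Without a helper like this, every application that wants one group at a time ends up
writing the same regrouping loop itself.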

If you almost always need all or nearly all of the columns in a supercolumn, and you normally
update all of them at the same time, they will most likely be faster than normal columns.

Processing-wise, you will actually do a bit more work on serialization/deserialization of
SCs, but the I/O part will usually be better grouped and require fewer operations.

I think we ran some benchmarks on heavy use cases with ~30 small columns per SC some
time back, and I think we ended up with SCs being 10-20% faster.


On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote:

> I have seen supercolumns usage been discouraged most of the times.
> However sometimes the supercolumns seem to fit the scenario most
> appropriately not only in terms of how the data is stored but also in
> terms of how is it retrieved. Some of the queries supported by SCs are
> uniquely capable of doing the task which no other alternative schema
> could do.(Like recently I asked about getting the equivalent of
> retrieving a list of (full)supercolumns by name, through use of
> composite columns, unfortunately there was no way to do this without
> reading lots of extra columns).
> So I am really confused whether:
> 1. Should I really not use the supercolumns for any case at all,
> however appropriate, or I just need to be just careful while realizing
> that supercolumns fit my use case appropriately or what!?
> 2. Are there any performance concerns with supercolumns even in the
> cases where they are used most appropriately. Like when you need to
> retrieve the entire supercolumns everytime & max. no of subcolumns
> vary between 0-10.
> (I don't write all the subcolumns inside supercolumn, at once though!
> Does this also matter?)
> 3. What is their future? Are they going to be deprecated or may be
> enhanced later?
