On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn <firstname.lastname@example.org> wrote:
Can I vote for internally implementing supercolumn families as regular column families? (With a smooth upgrade process that doesn't require shutting down a live cluster.)
What if supercolumn families were supported as regular column families + an index (on what used to be supercolumn keys)? Would that solve some problems?
On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <email@example.com> wrote:
> Is there any advantage to using supercolumns
> (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
> columns with concatenated keys
> (columnFamilyName[superColumnName@columnName[val]])?
>
> When I designed my data model, I used supercolumns wherever I needed two
> levels of key depth - just because they were there, and I figured that they
> must be there for a reason.
>
> Now I see that in 0.7 secondary indexes don't work on supercolumns or
> subcolumns (is that right?), which seems to me like a very serious
> limitation of supercolumn families.
>
> It raises the question: Is there anything that supercolumn families are good
> for?

There are a number of queries that you cannot do (or can do only less
conveniently) if you encode super columns using regular columns with
concatenated keys:

1) If you use regular columns with concatenated keys, the count argument
counts simple columns. With super columns it counts super columns. This means
that you can't do "give me the first 10 super columns of this row".

2) If you need to get x super columns by name, you'll have to issue x
get_slice queries (one for each super column). On the client side it sucks.
Internally in Cassandra we could do it reasonably well though.

3) You cannot remove entire super columns, since there is no support for range
deletions.

Moreover, the encoding with concatenated keys uses more disk space (and less
disk used for the same information means fewer things to read, so it may have
a slight impact on read performance too; it's probably really slight for most
usage, but nevertheless).
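To make the three limitations concrete, here is a minimal in-memory sketch in Python. It is not the Cassandra or Thrift API: the row is modeled as a plain dict, and `get_slice`, `first_super_columns`, `get_super_column`, and `remove_super_column` are hypothetical helpers written for this illustration. The only thing taken from the thread is the `superColumnName@columnName` naming scheme.

```python
SEP = "@"  # separator from the thread's superColumnName@columnName example

# Toy model of one row in a regular column family: column name -> value.
row = {
    "sc1@a": 1, "sc1@b": 2, "sc1@c": 3,
    "sc2@a": 4,
    "sc3@a": 5, "sc3@b": 6,
}

def get_slice(row, start="", finish="", count=100):
    """Hypothetical slice: columns in sorted name order within
    [start, finish], truncated to at most `count` simple columns."""
    names = sorted(
        n for n in row
        if n >= start and (finish == "" or n <= finish)
    )
    return [(n, row[n]) for n in names[:count]]

def first_super_columns(row, n):
    """Limitation 1: `count` counts simple columns, so the client has to
    read the columns and regroup them by prefix to get n 'super columns'."""
    grouped = {}
    for name, value in get_slice(row, count=len(row)):
        sc, col = name.split(SEP, 1)
        if sc not in grouped and len(grouped) == n:
            break  # reached the (n+1)-th prefix
        grouped.setdefault(sc, {})[col] = value
    return grouped

def get_super_column(row, sc):
    """Limitation 2: one prefix-range slice per super column name."""
    return get_slice(row, start=sc + SEP, finish=sc + SEP + "\uffff",
                     count=len(row))

def remove_super_column(row, sc):
    """Limitation 3: without range deletes, every column under the
    prefix is deleted individually. Returns the number of deletes."""
    names = [name for name, _ in get_super_column(row, sc)]
    for name in names:
        del row[name]
    return len(names)

# count=2 yields two columns of sc1, not two super columns.
two_columns = get_slice(row, count=2)
two_supers = first_super_columns(row, 2)
```

In this model, fetching x super columns by name means calling `get_super_column` x times, and removing one means one delete per sub-column, which is the extra work the points above describe.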
> And here's a related question: Why can't Cassandra implement supercolumn
> families as regular column families, internally, and give you that
> functionality?
For 1) and 2) above, I think we could deal with those internally fairly easily
and rather well (meaning it wouldn't be much worse performance-wise than with
the current implementation of super columns, not that it would be better).
For 3), range deletes are harder and would require more significant changes
(that doesn't mean that Cassandra will never have them). Even without that,
there would still be the extra disk space used.

--
Sylvain
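As a rough illustration of the disk space point, the sketch below counts only the bytes spent on names under the two layouts. It assumes the overhead comes from repeating the super column name in every concatenated column name; that is an assumption made here, not something the thread spells out, and the real on-disk format carries other per-column overhead that this ignores. The helper names and sample data are hypothetical.

```python
SEP = "@"  # separator from the thread's superColumnName@columnName example

def nested_name_bytes(super_columns):
    """Name bytes when each super column name is stored once,
    followed by the names of its sub-columns."""
    return sum(
        len(sc.encode()) + sum(len(col.encode()) for col in cols)
        for sc, cols in super_columns.items()
    )

def flat_name_bytes(super_columns):
    """Name bytes when every column name carries the super column
    name as a prefix (superColumnName@columnName)."""
    return sum(
        len((sc + SEP + col).encode())
        for sc, cols in super_columns.items()
        for col in cols
    )

# Hypothetical sample row: super column name -> sub-column names.
data = {
    "user:1001": ["name", "email", "city"],
    "user:1002": ["name", "email"],
}

nested = nested_name_bytes(data)
flat = flat_name_bytes(data)
print(f"nested: {nested} bytes, concatenated: {flat} bytes")
```

Under this model the gap grows with the length of the super column name and the number of sub-columns beneath it.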