cassandra-user mailing list archives

From Terje Marthinussen
Subject Re: column bloat
Date Tue, 10 May 2011 15:30:34 GMT
> Anyway, to sum that up, expiring columns are 1 byte more and
> non-expiring ones are 7 bytes
> less. Not arguing, it's still fairly verbose, especially with tons of
> very small columns.

Yes, you are right, sorry.
I was trying to do one thing too many at the same time.
My brain filtered out part of the "else if".

> > - inherit timestamps from the supercolumn
> Columns inside a supercolumn have no reason to share the same timestamp (or
> even close ones for that matter). But maybe you're talking about something
> more
> subtle, in which case yes there is ways to compress the data.

For a reasonably large number of use cases (for me, 2 out of 3 at the
moment), supercolumns will be units of data where the columns (attributes)
never change individually, or where the data does not change at all
(archived data).

It would seem like a good optimization to allow a timestamp on the
supercolumn instead and remove the one on columns?

I believe this may also work as an optimization on compactions? Just skip
merging of columns under the supercolumn if the supercolumn has a timestamp
and just replace the entire supercolumn in that case.

This could be just a variation of the supercolumn object on insert: with no
timestamp on the supercolumn, use the ones in the columns; with a timestamp
on the supercolumn, ignore the timestamps in the columns.
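
As a rough illustration of the two variants described above, the sketch
below reconciles two versions of a supercolumn. It is not Cassandra code:
the Column and SuperColumn classes, the merge function, and the choice to
reject a mix of the two variants are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Column:
    value: bytes
    # None when the enclosing supercolumn carries the timestamp instead.
    timestamp: Optional[int] = None


@dataclass
class SuperColumn:
    columns: Dict[bytes, Column] = field(default_factory=dict)
    # Hypothetical supercolumn-level timestamp; None means "use the columns'".
    timestamp: Optional[int] = None


def merge(a: SuperColumn, b: SuperColumn) -> SuperColumn:
    """Reconcile two versions of the same supercolumn."""
    if a.timestamp is not None and b.timestamp is not None:
        # Both versions are atomic units: the newer one replaces the older
        # one entirely, and the columns are never merged one by one.
        return a if a.timestamp >= b.timestamp else b
    if a.timestamp is not None or b.timestamp is not None:
        # How to reconcile the two variants is left open in this sketch.
        raise ValueError("cannot merge timestamped and untimestamped variants")
    # Neither version has a supercolumn timestamp: merge column by column,
    # keeping the column with the higher timestamp.
    merged = dict(a.columns)
    for name, col in b.columns.items():
        old = merged.get(name)
        if old is None or col.timestamp > old.timestamp:
            merged[name] = col
    return SuperColumn(columns=merged)
```

In the timestamped case the merge is a single comparison, which is where
the possible saving during compaction would come from.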

If that sounds like a sensible idea, I may be tempted to try to get time to
implement it.

I am also tempted to do some other things like make some of the "ints" and
"shorts" variable length as well.

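One common way to make integers variable length is a base-128 "varint",
where each byte carries 7 bits of the value and the high bit says whether
more bytes follow. The sketch below shows that general technique only; it
is an assumption for illustration, not the encoding Cassandra uses or
would necessarily adopt, and the function names are made up.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

With this scheme any length below 128 takes a single byte, compared with 2
bytes for a fixed short or 4 bytes for a fixed int, which matters most for
the very small columns discussed earlier in the thread.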
