incubator-cassandra-dev mailing list archives

From "Stu Hood"
Subject Re: Symbolizing column names for storage and cache efficiency
Date Sun, 26 Jul 2009 21:11:45 GMT
Also, long term, I think it is safe to assume that we will be adding compression for ColumnFamilies,
which should have similar positive effects on cache-ability without too much application specific

-----Original Message-----
From: "Jonathan Ellis"
Sent: Sunday, July 26, 2009 4:46pm
Subject: Re: Symbolizing column names for storage and cache efficiency

On Sun, Jul 26, 2009 at 2:28 AM, Evan Weaver wrote:
> Would it be possible to add symbolized column names in a
> forward-compatible way? Maybe scoped per sstable, with the registries
> always kept in memory.

Maybe.  But it's not obvious to me how to do this in general.

The problem is the sparse nature of the column set.  We can't encode
_all_ the columns this way, or in the degenerate case we OOM just
trying to keep the mapping in memory.  Similarly, we can't encode just
the top N column names, since figuring out the top N requires keeping
each name in memory during the counting process.  (Besides slowing
down compaction -- instead of just deserializing columns where there
are keys in common in the merged fragments, we have to deserialize

ISTM that all we can do is encode the _first_ N column names we see,
which may be useful if the column name set is small for a given CF.
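
To make the "first N" idea concrete, here is a rough sketch of a bounded
symbol table. It is not Cassandra code: the class name, the max_symbols
parameter and the ("sym", id) / ("raw", name) token shapes are all made up
for illustration. The point is only that memory stays bounded by N, and
anything seen after the table fills up is stored unencoded.

```python
class FirstNSymbolTable:
    """Hypothetical registry: symbolizes only the first N distinct column names."""

    def __init__(self, max_symbols):
        self.max_symbols = max_symbols  # hard cap on registry size (assumed tunable)
        self._ids = {}                  # column name -> symbol id
        self._names = []                # symbol id -> column name

    def encode(self, name):
        """Return ("sym", id) for a registered name, else ("raw", name)."""
        sym = self._ids.get(name)
        if sym is not None:
            return ("sym", sym)
        if len(self._names) < self.max_symbols:
            # Still room: register the name in first-seen order.
            sym = len(self._names)
            self._ids[name] = sym
            self._names.append(name)
            return ("sym", sym)
        # Table is full: fall back to the literal name, no counting needed.
        return ("raw", name)

    def decode(self, token):
        """Invert encode(): look up symbol ids, pass raw names through."""
        kind, value = token
        return self._names[value] if kind == "sym" else value


table = FirstNSymbolTable(max_symbols=2)
tokens = [table.encode(n) for n in ["name", "email", "name", "rare", "email"]]
names = [table.decode(t) for t in tokens]
```

With max_symbols=2, "name" and "email" get ids and "rare" stays raw. If the
column name set for a CF is small, everything ends up symbolized; if it is
large and sparse, only whichever names happened to come first benefit.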

