incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Symbolizing column names for storage and cache efficiency
Date Sun, 26 Jul 2009 20:46:47 GMT
On Sun, Jul 26, 2009 at 2:28 AM, Evan Weaver<eweaver@gmail.com> wrote:
> Would it be possible to add symbolized column names in a
> forward-compatible way? Maybe scoped per sstable, with the registries
> always kept in memory.

Maybe.  But it's not obvious to me how to do this in general.

The problem is the sparse nature of the column set.  We can't encode
_all_ the columns this way, or in the degenerate case we OOM just
trying to keep the mapping in memory.  Similarly, we can't encode just
the top N column names, since figuring out the top N requires keeping
each name in memory during the counting process.  (It would also slow
down compaction: instead of deserializing columns only for keys that
the merged fragments have in common, we would have to deserialize
all of them.)

ISTM that all we can do is encode the _first_ N column names we see,
which may be useful if the column name set is small for a given CF.
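
To make the first-N idea concrete, here is a minimal sketch in Python.
It is not Cassandra code: the class name, the method names, and the
("sym", id) / ("raw", name) token shape are all hypothetical, chosen
only for illustration.  The registry hands out ids to the first N
distinct names it sees and passes every later new name through
unencoded, so its memory use is bounded by N.

```python
class FirstNSymbolTable:
    """Hypothetical per-sstable registry (illustrative only).

    Symbolizes the first `max_symbols` distinct column names it sees;
    any later new name is left as a raw string, so memory stays bounded.
    """

    def __init__(self, max_symbols):
        self.max_symbols = max_symbols
        self._ids = {}    # column name -> symbol id
        self._names = []  # symbol id -> column name

    def encode(self, name):
        # Already registered: reuse the existing id.
        sym = self._ids.get(name)
        if sym is not None:
            return ("sym", sym)
        # Room left in the registry: assign the next id.
        if len(self._names) < self.max_symbols:
            sym = len(self._names)
            self._ids[name] = sym
            self._names.append(name)
            return ("sym", sym)
        # Registry full: fall back to the raw name.
        return ("raw", name)

    def decode(self, token):
        kind, value = token
        return self._names[value] if kind == "sym" else value


# Usage: with N = 2, "name" and "email" get ids; "age" stays raw.
table = FirstNSymbolTable(max_symbols=2)
encoded = [table.encode(n) for n in ["name", "email", "name", "age", "email"]]
decoded = [table.decode(t) for t in encoded]
```

No counting pass is needed, which is what keeps this cheap, but the
payoff depends entirely on the column name set being small: in a wide,
sparse CF most names would land in the raw fallback.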

-Jonathan
