cassandra-dev mailing list archives

From Evan Weaver
Subject Symbolizing column names for storage and cache efficiency
Date Sun, 26 Jul 2009 07:28:29 GMT
This article about MongoDB is interesting. They
prioritized low barriers to entry in their selection process, and
ignored performance/scaling of any kind.

Aside from that, they mention that for row-oriented storage,
serializing the same string column names to disk for every row is a
big waste of disk and cache space. As far as I know, this affects
Cassandra too.
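
To make the overhead concrete, here is a rough sketch that measures how many bytes go to column names under a simple length-prefixed row layout. The layout (2-byte name length, 4-byte value length) and the function names are assumptions for illustration only, not Cassandra's actual on-disk format.

```python
import struct


def serialize_row(columns):
    """Serialize a {name: value} dict of strings as length-prefixed pairs.

    Hypothetical layout: 2-byte name length, name bytes,
    4-byte value length, value bytes.
    """
    out = bytearray()
    for name, value in columns.items():
        n = name.encode("utf-8")
        v = value.encode("utf-8")
        out += struct.pack(">H", len(n)) + n
        out += struct.pack(">I", len(v)) + v
    return bytes(out)


def name_overhead(rows):
    """Return (bytes spent on column names, total serialized bytes)."""
    total = sum(len(serialize_row(r)) for r in rows)
    names = sum(2 + len(k.encode("utf-8")) for r in rows for k in r)
    return names, total
```

With short values and descriptive names, the name bytes can easily exceed the value bytes, and that cost repeats for every row.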

Would it be possible to add symbolized column names in a
forward-compatible way? Maybe scoped per sstable, with the registries
always kept in memory. Each node could individually make a decision
about whether a column name is duplicated enough to be worth
symbolizing, and apply the transformation in the compaction phase.
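
A minimal sketch of what I have in mind, in Python for brevity. Everything here is hypothetical: `SymbolRegistry`, `compact`, `decode`, and the `min_occurrences` threshold are made-up names, and the in-memory tuples only stand in for whatever the real sstable encoding would be. Each column carries a flag saying whether its key is a symbol id or a literal name, so readers can still handle unsymbolized columns.

```python
from collections import Counter


class SymbolRegistry:
    """Hypothetical per-sstable map between column names and small integer ids."""

    def __init__(self):
        self.name_to_id = {}
        self.id_to_name = []

    def add(self, name):
        # Assign the next free id the first time a name is seen.
        if name not in self.name_to_id:
            self.name_to_id[name] = len(self.id_to_name)
            self.id_to_name.append(name)
        return self.name_to_id[name]


def compact(rows, min_occurrences=2):
    """Rewrite rows, symbolizing names that occur at least min_occurrences times.

    Each column becomes (is_symbol, key, value); key is an int id when
    is_symbol is True, otherwise the literal column name.
    """
    counts = Counter(name for row in rows for name in row)
    registry = SymbolRegistry()
    for name in sorted(counts):  # sorted so id assignment is deterministic
        if counts[name] >= min_occurrences:
            registry.add(name)

    encoded_rows = []
    for row in rows:
        encoded = []
        for name, value in row.items():
            sym = registry.name_to_id.get(name)
            if sym is not None:
                encoded.append((True, sym, value))
            else:
                encoded.append((False, name, value))
        encoded_rows.append(encoded)
    return registry, encoded_rows


def decode(registry, encoded_rows):
    """Reverse compact(): resolve symbol ids back to column names."""
    return [
        {(registry.id_to_name[key] if is_symbol else key): value
         for is_symbol, key, value in row}
        for row in encoded_rows
    ]
```

Because the registry is scoped to one sstable and built during compaction, nodes would not need to agree on ids with each other.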

Of course there are pitfalls, but it seems like it could be a big boon
to effective cache size in row-oriented applications.


Evan Weaver
