cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-4175) Reduce memory, disk space, and cpu usage with a column name/id map
Date Tue, 11 Aug 2015 16:37:48 GMT


Jonathan Ellis resolved CASSANDRA-4175.
       Resolution: Duplicate
         Assignee:     (was: Jason Brown)
    Fix Version/s:     (was: 3.x)

Column name duplication is removed in CASSANDRA-8099.  (See

(We can do slightly better by encoding column ids in the schema, but doing it on a per-sstable
basis is almost as good from a disk space perspective.)
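
To illustrate why a per-file name dictionary recovers most of the disk space, here is a rough sketch. It is not the actual sstable format from CASSANDRA-8099; the byte layout and the function names `encode_inline` and `encode_with_dictionary` are assumptions made up for this example. It compares writing the column name with every cell against writing each name once in a header and referring to it by index.

```python
import struct


def encode_inline(cells):
    """Hypothetical layout: every cell is [uint16 name_len][name][int32 value]."""
    out = bytearray()
    for name, value in cells:
        raw = name.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + struct.pack(">i", value)
    return bytes(out)


def encode_with_dictionary(cells):
    """Hypothetical layout: names are written once in a header,
    then every cell is [uint16 name_index][int32 value]."""
    names = sorted({name for name, _ in cells})
    index = {name: i for i, name in enumerate(names)}

    # Header: number of names, then each name with a length prefix.
    out = bytearray(struct.pack(">H", len(names)))
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw

    # Body: cells carry a 2-byte index instead of the full name.
    for name, value in cells:
        out += struct.pack(">H", index[name]) + struct.pack(">i", value)
    return bytes(out)


cells = [("temperature", i) for i in range(1000)]
cells += [("humidity", i) for i in range(1000)]

print(len(encode_inline(cells)))           # 31000 bytes
print(len(encode_with_dictionary(cells)))  # 12025 bytes
```

In this toy layout the header costs 25 bytes per file, so a schema-level id map would save only that header on top of what the per-file dictionary already saves.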

IMO we should leave dealing with highly duplicated column *values* to the compression layer.

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>                 Key: CASSANDRA-4175
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>              Labels: performance
> We spend a lot of memory on column names, both transiently (during reads) and more permanently
> (in the row cache).  Compression mitigates this on disk but not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too via very high
> allocation rates in the young generation, hence more GC activity.
> Now that CQL3 provides us some guarantees that column names must be defined before they
> are inserted, we could create a map of (say) 32-bit int column ids to names, and use that
> internally right up until we return a resultset to the client.
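
As a minimal sketch of the name/id map idea described in the issue (not Cassandra's implementation; `ColumnIdMap`, `to_internal`, and `to_resultset` are hypothetical names for this example), columns are assigned small integer ids when they are defined, rows are carried internally keyed by id, and ids are translated back to names only when building the result for the client.

```python
class ColumnIdMap:
    """Hypothetical bidirectional map between column names and int ids."""

    MAX_ID = 2**32 - 1  # ids are meant to fit in 32 bits

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = []

    def define(self, name):
        """Register a column name (as a schema change would); return its id."""
        existing = self._id_by_name.get(name)
        if existing is not None:
            return existing
        new_id = len(self._name_by_id)
        if new_id > self.MAX_ID:
            raise OverflowError("column id space exhausted")
        self._id_by_name[name] = new_id
        self._name_by_id.append(name)
        return new_id

    def id_of(self, name):
        # Raises KeyError for names that were never defined.
        return self._id_by_name[name]

    def name_of(self, column_id):
        return self._name_by_id[column_id]


def to_internal(row, ids):
    """Replace column names with ids for internal processing."""
    return {ids.id_of(name): value for name, value in row.items()}


def to_resultset(internal_row, ids):
    """Translate ids back to names just before returning to the client."""
    return {ids.name_of(cid): value for cid, value in internal_row.items()}


ids = ColumnIdMap()
ids.define("user_id")
ids.define("age")

internal = to_internal({"user_id": 7, "age": 30}, ids)
print(internal)                     # {0: 7, 1: 30}
print(to_resultset(internal, ids))  # {'user_id': 7, 'age': 30}
```

The sketch relies on the guarantee mentioned above: because names must be defined before insertion, `id_of` can treat an unknown name as an error rather than allocating ids on the write path.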

This message was sent by Atlassian JIRA
