cassandra-commits mailing list archives

From "Jonathan Ellis (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map
Date Fri, 20 Apr 2012 05:35:37 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258030#comment-13258030 ]

Jonathan Ellis edited comment on CASSANDRA-4175 at 4/20/12 5:34 AM:
--------------------------------------------------------------------

identityHashCode is effectively arbitrary per JVM (historically derived from the object's
location in memory), so it will not be the same on different nodes.  (So it would work for
approach 2, I suppose, but I'd rather use a simple int counter.)
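A minimal sketch of the "simple int counter" idea, assuming ids are handed out in the order columns are first seen. The class `ColumnIdAllocator` and its methods are hypothetical, not Cassandra code. Unlike identityHashCode, the id depends only on registration order, so two allocators fed the same column definitions in the same order agree. How nodes would agree on that order (for example, by shipping ids with the schema) is an assumption this sketch does not address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: column ids come from a simple int counter
// (the next unused index), never from object identity.
public class ColumnIdAllocator {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    // Returns the existing id for a name, or assigns the next counter value.
    public synchronized int idFor(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        int id = namesById.size(); // counter: next unused id
        namesById.add(name);
        idsByName.put(name, id);
        return id;
    }

    // Reverse lookup; throws IndexOutOfBoundsException for an unassigned id.
    public synchronized String nameFor(int id) {
        return namesById.get(id);
    }

    public static void main(String[] args) {
        // Two independent allocators see the same definitions in the same order.
        ColumnIdAllocator a = new ColumnIdAllocator();
        ColumnIdAllocator b = new ColumnIdAllocator();
        for (String col : new String[] {"user_id", "email", "age"}) {
            a.idFor(col);
            b.idFor(col);
        }
        System.out.println(a.idFor("email") + " " + b.idFor("email")); // prints "1 1"
    }
}
```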
                
      was (Author: jbellis):
    hashCode just isn't designed to be collision-resistant; it prioritizes speed.  Even with
a better (from the standpoint of collisions) general-purpose hash like Murmur, 32 bits is just
too small.  The smallest cryptographic hash I know of is MD5, and ballooning to 128 bits puts
a serious crimp in the potential savings here.
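Why 32 bits is "just too small" can be made concrete with the standard birthday-bound approximation. The sketch below assumes an ideal, uniformly distributed hash, which is already generous to hashCode. It estimates the probability that at least one pair among n distinct column names collides as p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The class and method names are illustrative only.

```java
// Birthday-bound estimate of a hash collision among n distinct names,
// assuming an ideal (uniform) hash with the given number of output bits.
public class HashCollisionEstimate {
    static double collisionProbability(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0; // number of name pairs
        double space = Math.pow(2.0, bits);        // number of possible hash values
        // -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long[] counts = {1_000, 10_000, 100_000};
        for (long n : counts) {
            System.out.printf("n=%d: 32-bit p=%.3g, 128-bit p=%.3g%n",
                n, collisionProbability(n, 32), collisionProbability(n, 128));
        }
        // Real hashCode is worse than ideal: these two strings collide.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints "true"
    }
}
```

With 100,000 names the 32-bit estimate is already above one half, while 128 bits makes collisions negligible, at four times the width of an int id.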
                  
> Reduce memory (and disk) space requirements with a column name/id map
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> We spend a lot of memory on column names, both transiently (during reads) and more permanently
> (in the row cache).  Compression mitigates this on disk but not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too via very high
> allocation rates in the young generation, hence more GC activity.
> Now that CQL3 guarantees that column names must be defined before they are inserted,
> we could create a map from (say) 32-bit int column ids to names, and use the ids
> internally right up until we return a resultset to the client.
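
A minimal sketch of "use ids internally right up until we return a resultset", assuming rows are held as id-keyed maps and the column map is a list where the index is the id. `ResultSetTranslation` and `toClientRow` are hypothetical names, not Cassandra APIs. Names appear only in the final translation step, so the hot path never carries name strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows are keyed by int column id internally and
// translated back to column names only when building the client result.
public class ResultSetTranslation {
    // namesById.get(id) is the column name for that id.
    static Map<String, Object> toClientRow(Map<Integer, Object> internalRow,
                                           List<String> namesById) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, Object> e : internalRow.entrySet()) {
            out.put(namesById.get(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> namesById = List.of("user_id", "email", "age");
        Map<Integer, Object> internal = new LinkedHashMap<>();
        internal.put(0, 42); // user_id
        internal.put(2, 31); // age
        System.out.println(toClientRow(internal, namesById)); // prints "{user_id=42, age=31}"
    }
}
```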

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
