incubator-cassandra-user mailing list archives

From Tamar Fraenkel <ta...@tok-media.com>
Subject compression
Date Sun, 23 Sep 2012 15:29:47 GMT
Hi!
The DataStax documentation <http://www.datastax.com/docs/1.0/ddl/column_family>
explains which CFs are a good fit for compression:

When to Use Compression

Compression is best suited for column families where there are many rows,
with each row having the same columns, or at least many columns in common.
For example, a column family containing user data such as username, email,
etc., would be a good candidate for compression. The more similar the data
across rows, the greater the compression ratio will be, and the larger the
gain in read performance.

Compression is not as good a fit for column families where each row has a
different set of columns, or where there are just a few very wide rows.
Dynamic column families such as this will not yield good compression ratios.
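To illustrate the claim above, here is a minimal sketch that compares a "static" CF, where every row has the same column names, with a "dynamic" CF, where every row has its own column names. It assumes rows are serialized as JSON and compressed with zlib (DEFLATE) as a stand-in. Cassandra's real SSTable layout and compressors (SnappyCompressor, DeflateCompressor) differ, so only the relative ordering of the two ratios is meaningful. Column values are random in both cases, which isolates the effect of repeated column names.

```python
import json
import random
import zlib


def serialize_rows(rows):
    """Hypothetical serialization: each row as a JSON object of column name -> value."""
    return "".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")


def ratio(data):
    """Uncompressed size divided by DEFLATE-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 6))


rng = random.Random(42)


def rand_hex():
    """Random 8-character hex string, used for values (and for dynamic column names)."""
    return f"{rng.getrandbits(32):08x}"


# "Static" CF: every row has the same four column names (like username, email, ...).
static_rows = [
    {"username": rand_hex(), "email": rand_hex(), "country": rand_hex(), "plan": rand_hex()}
    for _ in range(1000)
]

# "Dynamic" CF: every row has four columns with random, non-repeating names.
dynamic_rows = [{f"c_{rand_hex()}": rand_hex() for _ in range(4)} for _ in range(1000)]

static_ratio = ratio(serialize_rows(static_rows))
dynamic_ratio = ratio(serialize_rows(dynamic_rows))
print(f"static CF ratio:  {static_ratio:.2f}")
print(f"dynamic CF ratio: {dynamic_ratio:.2f}")
```

The static CF compresses better because the repeated column names become cheap back-references for the compressor. The dynamic CF's unique names cannot be shared across rows.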

I have many column families where rows share some of their columns and also
have a varying number of unique columns per row.
For example, I have a CF where each row has ~13 shared columns plus anywhere
from zero to many unique columns. Would such a CF be a good fit for compression?

More generally, is there a rule of thumb for how many shared columns (or what
percentage of shared columns) makes a column family a good fit for
compression?
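Lacking a rule of thumb, one way to get a feel for the mixed case is to measure it. The following sketch fixes 13 shared columns and sweeps the number of unique columns per row, reporting the compression ratio at each point. The serialization (JSON), the compressor (zlib as a stand-in for Cassandra's compressors), the column names, and the random 8-character values are all hypothetical. Each run also uses the same unique-column count for every row, whereas real rows vary. Treat the output as a trend, not a prediction for any actual CF.

```python
import json
import random
import zlib

# Hypothetical names for the ~13 columns that every row has.
SHARED_COLUMNS = [f"shared_{i:02d}" for i in range(13)]


def make_row(rng, n_unique):
    """One row: all shared columns plus n_unique randomly named columns
    (random 32-bit names, so they almost never repeat across rows)."""
    row = {name: f"{rng.getrandbits(32):08x}" for name in SHARED_COLUMNS}
    for _ in range(n_unique):
        row[f"u_{rng.getrandbits(32):08x}"] = f"{rng.getrandbits(32):08x}"
    return row


def ratio_for(n_unique, n_rows=1000, seed=1):
    """Uncompressed/compressed size for n_rows rows with n_unique unique columns each."""
    rng = random.Random(seed)
    data = "".join(
        json.dumps(make_row(rng, n_unique), sort_keys=True) for _ in range(n_rows)
    ).encode("utf-8")
    return len(data) / len(zlib.compress(data, 6))


ratios = {k: ratio_for(k) for k in (0, 5, 13, 50)}
for k, r in ratios.items():
    share = len(SHARED_COLUMNS) / (len(SHARED_COLUMNS) + k)
    print(f"{k:3d} unique cols ({share:.0%} shared): ratio {r:.2f}")
```

In this toy setup the ratio falls as the shared fraction shrinks, with no sharp cutoff. Running the same kind of comparison on real data is more reliable than any fixed percentage.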

Thanks,

*Tamar Fraenkel*
Senior Software Engineer, TOK Media


tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956
