I have no clue. I have never done it, even though I am planning to.

1 - Did you just spend a month with a cluster in an "unstable" state? Did you have any issues during this time related to the transitional state of your cluster?

I am currently storing counters with:
row => objectId, column name => date#event, data => counter (date format 20121029).
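Roughly, in cassandra-cli terms it looks like this (a minimal sketch; "object_counters" and the sample key/column are made-up names):

create column family object_counters
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and default_validation_class = CounterColumnType;

incr object_counters['objectId42']['20121029#click'];

One row per object, one counter column per date#event pair.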

2 - Is it a good idea to compress this kind of data?

I am looking into using composite columns.

3 - What are the benefits of using a composite column name like "CompositeType(UTF8Type, UTF8Type)" over a simple UTF8 column name with event and date separated by a hash, as I am doing right now? (A sketch of both layouts follows question 4.)

4 - Would compression be a good idea in this case?
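To make question 3 concrete, here is how the two layouts compare in cassandra-cli (CF names are hypothetical):

create column family counters_flat
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and default_validation_class = CounterColumnType;

create column family counters_composite
  with key_validation_class = UTF8Type
  and comparator = 'CompositeType(UTF8Type,UTF8Type)'
  and default_validation_class = CounterColumnType;

In counters_flat a column name is the single string '20121029#click'; in counters_composite it is the pair ('20121029', 'click'). The composite keeps each part typed and sorted component by component, and slicing on the date component does not depend on '#' never appearing inside an event name.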

Thanks for your help on any of these 4 points :).

Alain


2012/10/29 Tamar Fraenkel <tamar@tok-media.com>
Hi!
Thanks Aaron!
Today I restarted Cassandra on that node and ran scrub again; now it is fine.

I am worried though that if I decide to change another CF to use compression I will have that issue again. Any clue how to avoid it?

Thanks.

Tamar Fraenkel 
Senior Software Engineer, TOK Media 

On Wed, Sep 26, 2012 at 3:40 AM, aaron morton <aaron@thelastpickle.com> wrote:
Check the logs on nodes 2 and 3 to see if the scrub started. The logs on node 1 will be a good help with that.
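For example, something along these lines on each node (paths assume a default packaged install):

grep -i scrub /var/log/cassandra/system.log
nodetool -h localhost compactionstats

A scrub that is still running shows up in the compactionstats output as an active task.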

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 24/09/2012, at 10:31 PM, Tamar Fraenkel <tamar@tok-media.com> wrote:

Hi!
I ran:
UPDATE COLUMN FAMILY cf_name WITH compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};

I then ran on all 3 of my nodes:
sudo nodetool -h localhost scrub tok cf_name

I have replication factor 3. The size of the data on disk was cut in half on the first node, and in JMX I can see that the compression ratio is indeed 0.46. But on nodes 2 and 3 nothing happened: in JMX the compression ratio is 0 and the size of the files on disk stayed the same.
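One way to double-check on each node is to compare the SSTable sizes directly (assuming the default data directory layout):

ls -lh /var/lib/cassandra/data/tok/cf_name/*Data.db

The *-Data.db files should shrink once scrub has rewritten the SSTables compressed.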

In the cli:

ColumnFamily: cf_name
      Key Validation Class: org.apache.cassandra.db.marshal.UUIDType
      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
      Key cache size / save period in seconds: 200000.0/14400
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        chunk_length_kb: 64
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Can anyone help?
Thanks

Tamar Fraenkel 
Senior Software Engineer, TOK Media 


On Mon, Sep 24, 2012 at 8:37 AM, Tamar Fraenkel <tamar@tok-media.com> wrote:
Thanks all, that helps. Will start with one or two CFs and let you know the effect.


Tamar Fraenkel 
Senior Software Engineer, TOK Media 


On Sun, Sep 23, 2012 at 8:21 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
Also, your unlimited column names may all share the same prefix, right? Like "accounts".rowkey56, "accounts".rowkey78, etc., so the "accounts" prefix gets a ton of compression then.
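A rough way to see that effect, with gzip standing in for Snappy and made-up names:

printf 'accounts.rowkey%s\n' {1..1000} | wc -c
printf 'accounts.rowkey%s\n' {1..1000} | gzip -c | wc -c

The second byte count comes out far smaller than the first, because the shared prefix repeats in every name.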

Later,
Dean

From: Tyler Hobbs <tyler@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, September 23, 2012 11:46 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: compression

[…] column metadata, you're still likely to get a reasonable amount of compression. This is especially true if there is some amount of repetition in the column names, values, or TTLs in wide rows. Compression will almost always be beneficial unless you're already somehow CPU bound or are using large column values that are high in entropy, such as pre-compressed or encrypted data.