I was thinking more about the excessive (IMO) use of memory in Cassandra due to the 8-byte timestamp stored per column/row (cell).

Any operation that is idempotent does not require a timestamp.  

For example, set membership.

A link adjacency list is a good example.

If you have a list of source->targets, adding a new member to 'targets' shouldn't require another timestamp, because repeating the same addition any number of times ends up with the same result (it is idempotent).

This can be modeled by just adding another column.
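
Here is a rough sketch of what I mean, in plain Python rather than against Cassandra itself. The names (apply_writes, the example domains) are made up for illustration. Each target is treated as its own column whose presence is the whole value, so there is nothing for a timestamp to arbitrate:

```python
import itertools

def apply_writes(writes):
    """Replay (source, target) additions into an adjacency map of sets."""
    adjacency = {}
    for source, target in writes:
        # Each target acts as its own "column"; its presence is the value.
        adjacency.setdefault(source, set()).add(target)
    return adjacency

# Hypothetical link data; note the duplicate addition of b.com.
writes = [("a.com", "b.com"), ("a.com", "c.com"), ("a.com", "b.com")]
expected = apply_writes(writes)

# Any ordering of the writes converges to the same state...
for perm in itertools.permutations(writes):
    assert apply_writes(perm) == expected

# ...and so does replaying the whole batch a second time.
assert apply_writes(writes + writes) == expected
```

This only covers additions. Removing a member is a different story, since a delete racing with an add does need some way to decide who wins.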

The results of ETL jobs that are being bulk loaded back into Cassandra don't require timestamps either. You could create a long-running ZooKeeper (ZK) lock to represent each load, which prevents multiple writers per key.

In these scenarios, timestamps are just a waste of memory, and a significant one. For our usage it will require 3-4x more memory to deploy Cassandra… I'm not really champing at the bit to pay an extra $120-150k per month in hosting costs… though I'm sure my hosting provider would love it :)
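
To give a feel for why small cells hurt the most, here is some back-of-the-envelope arithmetic. The cell sizes are hypothetical, the function name is made up, and this ignores every other per-cell overhead, so it is not meant to reproduce the 3-4x figure for our data. It only shows how a fixed 8 bytes weighs against a tiny payload:

```python
TIMESTAMP_BYTES = 8  # per-cell timestamp size mentioned above

def timestamp_share(name_bytes, value_bytes):
    """Fraction of a cell's raw bytes (name + value + timestamp) that is timestamp."""
    total = name_bytes + value_bytes + TIMESTAMP_BYTES
    return TIMESTAMP_BYTES / total

# Hypothetical set-membership cell: 8-byte target id as the name, empty value.
print(timestamp_share(name_bytes=8, value_bytes=0))    # 0.5

# Hypothetical larger cell: same name, 240-byte value.
print(timestamp_share(name_bytes=8, value_bytes=240))  # 0.03125
```

For set membership, where the column name is the entire payload, the timestamp is as big as the data it is attached to.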



Founder/CEO Spinn3r.com

Location: San Francisco, CA
Skype: burtonator

Skype-in: (415) 871-0687