cassandra-user mailing list archives

From Kevin Burton
Subject Not all data structures need timestamps (and don't require wasted memory).
Date Sat, 03 Sep 2011 23:00:39 GMT
I was thinking more about the excessive (IMO) use of memory in Cassandra due
to the 8-byte timestamp stored with every column (cell).
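To put the 8 bytes per cell in perspective, here is a rough back-of-the-envelope sketch. The cell count is a made-up example, not a figure from our deployment, and the function name is only for illustration.

```python
TIMESTAMP_BYTES = 8  # per-cell timestamp size discussed above

def timestamp_overhead_bytes(cells: int) -> int:
    """Total bytes spent on timestamps alone for a given number of cells."""
    return cells * TIMESTAMP_BYTES

# Hypothetical data set of one billion cells (assumed number, for illustration only).
example_cells = 1_000_000_000
overhead = timestamp_overhead_bytes(example_cells)
print(overhead / 10**9, "GB of timestamps")
```

How much this matters relative to the payload depends on how small the column names and values are; the smaller the cell, the larger the share taken by the timestamp.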

Any operation that is idempotent does not require a timestamp.

For example, set membership.

A link adjacency list is a good example.

If you have a list of source->targets, adding a new member to 'targets'
shouldn't require another timestamp, because multiple additions end up with
the same result (the operation is idempotent).

This can be modeled by just adding another column.
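A minimal sketch of the idea, assuming a toy in-memory model where a row is just the set of its column names. This is not Cassandra's storage engine or API; `add_target` and `merge_replicas` are hypothetical helpers, and the sketch only covers additions.

```python
from typing import Set

# Hypothetical model: one row per source, one column name per target.
Row = Set[str]

def add_target(row: Row, target: str) -> Row:
    """Add a target column; repeating the same add changes nothing."""
    return row | {target}

def merge_replicas(a: Row, b: Row) -> Row:
    """Reconcile two replicas by set union; no timestamps are consulted."""
    return a | b

# Two replicas receive the same additions in different orders,
# and one of them sees a duplicate write.
replica_1 = add_target(add_target(set(), "b.example"), "c.example")
replica_2 = add_target(add_target(set(), "c.example"), "b.example")
replica_2 = add_target(replica_2, "c.example")  # duplicate, no effect

merged = merge_replicas(replica_1, replica_2)
print(sorted(merged))
```

Because union is commutative and idempotent, the replicas converge regardless of write order or retries, which is why no per-column timestamp is needed to pick a winner in this add-only case.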

The results of ETL jobs that are being bulk loaded back into Cassandra don't
require timestamps either.  You could create a long-running ZooKeeper (ZK)
lock to represent each load and prevent multiple writers per key; with a
single writer there are no concurrent writes to order.

In these scenarios, timestamps are just a waste of memory, and a
significant one at that.  For our usage, deploying Cassandra would require
3-4x more memory… I'm not exactly champing at the bit to pay an extra
$120-150k per month in hosting costs… though I'm sure my hosting provider
would love it :)




Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*
