2009/10/1 Joe Van Dyk <joe@fixieconsulting.com>
Hi,

How stupid would it be to use Cassandra as a permanent datastore?

Say I have a service that tracks clicks on ads running on other sites.
 I'd need to keep track of who clicked what when and where.  And run
reports on it.  Cassandra is attractive because of the built-in
replication and the high write availability.


Not stupid at all really.

The individual clicks won't be interesting to anyone, so you'll want to summarise the data after some time (say, daily). You can store the summaries in something that allows for easier reporting, and put only the individual clicks in Cassandra.

Or you could even store the summaries in Cassandra.
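
As a rough sketch of what the daily roll-up could look like (plain Python, nothing here talks to Cassandra; the record shape of (unix timestamp, ad id) and the function name are my own assumptions for illustration):

```python
from collections import Counter
from datetime import datetime, timezone


def summarise_daily(clicks):
    """Roll individual click records up into per-day, per-ad counts.

    `clicks` is an iterable of (unix_timestamp, ad_id) tuples. This record
    shape is hypothetical; a real click record would carry more fields.
    """
    counts = Counter()
    for ts, ad_id in clicks:
        # Bucket each click by its UTC calendar day, e.g. "2009-10-01".
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        counts[(day, ad_id)] += 1
    return dict(counts)
```

The resulting (day, ad id) -> count map is small enough to load into whatever you run reports from, while the raw clicks stay where the writes are cheap.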

However, I'd say don't use Cassandra until your data gets too big, or your write load too high, for one machine, which typically means more than 3 TB of data or more than 1000 inserts/sec.

Until you get there, a single (potentially replicated) MySQL instance would do it and be far easier to program against.

One of the things Cassandra doesn't have right now is range_remove. I have suggested it, though, and it shouldn't be hard to implement. A range remove is pretty much vital for audit data (I'm assuming you're using an ordered partitioner); otherwise you'd have to execute an individual remove command for every record, which would be a pain and probably very inefficient, with much more work than the inserts.
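
To illustrate the difference, here is a toy in-memory model of ordered keys (this is not Cassandra's API; the class and method names are made up, and the date-prefixed key format is just an assumption). With keys kept in sorted order, a range remove is two binary searches and one slice, versus one call per record:

```python
import bisect


class OrderedStore:
    """Toy in-memory store with keys kept in sorted order (hypothetical)."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._rows = {}   # key -> value

    def insert(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def remove(self, key):
        """Remove a single record; needs one call per key."""
        if key in self._rows:
            del self._rows[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def range_remove(self, start, end):
        """Remove every key in [start, end); return how many were removed."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        if hi <= lo:
            return 0
        for key in self._keys[lo:hi]:
            del self._rows[key]
        del self._keys[lo:hi]
        return hi - lo

    def keys(self):
        return list(self._keys)
```

With keys like "20090930:click-id", expiring everything before October is then a single range_remove("", "20091001") rather than a remove per click.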

Mark